The "binary file" detection is codepage-sensitive: if grep expects UTF-8 input, as is usual on Linux, it will end up detecting "ANSI" (Windows-125x, ISO 8859-x) encoded text files as binary files. Use grep -a to force a file to always be treated as text. Running grep under the "C" locale, with LC_CTYPE=C grep or LC_ALL=C grep, may also avoid this problem. Usually the entire file is in the same encoding (i.e. all of it is likely to be non-UTF-8), so an easy way to find the problematic characters is to search for non-ASCII bytes (LC_ALL=C may be needed): grep -a -P -n --color '[\x80-\xFF]' logfile.log. If the file is valid UTF-8 except for some odd lines, use a similar approach to print the lines that fail UTF-8 decoding: perl -MEncode -ne 'print "Line $.:\t$_" if !eval { decode("UTF-8", $_, Encode::FB_CROAK); 1 }' < logfile. (Also, what 'file' says about the input being "ASCII" is based entirely on a quick look at the initial bytes of the file; it doesn't actually scan the entire thing, whereas 'grep' of course does.)