GcLinkParser

Foreign characters support

Open · ghost opened this issue 9 years ago · 10 comments

Hi, when parsing some Chinese .lnk files, I found that GcLinkParser was not able to parse them. Below is a sample of the error output I received. Thanks.

init:info:1627 Parsing File: D:\LnkFiles\碧雲天.LNK
init:info:1627 Linkfile: None
init:error:1601 IOError on file D:\LnkFiles\碧雲天.LNK

ghost, Aug 02 '16 02:08

I added a -d (--directory) option that should allow it to work now.

The issue appears to be that Windows cannot handle the encoding from the command prompt: when you run dir on the folder, the command prompt cannot pass the correct file names to the tool.

Thus, instead of:

GcLinkParser.py -f FILENAME_WITH_SYMBOLS --json

or

dir /b /s DIR\*.lnk | GcLinkParser.py --pipe --json

you would want:

GcLinkParser.py -d DIRECTORY_WITH_LNKS --json
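A minimal Python 3 sketch of the idea behind -d, assuming nothing about GcLinkParser's actual implementation: when the tool lists the directory itself using a str path, Windows hands back the file names as Unicode, so names such as 碧雲天.LNK never pass through the console code page. The helper name find_lnk_files is hypothetical.

```python
import os


def find_lnk_files(directory):
    """Return sorted full paths of all .lnk files under *directory*.

    Hypothetical helper, not part of GcLinkParser. Passing a str (not bytes)
    path makes os.walk return str names, so non-ASCII file names stay intact
    instead of being converted through the console code page.
    """
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Compare case-insensitively: the sample file ends in ".LNK".
            if name.lower().endswith(".lnk"):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

For example, find_lnk_files(r"D:\LnkFiles") would return the Chinese-named file with its name intact. On Python 2, which was common when this thread was written, the same effect requires passing a unicode path (u"...") to the directory-listing functions.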


devgc, Oct 11 '16 15:10

Sorry, I am still having issues with foreign-language support. I am trying to parse some Russian LNK files. Please see the attached screenshot for the error.

ghost, Oct 14 '16 06:10

Go ahead and give it another try now. Let me know if it fixes it.


devgc, Oct 14 '16 13:10

Sorry, it is still not working. Some time ago I had another Python program that parsed Chinese characters, but it did not work against the Russian characters this time, which I also found odd. I have yet to troubleshoot it.

ghost, Oct 15 '16 04:10

Is it possible to share the LNK file causing this issue?

EricZimmerman, Oct 15 '16 13:10

Hi Eric! :) Yes, I have sanitised it. Please find the sample file attached: Изображение 003.jpg.lnk.zip ("Изображение" is Russian for "Image").

ghost, Oct 17 '16 13:10

It seems that this is largely a Windows encoding issue due to wide characters.

Windows Explorer does not even display the proper character set for me.

However, I can get GcLinkParser to parse the file by using the -d option and processing the directory from within the tool, instead of using -f or --pipe for input (both of which rely on the Windows character set to pass the file name).

GcLinkParser.py --txt -d ..\testfiles\errors\003.jpg.lnk > out.txt

or

GcLinkParser.py --json -d ..\testfiles\errors\003.jpg.lnk > out.json
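To illustrate why -f and --pipe can fail, here is a small Python sketch. It assumes the console uses a Western legacy code page such as cp1252 or cp437 (the real code page depends on the system's locale settings). Neither code page contains these Chinese or Cyrillic characters, so a lossy conversion turns them into "?" and the tool is then asked to open a file that does not exist. That would be consistent with the "Linkfile: None" and IOError in the first report, but this is an inference, not something the log confirms. The helper through_code_page is hypothetical.

```python
def through_code_page(name, code_page):
    """Simulate a lossy round trip of a file name through a console code page."""
    # errors="replace" substitutes "?" for every character the code page lacks.
    return name.encode(code_page, errors="replace").decode(code_page)


for name in ["碧雲天.LNK", "Изображение 003.jpg.lnk"]:
    for cp in ["cp1252", "cp437"]:
        # e.g. prints: cp1252 '???.LNK'
        print(cp, repr(through_code_page(name, cp)))
    # UTF-16LE, the "wide" encoding Windows uses for file names, is lossless.
    assert name.encode("utf-16-le").decode("utf-16-le") == name
```

Processing the directory from within the tool avoids this conversion entirely, which is why -d behaves differently from -f and --pipe.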

The second thing to note is the output. While the TXT output displays the Unicode characters, the JSON output replaces them with ASCII escape sequences (such as \u0437, the escape for the Cyrillic letter з).
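This escaping matches the default behaviour of Python's standard json module, which writes every non-ASCII character as a \uXXXX escape unless ensure_ascii=False is passed. A minimal sketch, assuming (not confirming) that GcLinkParser builds its JSON with json.dumps:

```python
import json

record = {"name": "Изображение 003.jpg.lnk"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
# escaped starts with '{"name": "\u0418\u0437' ...
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as they are:
# readable == '{"name": "Изображение 003.jpg.lnk"}'
readable = json.dumps(record, ensure_ascii=False)
```

Both strings decode to the same data; only the on-disk representation differs.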

I will see if I can add an option to write JSON as Unicode too, although that could cause problems depending on what you use to view the JSON or whether you do any further processing on it.
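One way to reduce the downstream risk is to write the file with an explicit encoding, so the bytes on disk do not depend on the platform's default encoding (often cp1252 on Western Windows systems, which cannot represent these characters). A sketch under that assumption; write_json_utf8 and read_json_utf8 are hypothetical names, and any viewer or script that consumes the file must also read it as UTF-8.

```python
import json
import os
import tempfile


def write_json_utf8(record, path):
    """Write *record* as human-readable JSON, encoded explicitly as UTF-8."""
    # Explicit encoding: the result does not depend on the locale default.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, ensure_ascii=False, indent=2)


def read_json_utf8(path):
    """Read the file back; consumers must decode it as UTF-8 as well."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


out_path = os.path.join(tempfile.mkdtemp(), "out.json")
write_json_utf8({"name": "碧雲天.LNK"}, out_path)
assert read_json_utf8(out_path) == {"name": "碧雲天.LNK"}
```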

By using the -d option, are you able to parse the link files?

devgc, Oct 17 '16 15:10

I added a --json_ascii option, because I changed the default for JSON dumps to print Unicode characters instead of their ASCII escape sequences.
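A hypothetical sketch of how such a flag could be wired up with argparse; the commit itself is not shown in this thread, so this is not GcLinkParser's actual code. Only the --json and --json_ascii option names come from the thread; build_parser and render_json are invented for illustration. The design choice mirrors the comment: Unicode is the default, and ASCII escapes are opt-in.

```python
import argparse
import json


def build_parser():
    """Hypothetical subset of a CLI exposing a --json_ascii flag."""
    parser = argparse.ArgumentParser(description="LNK parser output options (sketch)")
    parser.add_argument("--json", action="store_true", help="write JSON output")
    parser.add_argument(
        "--json_ascii",
        action="store_true",
        help="escape non-ASCII characters as \\uXXXX in JSON output",
    )
    return parser


def render_json(record, json_ascii):
    # Unicode output by default; the flag switches back to ASCII escapes.
    return json.dumps(record, ensure_ascii=json_ascii)


args = build_parser().parse_args(["--json", "--json_ascii"])
print(render_json({"name": "з"}, args.json_ascii))  # prints {"name": "\u0437"}
```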



devgc, Oct 17 '16 15:10

Sorry for my late response. I tried the -d option but it did not work for me.


ghost, Nov 24 '16 08:11

Try without the --json_ascii option. What are the results?

devgc, Nov 28 '16 15:11