
UTF8 rather than ASCII

Open brenthuisman opened this issue 4 years ago • 4 comments

How do you feel about supporting UTF8 filenames? par2/string.go seems to support decoding something more current than ASCII, but does not support encoding it. I don't know if the par2 'spec' has anything to say on the matter, but it goes without saying that this would be nice. par2cmdline does support more than ASCII at this point, so I'm having some issues with gopar where par2cmdline succeeds.

brenthuisman avatar Feb 22 '21 20:02 brenthuisman

Looking at the par spec, it looks like 'unicode' filenames are supported via a separate packet type: http://parchive.sourceforge.net/docs/specifications/parity-volume-spec/article-spec.html#i__134603784_1221

I'm definitely open to supporting UTF8. Can you give an example case where gopar has issues, so I can dig into it?

akalin avatar Feb 22 '21 20:02 akalin

Creating a new parity file for any filename containing non-ASCII chars results in the error defined in par2/string.go. If you want a few test cases, you could use something like https://onlineutf8tools.com/generate-random-utf8 or pick a few from https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

Maybe it's good to mention that any non-ASCII files are marked irreparable if they have been created with par2cmdline.

brenthuisman avatar Feb 22 '21 22:02 brenthuisman

Wondering if I can bring up this issue again :)

A bit of spelunking revealed the UrlEncodeChar function, which I think might be responsible for how par2cmdline does it:

https://github.com/Parchive/par2cmdline/blob/ea3ba79f359cbe881fd807c5dc8e13b777dbf4c2/src/descriptionpacket.cpp#L119
https://github.com/Parchive/par2cmdline/blob/ea3ba79f359cbe881fd807c5dc8e13b777dbf4c2/src/descriptionpacket_test.cpp#L33
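
To illustrate the general idea the name UrlEncodeChar suggests, here is a sketch of generic percent-encoding in Go. This is an assumption about the approach, not a port of par2cmdline: which bytes par2cmdline actually encodes, and whether it emits upper- or lowercase hex, is not confirmed here. `percentEncodeByte` and `encodeNonASCII` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

const upperHex = "0123456789ABCDEF"

// percentEncodeByte returns the three-character "%XX" form of b.
// Hypothetical helper; uppercase hex is an assumption.
func percentEncodeByte(b byte) string {
	return string([]byte{'%', upperHex[b>>4], upperHex[b&0x0f]})
}

// encodeNonASCII percent-encodes every byte >= 0x80, plus '%' itself
// so the result can be decoded unambiguously. All other bytes are
// copied through unchanged, so the output is pure ASCII.
func encodeNonASCII(name string) string {
	var sb strings.Builder
	for i := 0; i < len(name); i++ {
		b := name[i]
		if b >= 0x80 || b == '%' {
			sb.WriteString(percentEncodeByte(b))
		} else {
			sb.WriteByte(b)
		}
	}
	return sb.String()
}

func main() {
	// "é" is the two UTF-8 bytes 0xC3 0xA9.
	fmt.Println(encodeNonASCII("café.txt")) // prints "caf%C3%A9.txt"
}
```

Whether gopar should write names this way, or use the separate unicode filename packet from the spec, is a design question this sketch does not settle.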

Do you think you have some time to look into it?

brenthuisman avatar Jan 29 '22 19:01 brenthuisman

I'll try to look at it this weekend! I don't know how much time I have for gopar coding these days, though... :(

akalin avatar Feb 05 '22 21:02 akalin