gopar
UTF8 rather than ASCII
How do you feel about supporting UTF8 filenames? par2/string.go seems to support decoding something more current than ASCII, but does not support encoding it. I don't know if the par2 'spec' has anything to say on the matter, but it goes without saying that this would be nice. par2cmdline does support more than ASCII at this point, so I'm having some issues with gopar where par2cmdline succeeds.
Looking at the par spec, it looks like 'unicode' filenames are supported via a separate packet type: http://parchive.sourceforge.net/docs/specifications/parity-volume-spec/article-spec.html#i__134603784_1221
I'm definitely open to supporting utf8. Can you give an example case where gopar has issues, so I can dig into it?
Creating a new parity file for any filenames containing non-ascii chars results in the error defined in par/string.go. If you want a few testcases, you could use something like https://onlineutf8tools.com/generate-random-utf8 or pick a few from https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
Maybe it's good to mention that any non-ASCII files are marked irreparable if they had been created with par2cmdline.
Wondering if I can bring up this issue again :)
A bit of spelunking reveals the UrlEncodeChar function, which I think might be responsible for how par2cmdline does it:
https://github.com/Parchive/par2cmdline/blob/ea3ba79f359cbe881fd807c5dc8e13b777dbf4c2/src/descriptionpacket.cpp#L119 https://github.com/Parchive/par2cmdline/blob/ea3ba79f359cbe881fd807c5dc8e13b777dbf4c2/src/descriptionpacket_test.cpp#L33
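Going only by the name, my guess is that UrlEncodeChar turns a single byte into a %XX escape. Here is a rough Go sketch of that idea, assuming uppercase hex digits; percentEncodeByte is a made-up name, and I have not checked this against the exact behaviour of the linked source or how par2cmdline decides which bytes to escape.

```go
package main

import (
	"fmt"
	"strings"
)

// percentEncodeByte is a hypothetical helper: it renders one byte as
// "%XX" with uppercase hex. This is an assumption about what a
// function named UrlEncodeChar does, not a port of par2cmdline.
func percentEncodeByte(b byte) string {
	return fmt.Sprintf("%%%02X", b)
}

// encodeNonASCII escapes every byte >= 0x80 and leaves ASCII bytes
// alone. Which bytes get escaped is also an assumption made here
// purely for illustration.
func encodeNonASCII(name string) string {
	var sb strings.Builder
	for i := 0; i < len(name); i++ {
		if name[i] >= 0x80 {
			sb.WriteString(percentEncodeByte(name[i]))
		} else {
			sb.WriteByte(name[i])
		}
	}
	return sb.String()
}

func main() {
	// "é" is the two bytes 0xC3 0xA9 in UTF-8.
	fmt.Println(encodeNonASCII("é.txt")) // prints %C3%A9.txt
}
```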
Do you think you have some time to look into it?
I'll try to look at it this weekend! I don't know how much time I have for gopar coding these days, though... :(