l3build

Support files with non-ASCII characters in the name are not copied to the build dir

Open u-fischer opened this issue 5 years ago • 13 comments

I have (on Windows 10) a file grüße.txt in a support folder. This file is not copied to the build dir. Another file with an ASCII-only name in the same dir works fine. During a test l3build reports

Datei grüße.txt nicht gefunden (in English: "File grüße.txt not found")

u-fischer avatar Jul 03 '20 18:07 u-fischer

This is 'not our fault': see https://stackoverflow.com/questions/54118507/lua-how-to-make-os-rename-os-remove-work-with-filenames-containing-unicode-ch/54124164. Taking the function from https://stackoverflow.com/a/54124164/212001 to get the current code page, I get 1252 from a Lua call even if I've used chcp 65001 at the Command Prompt to set the interactive one to UTF-8.

LuaTeX/texlua doesn't have winapi, etc., and I really don't fancy setting up conversions for all possible codepages, so I think we'll have to close this 'wontfix' or with a doc-only change.

josephwright avatar Oct 11 '23 05:10 josephwright
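
For illustration, the mismatch described above can be made visible by dumping the bytes that Lua holds for the file name. This is a minimal sketch in plain Lua; `hex_bytes` is a hypothetical helper, not part of l3build, and the name is written with decimal escapes so that the example does not depend on how the source file is saved.

```lua
-- Hypothetical helper (not part of l3build): show a string as hex bytes.
local function hex_bytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(out, " ")
end

-- "grüße" encoded as UTF-8: ü is \195\188 and ß is \195\159.
print(hex_bytes("gr\195\188\195\159e"))  --> 67 72 C3 BC C3 9F 65
```

In UTF-8, ü and ß are the two-byte sequences C3 BC and C3 9F, whereas Windows-1252 encodes them as the single bytes FC and DF. A byte string passed unchanged to a shell running under code page 1252 therefore names a different file.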

We could I guess check each name is 'safe' by comparing the UTF-8 length with the byte length, but that feels like overkill to me.

josephwright avatar Oct 11 '23 06:10 josephwright
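
The length comparison suggested above could look roughly like the following. This is a sketch only, assuming Lua 5.3 or later (for the `utf8` library); `is_ascii_name` is a hypothetical name and not an l3build function.

```lua
-- Hypothetical helper (not part of l3build): a name is "safe" when its
-- UTF-8 character count equals its byte count, i.e. it is plain ASCII.
local function is_ascii_name(name)
  local chars = utf8.len(name)  -- nil if `name` is not valid UTF-8
  return chars ~= nil and chars == #name
end

print(is_ascii_name("README.txt"))               --> true
-- "grüße.txt" as UTF-8 bytes: 9 characters but 11 bytes.
print(is_ascii_name("gr\195\188\195\159e.txt"))  --> false
```

Names that are not valid UTF-8 are also reported as unsafe here, which seems the cautious choice for a check of this kind.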

What would happen if you didn't use the internal Lua functions but invoked a shell and did the file move, rename, etc. at that level?

FrankMittelbach avatar Oct 11 '23 07:10 FrankMittelbach

@FrankMittelbach Well, we already partly do: it's all os.execute(...), not e.g. os.remove(...), as the latter gives uncontrolled messages, etc. I guess you mean could we do something like

os.execute("chcp 65001 && echo grüße.txt")

but that doesn't seem to work either :(

josephwright avatar Oct 11 '23 07:10 josephwright

No, I thought that inside the build.lua file "grüße.txt" is a UTF-8 string, so in os.execute you could just do mv grüße.txt somewhereelse. However, if the cmd shell on Windows is not using UTF-8 but some 8-bit code page, then you are dead in the water ... is that the issue?

FrankMittelbach avatar Oct 11 '23 08:10 FrankMittelbach

@FrankMittelbach Other than the fact that Lua is strictly byte-based (so think pdfTeX), that's the situation we have at present: of course, for Windows we are using xcopy :)

josephwright avatar Oct 11 '23 08:10 josephwright

then forget my remark :-)

FrankMittelbach avatar Oct 11 '23 08:10 FrankMittelbach

It is quite a pain with LuaTeX and non-ASCII file names. I wonder if one can make use of whatever function is used on Windows by the \input commands et al.? (But the good news is that I can work around it by using filecontents, as it writes files with Unix line endings on Windows too and so gives reproducible results ...)

u-fischer avatar Oct 11 '23 09:10 u-fischer

@u-fischer We don't do anything different with \input: my guess is that the TeX call is taking the raw bytes from the passed call, but the shell functions are not - everything is just os.execute() under the hood.

josephwright avatar Oct 11 '23 10:10 josephwright

@josephwright Yes, but I remember that it was quite a struggle to get \input to handle non-ASCII chars. I don't know what Akira did in the end to get it working, but perhaps it is something that should also be used when calling the shell.

u-fischer avatar Oct 11 '23 10:10 u-fischer

@u-fischer That's an engine change - so still not our fault

josephwright avatar Oct 11 '23 10:10 josephwright

@davidcarlisle dug out that on Windows we have chgstrcp.utf8tosyscp() available in LuaTeX and thus texlua. At least for file names that fall within the current system codepage, that will help. I'll see if I can put something together - the main challenge will be picking out all of the file name usage!

josephwright avatar Oct 11 '23 19:10 josephwright
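
A guarded wrapper around that conversion might look like the sketch below. It is not the actual l3build change: `to_system_name` is a hypothetical name, and the code assumes that `chgstrcp.utf8tosyscp()` takes a UTF-8 string and returns the same name in the system code page, which is how the comment above describes it.

```lua
-- Hypothetical helper (not the actual l3build change).
-- Assumption: chgstrcp.utf8tosyscp(s) maps a UTF-8 string to the
-- current system code page; per the thread it exists only in
-- LuaTeX/texlua on Windows.
local function to_system_name(name)
  if type(chgstrcp) == "table"
     and type(chgstrcp.utf8tosyscp) == "function" then
    return chgstrcp.utf8tosyscp(name)
  end
  -- Other platforms, or plain Lua: pass the bytes through unchanged.
  return name
end

print(to_system_name("support/readme.txt"))
```

Checking for the table at run time keeps one code path for all platforms, so every place that builds a shell command could call the wrapper unconditionally.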

@u-fischer Could you check to see if the change I've just pushed works? It's only going to support filenames that fit in the current codepage but that's likely to be the common case.

josephwright avatar Oct 12 '23 05:10 josephwright