Support files with non-ASCII chars in the name are not copied to the build dir
I have (on Windows 10) a file grüße.txt in a support folder. This file is not copied to the build dir. Another file with an ASCII-only name in the same dir works fine. During a test l3build reports
Datei grüße.txt nicht gefunden (in English: "File grüße.txt not found")
This is 'not our fault': see https://stackoverflow.com/questions/54118507/lua-how-to-make-os-rename-os-remove-work-with-filenames-containing-unicode-ch/54124164. Taking the function from https://stackoverflow.com/a/54124164/212001 to get the current codepage, I get 1252 from a Lua call even if I've used chcp 65001 at the Command Prompt to set the interactive one to UTF-8.
LuaTeX/texlua doesn't have winapi, etc., and I really don't fancy setting up conversions for all possible codepages, so I think we'll have to close this 'wontfix' or with a doc-only change.
We could I guess check each name is 'safe' by comparing the UTF-8 length with the byte length, but that feels like overkill to me.
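For reference, the 'safe' check could look roughly like this. It is only a sketch: is_ascii_name is a made-up name, not anything in l3build, and it assumes the utf8 library from Lua 5.3 or later is available. The test name is written as explicit UTF-8 bytes so the example does not depend on how the file itself is saved.

```lua
-- Hypothetical helper (not part of l3build): a name is treated as 'safe'
-- when its UTF-8 character count equals its byte count, i.e. it is pure ASCII.
-- Assumes the utf8 library (Lua 5.3+).
local function is_ascii_name(name)
  local chars = utf8.len(name)  -- nil if name is not valid UTF-8
  return chars ~= nil and chars == #name
end

-- "gr\195\188\195\159e.txt" is grüße.txt spelled out as UTF-8 bytes
print(is_ascii_name("hello.txt"))                 --> true
print(is_ascii_name("gr\195\188\195\159e.txt"))   --> false
```

A name that fails the check is not necessarily broken, it just cannot be guaranteed to survive the trip through the shell.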
What would happen if you don't use the internal Lua functions but invoke a shell and do file move, rename etc. on that level?
@FrankMittelbach Well, we already partly do: it's all os.execute(...) not e.g. os.remove(...), as the latter gives uncontrolled messages, etc. I guess you mean could we do something like
os.execute("chcp 65001 && echo grüße.txt")
but that doesn't seem to work either :(
No, I thought that inside the build.lua file "grüße.txt" is a UTF-8 string, so in os.execute you could just do mv grüße.txt somewhereelse. However, if the cmd shell on Windows is not using UTF-8 but some 8-bit code page then you are dead in the water ... is that the issue?
@FrankMittelbach Other than that Lua is strictly byte-based (so think pdfTeX), that's the situation we have at present: of course, for Windows we are using xcopy :)
then forget my remark :-)
It is quite a pain with LuaTeX and non-ASCII file names. I wonder if one can make use of whatever function is used on Windows by the \input commands et al? (But the good news is that I can work around it by using filecontents, as it writes files with Unix line endings on Windows too and so gives reproducible results ...)
@u-fischer We don't do anything different with \input: my guess is that the TeX call is taking the raw bytes from the passed call, but the shell functions are not - everything is just os.execute() under the hood.
@josephwright yes, but I remember that it was quite a struggle to get \input to handle non-ascii chars. I don't know what Akira did at the end to get it working, but perhaps it is something that should also be used when calling the shell.
@u-fischer That's an engine change - so still not our fault
@davidcarlisle dug out that on Windows we have chgstrcp.utf8tosyscp() available in LuaTeX and thus texlua. At least for file names that fall within the current system codepage, that will help. I'll see if I can put something together - the main challenge will be picking out all of the file name usage!
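Roughly what I have in mind is below. It is a sketch only, not the actual change: shell_name is a made-up name, and it assumes os.type is the LuaTeX extension that reports "windows" and that chgstrcp.utf8tosyscp() takes the UTF-8 string and returns the converted one. Anywhere those are missing (plain Lua, non-Windows) the name is passed through untouched.

```lua
-- Hypothetical wrapper (not the real l3build code): convert a UTF-8 file
-- name to the current system codepage before handing it to the shell.
-- Assumptions: os.type is LuaTeX's extension, and chgstrcp.utf8tosyscp()
-- is available on Windows builds of texlua as mentioned above.
local function shell_name(name)
  if os.type == "windows" and chgstrcp and chgstrcp.utf8tosyscp then
    return chgstrcp.utf8tosyscp(name)
  end
  -- Elsewhere the bytes are passed to the shell as they are
  return name
end

-- "gr\195\188\195\159e.txt" is grüße.txt spelled out as UTF-8 bytes
local name = shell_name("gr\195\188\195\159e.txt")
print(#name)  -- byte length after any conversion
```

Every place that builds a command line for os.execute() would then need to run its file names through something like this, which is where the real work is.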
@u-fischer Could you check to see if the change I've just pushed works? It's only going to support filenames that fit in the current codepage but that's likely to be the common case.