opam-repository

Fail compiling a switch when LC_ALL/LC_COLLATE is set to sv_SE.UTF-8 instead of C

Open anders-hig-jackson opened this issue 4 years ago • 8 comments

When I tried to create a switch with opam(1), the linking stage failed with missing names. This happens on every system with locale support and in every modern version of OCaml I have tested (most 4.* versions up to 4.13). The cause is that the OCaml build uses regular expressions, and the Swedish locale does not treat 'w' the way most other locales do.
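A rough way to check how a bracket range behaves under a given locale is sketched below. The helper name `matches_lower_range` and the sample identifier `caml_w` are made up for illustration, and the pattern is not taken from the OCaml build scripts. Only the C locale results are certain. What sv_SE.UTF-8 prints depends on the libc locale data and the grep implementation, and if the locale is not installed the tool silently falls back to C.

```shell
# Hypothetical helper: prints "yes" if the string consists only of
# characters in the range a-z (plus underscore) under the given locale.
matches_lower_range() {
    # $1: locale name, $2: string to test
    if printf '%s\n' "$2" | LC_ALL="$1" grep -q '^[a-z_]*$' 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

# In the C locale the range is plain ASCII order, so 'w' is inside it.
matches_lower_range C caml_w            # prints "yes"

# Result here depends on the installed locale data (no output promised).
matches_lower_range sv_SE.UTF-8 caml_w
```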

To reproduce this, take any recent Linux distribution, such as Ubuntu or Fedora, install the Swedish locale, and set the user's locale to Swedish, which only requires setting some environment variables.

export LANG=sv_SE.UTF-8
export LC_ALL=sv_SE.UTF-8

Then check the output of the locale(1) command and create a new switch as usual (as in any opam tutorial). Look at the failed linking stage for the compiler.

The solution is to tell the build system to use the proper locale for this, the C locale, for classification and sorting of characters, and therefore for regular expressions.

export LC_ALL=""  # so it doesn't override all other LC_*
export LC_COLLATE=C  # set classification and ordering of chars to C locale

Check again with locale(1) that the settings are interpreted properly. Creating a switch with opam(1) now works as it is supposed to.
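If you would rather not change the login environment, the same workaround can be limited to a single command. This is only a sketch: the wrapper name `with_c_collate` is made up, and it assumes a POSIX shell. It runs the command in a subshell, so the caller's own locale variables stay untouched.

```shell
# Hypothetical wrapper: run one command with LC_ALL cleared and
# LC_COLLATE forced to C, leaving the calling shell unchanged.
with_c_collate() {
    (
        unset LC_ALL        # so it cannot override LC_COLLATE
        LC_COLLATE=C        # C ordering and ranges for this command only
        export LC_COLLATE
        "$@"
    )
}
```

It could then be used as, for example, `with_c_collate opam switch create 4.12.0`, where the compiler version is just an example.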

I have sent a bug report to OCaml about this, but I think this should really be set for all compilation, since programmers who write build scripts probably expect regular expressions to behave that way. Yes, the Swedish locale works as it should, and so does C. Each should be used in the proper environment, though, and for programming that is the C locale in this case. The C locale was created for this purpose.

anders-hig-jackson avatar Apr 12 '21 00:04 anders-hig-jackson

Reported in https://github.com/ocaml/ocaml/issues/10332 and pending fix in https://github.com/ocaml/ocaml/issues/10333.

I don't think we should be changing the default environment for switches, although perhaps running CI with Swedish locale set might be a good idea!

dra27 avatar Apr 12 '21 07:04 dra27

Should we backport https://github.com/ocaml/ocaml/pull/10333/files#diff-3c479c6bd9c60e1008b84eb57d13331b8b11a23e4f862d6c5271538d567c34bc to the old compilers as an opam-repo extra patch?

mseri avatar Apr 12 '21 08:04 mseri

I was going to suggest once it's merged, yes

dra27 avatar Apr 12 '21 09:04 dra27

Setting only LC_COLLATE=C and LC_ALL="" will change the classification of characters, but not error messages etc. But yes, this will kill the bug. Thanks for the fast fix.
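The reason this works is the usual lookup order for locale categories: a non-empty LC_ALL wins, then the category's own variable, then LANG. A small sketch of that order follows. The function name `effective_locale` is made up, and the final fallback to "C" is an assumption (POSIX leaves that default to the implementation; glibc uses C).

```shell
# Hypothetical helper: print the locale that applies to one category,
# following the order LC_ALL, then the category variable, then LANG.
effective_locale() {
    # $1: category variable name, e.g. LC_COLLATE or LC_MESSAGES
    eval "cat_val=\${$1-}"
    if [ -n "${LC_ALL-}" ]; then
        echo "$LC_ALL"
    elif [ -n "$cat_val" ]; then
        echo "$cat_val"
    elif [ -n "${LANG-}" ]; then
        echo "$LANG"
    else
        echo "C"    # assumed default when nothing is set
    fi
}
```

With LANG=sv_SE.UTF-8, LC_ALL empty and LC_COLLATE=C, this gives C for LC_COLLATE and sv_SE.UTF-8 for LC_MESSAGES, so messages stay in Swedish. With LC_ALL=C, every category becomes C.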

anders-hig-jackson avatar Apr 12 '21 15:04 anders-hig-jackson

Would simply adding:

build-env: [LC_ALL = "C"]

to the compilers work? Or would that mess something else up?

kit-ty-kate avatar Apr 17 '21 11:04 kit-ty-kate

I think it would be better (because it's more explicit) to put a patch in for https://github.com/ocaml/ocaml/pull/10333/commits/5f90caf6e2623d55883ddd385b1dc5198dff94a3 to the old compilers, but LC_ALL=C in build-env for those packages would work, yes

dra27 avatar Apr 18 '21 08:04 dra27

Agreed. This patch will make compiling OCaml from source work. If/when this patch is backported to other versions of the OCaml compiler sources, it will fix the same problem there.

Once that is done, I suppose everyone who uses opam to compile OCaml will be covered. They will have a working system, provided they download the latest source when they recompile the compiler. As I understand it, and I could be wrong, there is then no need to apply any other patch, neither to opam nor to the source code, as long as they compile from the latest source code of the OCaml compiler.

anders-hig-jackson avatar Apr 19 '21 08:04 anders-hig-jackson

I am getting messages that this is stale. Is it?

I have tested LC_COLLATE=C, and that is the minimum that needs to be changed for compilation to work (so that W not being considered a proper character in Swedish doesn't make compilation fail). LC_ALL=C works too, but then messages from the parts where LC_ALL=C is set are not translated to the selected language.

But yes, it should be added, if it has not been already, so that it works properly and as it is supposed to.

Yours Anders Jackson

anders-hig-jackson avatar Oct 20 '21 13:10 anders-hig-jackson