`cabal sdist` corrupts Unicode filenames in the source tarball
$ git clone https://github.com/hvr/-.git 無
...
$ cd 無
$ runghc Setup.hs configure
Configuring 無-0...
$ runghc Setup.hs sdist
Distribution quality warnings:
No 'category' field.
No 'description' field.
Warning: Cannot run preprocessors. Run 'configure' command first.
Building source dist for 無-0...
Source tarball created: dist/無-0.tar.gz
$ tar tf dist/無-0.tar.gz
無-0/
無-0/無.cabal
無-0/Setup.hs
無-0/LICENSE
That works as expected; however, when using cabal-install:
$ rm -rf dist/
$ cabal --version
cabal-install version 1.22.2.0
using version 1.22.2.0 of the Cabal library
$ cabal configure
Resolving dependencies...
Configuring 無-0...
$ cabal sdist
Distribution quality warnings:
No 'category' field.
No 'description' field.
Warning: Cannot run preprocessors. Run 'configure' command first.
Building source dist for 無-0...
Source tarball created: dist/無-0.tar.gz
$ tar tf dist/無-0.tar.gz
!-0/
!-0/!.cabal
!-0/Setup.hs
!-0/LICENSE
The resulting tarball has filenames with 無 replaced by `!` (FWIW, 無 is UTF-8-encoded as 0xe7 0x84 0xa1).
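One observation that may help track this down: 無 is code point U+7121, and its low byte 0x21 is ASCII `!`. The Python sketch below only illustrates that arithmetic. It assumes (a guess, not something confirmed here) that somewhere in the cabal-install path each character is truncated to 8 bits instead of being UTF-8-encoded; the helper name `truncate_to_8_bits` is hypothetical.

```python
def truncate_to_8_bits(text):
    """Keep only the low 8 bits of each code point (hypothetical lossy step)."""
    return bytes(ord(ch) & 0xFF for ch in text)

name = "\u7121-0/\u7121.cabal"  # 無-0/無.cabal

# Proper UTF-8 encoding keeps all three bytes of 無 (0xe7 0x84 0xa1).
utf8 = name.encode("utf-8")

# Truncation collapses U+7121 to its low byte 0x21, which is ASCII '!'.
truncated = truncate_to_8_bits(name).decode("ascii")

print(hex(ord("\u7121")), utf8[:3].hex(), truncated)
```

If the truncation assumption holds, the output of the lossy step matches the `!-0/!.cabal` entry seen in the tarball listing.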
See also #2557
Once this is fixed, we should also add a regression test.
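As a rough idea of what such a regression test would assert, here is a minimal, self-contained Python sketch. It is not how the cabal test suite is written: in a real test the tarball would come from `cabal sdist`, whereas here a stand-in tarball is built in memory, and the helper names `make_tarball` and `tarball_names` are hypothetical.

```python
import io
import tarfile

def make_tarball(names):
    """Build an in-memory .tar.gz with one empty file per name (stand-in for sdist output)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    return buf.getvalue()

def tarball_names(data):
    """Return the member names stored in a .tar.gz given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return tar.getnames()

expected = ["\u7121-0/\u7121.cabal", "\u7121-0/Setup.hs", "\u7121-0/LICENSE"]
names = tarball_names(make_tarball(expected))

# The property under test: Unicode member names survive the round trip unchanged.
assert names == expected
```

The key check is the final comparison: every entry must keep the 無 prefix instead of coming back as `!`.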
@hvr By the way, what operating system is this on, and what is your default encoding set to? (It should work correctly regardless of these facts, but knowing these may help to track it down.)
@ttuegel Ubuntu 15.04 with LANG=en_US.UTF-8; I'd expect this issue to be easily reproducible on any Linux distribution.
Real-world example: https://github.com/snoyberg/yaml/blob/master/yaml.cabal#L23
`cabal sdist` packages it correctly on Linux, but `cabal unpack` unpacks it incorrectly:
$ ls -d test/resources/acc*
test/resources/accent test/resources/accenté
$ runhaskell Setup.lhs sdist
Building source dist for yaml-0.8.18.4...
Preprocessing library yaml-0.8.18.4...
Preprocessing executable 'yaml2json' for yaml-0.8.18.4...
Preprocessing executable 'json2yaml' for yaml-0.8.18.4...
Preprocessing test suite 'spec' for yaml-0.8.18.4...
Source tarball created: dist/yaml-0.8.18.4.tar.gz
$ cabal unpack dist/yaml-0.8.18.4.tar.gz
Unpacking to yaml-0.8.18.4/
$ ls -d yaml-0.8.18.4/test/resources/acc*
yaml-0.8.18.4/test/resources/accenté
$ cabal --version
cabal-install version 1.24.0.0
compiled using version 1.24.0.0 of the Cabal library
$ locale
LANG=ru_RU.UTF-8
Note how é got corrupted to é.
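A common way to get this kind of damage, sketched below in Python, is for the UTF-8 bytes of a name to be read back one byte per character (as Latin-1) and then re-encoded as UTF-8 when the file is created. Whether `cabal unpack` does exactly this is an assumption, not something established in this thread.

```python
original = "accent\u00e9"            # "accenté" with a precomposed é (U+00E9)

stored = original.encode("utf-8")    # bytes as written into the tarball
misread = stored.decode("latin-1")   # each byte treated as one character
on_disk = misread.encode("utf-8")    # re-encoded when the file is created

print(misread)
print(len(stored), len(on_disk))
```

Under that assumption the single é (two bytes in UTF-8) turns into two separate characters, so the name on disk grows by two bytes.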
Edward pointed out it's even more complicated: https://github.com/haskell/cabal/issues/3758
Underlying/related issue in tar: https://github.com/haskell/tar/issues/6
Thanks. There is even some traffic on the tar issue, so let me downgrade the priority on our side, unless a workaround on the cabal side is urgent for any users?