
`cabal sdist` corrupted when using Unicode

Open hvr opened this issue 10 years ago • 7 comments

$ git clone https://github.com/hvr/-.git 無
...
$ cd 無
$ runghc Setup.hs configure
Configuring 無-0...

$ runghc Setup.hs sdist
Distribution quality warnings:
No 'category' field.
No 'description' field.
Warning: Cannot run preprocessors. Run 'configure' command first.
Building source dist for 無-0...
Source tarball created: dist/無-0.tar.gz

$ tar tf dist/無-0.tar.gz
無-0/
無-0/無.cabal
無-0/Setup.hs
無-0/LICENSE

That works as expected; however, when using cabal-install:

$ rm -rf dist/

$ cabal --version
cabal-install version 1.22.2.0
using version 1.22.2.0 of the Cabal library

$ cabal configure
Resolving dependencies...
Configuring 無-0...

$ cabal sdist
Distribution quality warnings:
No 'category' field.
No 'description' field.
Warning: Cannot run preprocessors. Run 'configure' command first.
Building source dist for 無-0...
Source tarball created: dist/無-0.tar.gz

$ tar tf dist/無-0.tar.gz 
!-0/
!-0/!.cabal
!-0/Setup.hs
!-0/LICENSE

The resulting tarball has filenames with 無 replaced by ! (fwiw, 無 is UTF-8-encoded as 0xe7 0x84 0xa1)
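
As a side note, `!` is 0x21, which is the low byte of U+7121, the code point of 無. That is consistent with each character of the path being cut down to 8 bits somewhere while the tarball is written. That cause is an assumption here, not something established in this report. A minimal Haskell sketch of that kind of truncation, where `truncateChar` is a hypothetical helper and not a Cabal or tar function:

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)
import Numeric (showHex)

-- Hypothetical helper: keep only the low 8 bits of a code point,
-- the way a byte-per-character conversion would.
truncateChar :: Char -> Char
truncateChar c = chr (ord c .&. 0xFF)

main :: IO ()
main = do
  let c = '無'
  putStrLn ("code point: 0x" ++ showHex (ord c) "")  -- 0x7121
  putStrLn ("truncated:  " ++ show (truncateChar c))  -- '!'
  putStrLn (map truncateChar "無-0/無.cabal")          -- !-0/!.cabal
```

ASCII characters pass through unchanged, which would explain why only 無 is affected in the listing above.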

See also #2557

hvr avatar Apr 27 '15 09:04 hvr

Once this is fixed, we should also add a regression test.

23Skidoo avatar Apr 27 '15 13:04 23Skidoo

@hvr By the way, what operating system is this on, and what is your default encoding set to? (It should work correctly regardless of these facts, but knowing these may help to track it down.)

ttuegel avatar Apr 27 '15 13:04 ttuegel

@ttuegel Ubuntu 15.04 w/ LANG=en_US.UTF-8; I'd expect this issue to be easily reproducible on any Linux distribution

hvr avatar Apr 27 '15 18:04 hvr

Real-world example: https://github.com/snoyberg/yaml/blob/master/yaml.cabal#L23

`sdist` packages it correctly on Linux, but `cabal unpack` unpacks it incorrectly:

$ ls -d test/resources/acc*
test/resources/accent  test/resources/accenté
$ runhaskell Setup.lhs sdist
Building source dist for yaml-0.8.18.4...
Preprocessing library yaml-0.8.18.4...
Preprocessing executable 'yaml2json' for yaml-0.8.18.4...
Preprocessing executable 'json2yaml' for yaml-0.8.18.4...
Preprocessing test suite 'spec' for yaml-0.8.18.4...
Source tarball created: dist/yaml-0.8.18.4.tar.gz

$ cabal unpack dist/yaml-0.8.18.4.tar.gz
Unpacking to yaml-0.8.18.4/

$ ls -d yaml-0.8.18.4/test/resources/acc*
yaml-0.8.18.4/test/resources/accentÃ©

$ cabal --version
cabal-install version 1.24.0.0
compiled using version 1.24.0.0 of the Cabal library

$ locale
LANG=ru_RU.UTF-8

Note how é got corrupted to Ã©.
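
For illustration, this is what happens when the UTF-8 bytes of a path are read back as one character per byte: é (0xc3 0xa9) turns into two separate characters. Whether `cabal unpack` does exactly this is an assumption. `utf8Bytes` and `bytesAsChars` are hypothetical helpers written for this sketch, not library functions.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Minimal UTF-8 encoder, valid only for code points up to U+FFFF
-- (enough for this demo; not a general-purpose encoder).
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80  = [n]
  | n < 0x800 = [0xC0 .|. (n `shiftR` 6), 0x80 .|. (n .&. 0x3F)]
  | otherwise = [ 0xE0 .|. (n `shiftR` 12)
                , 0x80 .|. ((n `shiftR` 6) .&. 0x3F)
                , 0x80 .|. (n .&. 0x3F) ]
  where n = ord c

-- Misread a UTF-8 byte sequence as one character per byte.
bytesAsChars :: String -> String
bytesAsChars = map chr . concatMap utf8Bytes

main :: IO ()
main = putStrLn (bytesAsChars "test/resources/accenté")
```

In a UTF-8 locale this prints `test/resources/accentÃ©`, since 0xc3 is Ã and 0xa9 is © when each byte is taken as its own code point.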

trofi avatar Sep 03 '16 09:09 trofi

Edward pointed out it's even more complicated: https://github.com/haskell/cabal/issues/3758

trofi avatar Sep 03 '16 09:09 trofi

Underlying/related issue in tar: https://github.com/haskell/tar/issues/6

gbaz avatar Jul 28 '22 17:07 gbaz

Thanks. There is even some traffic in the tar issue, so let me downgrade the priority on our side, unless a workaround on the cabal side is urgent for any users?

Mikolaj avatar Jul 28 '22 17:07 Mikolaj