identify
Some .gz files identified as plain-text AND binary
We observed inconsistent behavior on some gzip files: some of them are tagged as both text and binary.
Good example:
https://github.com/eclipse/kuksa.val/raw/0.4.2/kuksa_databroker/createbom/licensestore/Apache-2.0.txt.gz
Bad example:
https://raw.githubusercontent.com/eclipse/kuksa.val/0.4.2/kuksa_databroker/createbom/licensestore/ring.LICENSE.txt.gz
identify version used:
pip install identify
Collecting identify
Downloading identify-2.5.35-py2.py3-none-any.whl.metadata (4.4 kB)
Downloading identify-2.5.35-py2.py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.9/98.9 kB 132.8 kB/s eta 0:00:00
Installing collected packages: identify
Successfully installed identify-2.5.35
The "good" file works as expected:
$ identify-cli Apache-2.0.txt.gz
["binary", "file", "gzip", "non-executable"]
The "bad" one yields unexpected results, with both "binary" and "text" tags:
$ identify-cli ring.LICENSE.txt.gz
["binary", "file", "gzip", "non-executable", "plain-text", "text"]
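To narrow down whether the extra tags come from the filename or from the file contents, a check along these lines might help. It is only a sketch: it assumes `identify.identify.tags_from_filename` (which looks at the name alone) is the relevant entry point, and `extra_tags` and `main` are hypothetical helper names, not part of identify.

```python
from typing import Callable, List, Set


def extra_tags(good: str, bad: str, tagger: Callable[[str], Set[str]]) -> List[str]:
    """Return the tags that `tagger` assigns to `bad` but not to `good`."""
    return sorted(set(tagger(bad)) - set(tagger(good)))


def main() -> None:
    # Imported here so the helper above stays usable without identify installed.
    from identify.identify import tags_from_filename

    # Only the names are passed in, so any difference is name-based,
    # not content-based.
    print(extra_tags("Apache-2.0.txt.gz", "ring.LICENSE.txt.gz", tags_from_filename))
```

Calling `main()` with identify installed prints the tags that the "bad" name gets and the "good" name does not. If "plain-text" and "text" show up there, the file contents are probably not involved.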
For reference, file says:
$ file *.gz
Apache-2.0.txt.gz: gzip compressed data, was "Apache2.txt", last modified: Tue Nov 8 17:08:57 2022, from Unix, original size modulo 2^32 11356
ring.LICENSE.txt.gz: gzip compressed data, was "ring.LICENSE.txt", last modified: Tue Feb 14 08:21:40 2023, from Unix, original size modulo 2^32 10125
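The same thing can be cross-checked without `file` by looking at the first two bytes, which are `1f 8b` for gzip data. A minimal sketch using only the standard library (`looks_like_gzip` is a hypothetical helper name, and the check covers the magic number only, not the rest of the header):

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip member (RFC 1952)


def looks_like_gzip(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

Running `looks_like_gzip` on both downloaded files should agree with the `file` output above if both really are gzip streams.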
Is this a bug?