python-magic
CSV recognized as ASCII text in Debian
cat /etc/*-release >>
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
A file that is recognized as "CSV" on macOS (it is clearly a CSV file, with a .csv extension) is recognized as ASCII text on Debian. Reinstalling libmagic-dev didn't help.
What does the file command say about this file? If it says csv, can you give the exact code snippet you're using?
root:# file csv_sample.csv
csv_sample.csv: ASCII text
Does this mean I need to install a different version of libmagic?
I'm not sure whether macOS uses the same file command as Debian. If it does, I'd compare the versions and see whether something changed between them. The difference could come from actual code changes or from the magic definition file (which usually ships with the code).
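One way to compare the two machines from Python is to print the version of the libmagic that python-magic actually loaded. The sketch below assumes python-magic's magic.version() helper, which returns libmagic's integer version (for example 538 for 5.38) and raises NotImplementedError when the underlying libmagic is too old to report it. format_libmagic_version and loaded_libmagic_version are hypothetical helpers added here for illustration.

```python
def format_libmagic_version(raw):
    """Render libmagic's integer version (e.g. 538) as '5.38'."""
    major, minor = divmod(int(raw), 100)
    return f"{major}.{minor:02d}"


def loaded_libmagic_version():
    """Return the version of the libmagic that python-magic loaded, or None."""
    try:
        import magic  # python-magic; the import fails if libmagic is missing
        return format_libmagic_version(magic.version())
    except (ImportError, AttributeError, NotImplementedError):
        # No python-magic, a different module named "magic", or a libmagic
        # too old to report its version.
        return None
```

Calling loaded_libmagic_version() on each machine and comparing the results with the output of file --version should show whether the two systems run different libmagic releases.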
So I looked in the Debian image I was using (the Debian Docker image), and the magic database was empty. I copied the database from my Mac into the Docker image at /usr/share/misc/magic/ (not sure this would work anyway), but still got the same result. apt-get upgrade file didn't work either. I'll keep digging.
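To see what the image actually contains, it can help to list the places libmagic might read its database from. The paths below are assumptions based on a typical Debian-like layout (only /usr/share/misc/magic comes from this thread), and describe_magic_paths is a hypothetical helper. On some distributions the directory of magic source fragments can be empty while a compiled magic.mgc file holds the real database, so an empty directory alone may not mean the database is missing.

```python
import os

# Assumed candidate locations on a Debian-like system; adjust as needed.
CANDIDATE_PATHS = (
    "/usr/share/misc/magic",
    "/usr/share/misc/magic.mgc",
    "/usr/lib/file/magic.mgc",
    "/etc/magic",
)


def describe_magic_paths(paths=CANDIDATE_PATHS):
    """Return (path, kind, size) tuples.

    size is the entry count for directories, the byte count for files,
    and 0 for missing paths.
    """
    report = []
    for path in paths:
        if os.path.isdir(path):
            report.append((path, "directory", len(os.listdir(path))))
        elif os.path.isfile(path):
            report.append((path, "file", os.path.getsize(path)))
        else:
            report.append((path, "missing", 0))
    return report


for path, kind, size in describe_magic_paths():
    print(f"{path}: {kind} ({size})")
```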
FWIW, on Debian bullseye (not a Docker image) I'm running file 5.38-4, and it does recognize a CSV file.
related: https://github.com/ahupp/python-magic/issues/75
At their core, .csv files are just ASCII text files, so they carry no file signature that distinguishes them from plain text.
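Because there is no signature to match, any CSV detection has to be a content heuristic. The sketch below is one such heuristic written with Python's csv module. It is not libmagic's algorithm, and looks_like_csv is a hypothetical helper. It treats data as CSV when there are at least two non-empty rows and every row parses to the same number of fields, with at least two fields per row.

```python
import csv
import io


def looks_like_csv(data, min_rows=2):
    """Heuristic: True if every non-empty row has the same field count (> 1)."""
    try:
        text = data.decode("utf-8")
        rows = [row for row in csv.reader(io.StringIO(text, newline="")) if row]
    except (UnicodeDecodeError, csv.Error):
        return False
    if len(rows) < min_rows:
        return False
    width = len(rows[0])
    return width > 1 and all(len(row) == width for row in rows)


# The to_csv() output quoted further down in this thread.
sample = b",a,b\n0,1,2\n1,2,3\n2,3,4\n"
print(looks_like_csv(sample))                                # True
print(looks_like_csv(b"plain text\nwith no delimiters\n"))   # False
```

A caller could use a check like this as a fallback when libmagic reports text/plain for a file that is expected to be CSV.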
Hi,
I think I have a similar issue. I am creating a pandas DataFrame and calling to_csv(), but I get different results on different systems. I tried to create a minimal reproducible example below.
Ubuntu:
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
>>> df.to_csv()
',a,b\n0,1,2\n1,2,3\n2,3,4\n'
>>> magic.detect_from_content(df.to_csv().encode('utf-8'))
FileMagic(mime_type='application/csv', encoding='us-ascii', name='CSV text')
CentOS:
Python 3.8.19 (default, May 27 2024, 05:59:07)
[GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
>>> df.to_csv()
',a,b\n0,1,2\n1,2,3\n2,3,4\n'
>>> magic.detect_from_content(df.to_csv().encode('utf-8'))
FileMagic(mime_type='text/plain', encoding='us-ascii', name='ASCII text')
Could this be a locale issue?
For Ubuntu I have Ubuntu 20.04 LTS.
For CentOS I used the Docker image quay.io/pypa/manylinux2014_x86_64 with cp38-cp38.
python-magic 0.4.27 is used on both OSes.
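Since the python-magic version is the same on both systems, the remaining variables are the OS and whatever it ships underneath. A small report like the one below, run on each machine, makes that comparison explicit. It is a sketch using only the standard library. environment_report is a hypothetical helper, and the distribution name "python-magic" is assumed to match how the package was installed.

```python
import platform
import sys
from importlib import metadata


def environment_report(dist_name="python-magic"):
    """Collect facts worth comparing between two machines."""
    try:
        package_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        package_version = None  # package not installed in this environment
    return {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        dist_name: package_version,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```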