
CSV recognized as ASCII text in Debian

Open kyprifog opened this issue 4 years ago • 10 comments

Output of cat /etc/*-release:

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

A file that is recognized as "CSV" on macOS (it is clearly a CSV file, with a .csv extension) is recognized as "ASCII text" on Debian. Reinstalling libmagic-dev didn't help.

kyprifog avatar Apr 21 '20 04:04 kyprifog

What does the file command say about this file? If it says csv, can you give an exact code snippet you're using?

ahupp avatar Apr 21 '20 06:04 ahupp

root:# file csv_sample.csv
csv_sample.csv: ASCII text

Does this mean I need to install a different version of libmagic?

kyprifog avatar Apr 29 '20 18:04 kyprifog

I'm not sure whether macOS uses the same file command as Debian. If it does, I'd compare the versions and see whether something changed between them. The difference could come from actual code changes or from the magic definition file (which usually ships with the code).
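One way to run that comparison from Python is to ask each machine's libmagic for its version. The following is a minimal sketch, assuming the shared library can be located by ctypes.util.find_library and that it exports magic_version() (older libmagic builds may not). The helper names libmagic_version and format_version are made up for illustration.

```python
import ctypes
import ctypes.util
from typing import Optional


def libmagic_version() -> Optional[int]:
    """Return libmagic's version as an integer (e.g. 538), or None if unavailable."""
    path = ctypes.util.find_library("magic")
    if path is None:
        return None  # libmagic is not installed or not discoverable
    try:
        lib = ctypes.CDLL(path)
        func = lib.magic_version  # assumption: symbol may be missing on old builds
    except (OSError, AttributeError):
        return None
    func.restype = ctypes.c_int
    func.argtypes = []
    return int(func())


def format_version(version: int) -> str:
    """Render the integer form as a dotted string, e.g. 538 -> '5.38'."""
    return f"{version // 100}.{version % 100:02d}"


if __name__ == "__main__":
    v = libmagic_version()
    print("libmagic not found" if v is None else f"libmagic {format_version(v)}")
```

Running this on both machines and comparing the output shows whether the two systems load the same libmagic release.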

ahupp avatar Apr 29 '20 22:04 ahupp

So I looked in the Debian image I was using (a Debian Docker image), and the magic database was empty. I copied the database from my Mac into the Docker image at /usr/share/misc/magic/ (not sure this would work anyway) but still got the same result. apt-get upgrade file didn't help either. I'll keep digging.

kyprifog avatar May 01 '20 16:05 kyprifog

fwiw, on Debian bullseye (not a Docker image) I'm running file 5.38-4, and it does recognize a CSV file.

ahupp avatar May 04 '20 04:05 ahupp

related: https://github.com/ahupp/python-magic/issues/75

kyprifog avatar Sep 01 '20 19:09 kyprifog

At their core, .csv files are just ASCII text files, and as such they carry the same file signature.
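Because libmagic may report such content as plain text, an application that needs to tell CSV apart can add its own content check. The sketch below uses the standard library's csv.Sniffer as a fallback. It is a heuristic, not what libmagic does internally, and the helper name looks_like_csv and the comma-only delimiter default are assumptions for illustration.

```python
import csv


def looks_like_csv(data: bytes, delimiters: str = ",") -> bool:
    """Heuristically decide whether `data` is delimiter-separated text."""
    try:
        sample = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not decodable as text at all
    try:
        # sniff() raises csv.Error when it cannot find a consistent delimiter.
        csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_csv(b",a,b\n0,1,2\n1,2,3\n2,3,4\n"))
    print(looks_like_csv(b"hello world\nthis is prose\n"))
```

A check like this could run only when libmagic returns text/plain, so the result no longer depends on which libmagic version a given system ships.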

harrystaley avatar Apr 08 '22 03:04 harrystaley

Hi

I think I have a similar issue. I am creating a pandas DataFrame and calling to_csv(), but I get different results on different systems. I tried to create a minimal reproducible example below.

Ubuntu:
    Python 3.8.10 (default, Nov 22 2023, 10:22:35) 
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import magic
    >>> import pandas as pd
    >>> df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
    >>> df.to_csv()
    ',a,b\n0,1,2\n1,2,3\n2,3,4\n'
    >>> magic.detect_from_content(df.to_csv().encode('utf-8'))
    FileMagic(mime_type='application/csv', encoding='us-ascii', name='CSV text')

CentOS:
    Python 3.8.19 (default, May 27 2024, 05:59:07) 
    [GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import magic
    >>> import pandas as pd
    >>> df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
    >>> df.to_csv()
    ',a,b\n0,1,2\n1,2,3\n2,3,4\n'
    >>> magic.detect_from_content(df.to_csv().encode('utf-8'))
    FileMagic(mime_type='text/plain', encoding='us-ascii', name='ASCII text')

Could this be a locale issue? For Ubuntu I have Ubuntu 20.04 LTS; for CentOS I used the Docker image quay.io/pypa/manylinux2014_x86_64 with cp38-cp38. Both systems use python-magic 0.4.27.

indiVar0508 avatar Jul 11 '24 03:07 indiVar0508