More false negatives
Was experimenting a bit with puremagic. Unfortunately, the first two files I tried were not identified (although file did its job). GRIB support might just be missing, but HDF5 should be detected, shouldn't it?
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/test/test.mz5
python -m puremagic lib/galaxy/datatypes/test/test.mz5
'lib/galaxy/datatypes/test/test.mz5' : could not be Identified
file lib/galaxy/datatypes/test/test.mz5
lib/galaxy/datatypes/test/test.mz5: Hierarchical Data Format (version 5) data
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/test/test.grib
python -m puremagic lib/galaxy/datatypes/test/test.grib
'lib/galaxy/datatypes/test/test.grib' : could not be Identified
file lib/galaxy/datatypes/test/test.grib
Gridded binary (GRIB) version 1
Thanks for reporting, never heard of either of these types before!
For MZ5, I see the standard from NASA is returning a 404: https://earthdata.nasa.gov/esdis/eso/standards-and-references/hdf-eos5. There is also information on it here: https://docs.ogc.org/is/18-043r3/18-043r3.html, but no mention of magic numbers.
Opening the file itself, it starts with ‰HDF, so we can probably use that with low accuracy. Do you have any more examples of these file types I could look through?
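For illustration, here is a minimal sketch of what signature checks for these two files could look like. It is not puremagic's API; the function names are made up for this example. It assumes the published HDF5 signature (eight bytes, `\x89HDF\r\n\x1a\n`, at offset 0 or at 512, 1024, 2048, ...) and that a GRIB message starts at offset 0 with ASCII `GRIB` and carries its edition number in the eighth byte. The ‰ seen in an editor is most likely just byte 0x89 rendered as Windows-1252.

```python
# Hypothetical helpers, not part of puremagic.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def looks_like_hdf5(data: bytes) -> bool:
    """Return True if the HDF5 signature sits at offset 0, 512, 1024, 2048, ..."""
    offset = 0
    size = len(HDF5_SIGNATURE)
    while offset + size <= len(data):
        if data[offset:offset + size] == HDF5_SIGNATURE:
            return True
        # The signature may follow a user block, so try the next candidate offset.
        offset = 512 if offset == 0 else offset * 2
    return False


def grib_edition(data: bytes):
    """Return the GRIB edition number, or None if the data does not start with GRIB.

    Assumption: the message begins at offset 0 and the edition is the eighth byte.
    """
    if len(data) >= 8 and data[:4] == b"GRIB":
        return data[7]
    return None
```

Matching all eight HDF5 bytes rather than only the first four should give a more confident match than ‰HDF alone, although it still cannot tell an mz5 file apart from any other HDF5 container.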
Pulled down that repo and looked in the folder with the example files. Compared to file, there are 25 file types that puremagic has no match for, after removing the ones that file reports only as ASCII, data, or very short file.
.h5
.model
.biom2
.cool
.grib
.mcool
.vcf
.sam
.loom
.h5ad
.h5mlm
.nii2
.gpr
.npy
.rma6
.cel
.bcf_uncompressed
.mztab2
.parquet
.ptkscmp
.iqtree
.mz5
.fcs
.hdt
.gal
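The comparison behind this list can be sketched roughly as follows. This is a hypothetical reconstruction, not the script that was actually used: it assumes the `file` descriptions and the set of paths puremagic identified have already been collected, and all names are invented for the example.

```python
from pathlib import PurePath


def is_informative(description: str) -> bool:
    """Return False for `file` descriptions that carry no real type information.

    Assumption: such descriptions are exactly "data", or start with
    "ASCII text" or "very short file".
    """
    desc = description.strip()
    if desc == "data":
        return False
    return not desc.startswith(("ASCII text", "very short file"))


def missing_extensions(file_results: dict, puremagic_hits: set) -> list:
    """List extensions that `file` identified but puremagic did not.

    file_results maps a path to the description printed by `file`;
    puremagic_hits holds the paths puremagic managed to identify.
    """
    missing = set()
    for path, description in file_results.items():
        if path in puremagic_hits or not is_informative(description):
            continue
        missing.add(PurePath(path).suffix)
    return sorted(missing)
```

Fed with the two results from the original report, this would yield `.grib` and `.mz5`.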
I will start looking into each of those to see whether they have magic numbers that we can add to puremagic.
Thank you for raising this issue, and supplying the great source of example files!