fido
Format Identification for Digital Objects (FIDO) is a Python command-line tool for identifying the file formats of digital objects. It is designed for simple integration into automated workflows.
- added update signature parameter to control signature download version: - `-version` parameter that defaults to "latest", behaviour remains identical; - if `-version v104` is passed, then v104 signatures will...
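The defaulting behaviour described above can be sketched with `argparse`. This is only an illustration of a `-version` option that defaults to "latest": the function names `build_parser` and `describe_download` are hypothetical and are not taken from Fido's code, and the returned strings stand in for whatever download logic the real change performs.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical -version option (not Fido's real parser)."""
    parser = argparse.ArgumentParser(description="Sketch of a signature update command")
    # argparse accepts single-dash long options; the value is stored as args.version.
    parser.add_argument(
        "-version",
        default="latest",
        help='signature version to download, e.g. "v104" (default: "latest")',
    )
    return parser


def describe_download(argv):
    """Return a description of which signatures would be downloaded."""
    args = build_parser().parse_args(argv)
    if args.version == "latest":
        # No explicit version given: behaviour is unchanged.
        return "download latest signatures"
    return "download {} signatures".format(args.version)
```

Calling `describe_download([])` takes the default branch, while `describe_download(["-version", "v104"])` selects the pinned version.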
http://fido.openpreservation.org/ needs updating, and the 'last updated' note in the footer is October 201. Needs reference to OPF, Artefactual etc.
Bug spotted while writing docs, the following will hang when calling `identify_stream(...)`. ```python # -*- coding: utf-8 -*- """Reference for Fido-based format identification 1. Create a byte-stream with a known...
The 0.5.x implementation appears to be IO bound. Throughput would be increased by moving file-reads to a separate thread so that they will happen in parallel with pattern matching. One...
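A minimal sketch of the threading idea above, assuming a simple producer/consumer layout: a reader thread fills a bounded queue with file headers while the main thread runs pattern matching. The names `read_headers` and `identify_all`, the 4096-byte header size, and the dictionary of compiled patterns are all illustrative assumptions and do not reflect how Fido is actually structured.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the reader's output


def read_headers(paths, out_queue, size=4096):
    """Producer: read the first `size` bytes of each file onto the queue."""
    try:
        for path in paths:
            with open(path, "rb") as handle:
                out_queue.put((path, handle.read(size)))
    finally:
        # Always signal completion so the consumer cannot block forever.
        out_queue.put(_DONE)


def identify_all(paths, patterns):
    """Consumer: match each header against compiled byte patterns.

    `patterns` maps a format name to a compiled bytes regex.
    """
    buffers = queue.Queue(maxsize=8)  # bounded, so reads stay just ahead of matching
    reader = threading.Thread(target=read_headers, args=(paths, buffers), daemon=True)
    reader.start()
    results = {}
    while True:
        item = buffers.get()
        if item is _DONE:
            break
        path, header = item
        results[path] = [name for name, rx in patterns.items() if rx.match(header)]
    reader.join()
    return results
```

The bounded queue keeps memory use predictable. Whether this improves throughput depends on the workload actually being IO bound, as the issue suggests.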
Some ideas for improving how signature files are handled One idea is to use [Roy](https://github.com/richardlehane/siegfried/wiki/Building-a-signature-file-with-ROY) to create signature files. This would allow access to a wider range of formats since...
It would be really useful if Fido's identification could be invoked from other Python scripts. Even though this is possible, Fido's lack of an API makes this unnecessarily difficult. See...
## Dev Effort 5D ## Description - More unit tests - Testing and reporting for Travis (and possibly Jenkins) - Test corpus of well known formats - Automated test of...
## Dev Effort 1D ## Description Via @sromkey the MS-Office Open XML files in this Archivematica test data zip are being identified as `fido-fmt/{x}` in Fido: * [sample-files](https://github.com/artefactual/archivematica-sampledata/blob/0e092d588ea1043caedb16f9ba1e78d8f990e140/SampleTransfers/OfficeDocs/objects/MS-OfficeOpenXML-samples.zip ) ```bash...
These are Python repos with lots of file signatures that might not be covered by fido; it would be good to see more of them covered by fido - https://github.com/floyernick/fleep-py/blob/master/fleep/data.json (193 stars) -...
- added simple test, with no assert, for file identification; and - added similar for stream identification which demonstrates hang.