IFIscripts
creates PREMIS CSV implementation scripts
Not ready to merge, but I'm sending a pull request for visibility. I added a lot of information to the README and to the docstrings within the functions, so here's a copy-paste of the docstring output as generated by pydoc:
Help on module premisobjects:
NAME
premisobjects
FILE
ifigit/ifiscripts/premisobjects.py
DESCRIPTION
Creates a somewhat PREMIS-compliant CSV file describing objects in a package.
A separate script will need to be written in order to transform these
CSV files into XML.
As the flat CSV structure prevents maintaining some of the complex
relationships between units, some semantic units have been merged. For example,
relation_structural_includes is really a combination of the
relationshipType and relationshipSubType units, which have the values
Structural and Includes, respectively.
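The following minimal Python sketch shows one way the separate CSV-to-XML script might undo this merging, using only the standard library's xml.etree.ElementTree. It rests on several assumptions that do not come from the module itself: merged headings always follow the pattern relation_<type>_<subtype>, multiple related identifiers in one cell are separated by a pipe character, and those identifiers are UUIDs. The function name relationship_element and the separator are hypothetical, and the PREMIS namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def relationship_element(heading, cell, identifier_type="UUID", separator="|"):
    """Expand one merged relation_<type>_<subtype> CSV cell into a PREMIS-style
    <relationship> element (hypothetical helper, namespace omitted)."""
    # "relation_structural_includes" -> "structural", "includes".
    # maxsplit=2 keeps multi-word subtypes such as "is_included_in" together.
    _, rel_type, rel_subtype = heading.split("_", 2)
    relationship = ET.Element("relationship")
    ET.SubElement(relationship, "relationshipType").text = rel_type.capitalize()
    ET.SubElement(relationship, "relationshipSubType").text = (
        rel_subtype.replace("_", " ").capitalize()
    )
    # One relatedObjectIdentifier per identifier in the cell; empty entries are skipped.
    for value in filter(None, cell.split(separator)):
        related = ET.SubElement(relationship, "relatedObjectIdentifier")
        ET.SubElement(related, "relatedObjectIdentifierType").text = identifier_type
        ET.SubElement(related, "relatedObjectIdentifierValue").text = value
    return relationship


if __name__ == "__main__":
    element = relationship_element("relation_structural_includes", "uuid-a|uuid-b")
    print(ET.tostring(element, encoding="unicode"))
```

Splitting on at most two underscores is what lets a subtype containing underscores survive the round trip. A relationship type that itself contained an underscore would need a different heading convention.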
todo:
Document identifier assignment for files and IE. Probably in events sheet?
Allow for derivation to be entered
Link with events sheet
Link mediainfo xml in /metadata to the objectCharacteristicsExtension field.
Assumptions for now: the representation UUID already exists as part of the
SIP/AIP folder structure. Find a way to supply this, probably via argparse.
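Here is a minimal sketch of how the representation UUID could be supplied via argparse, as the note above suggests. The flag name --representation-uuid and the helper uuid_string are hypothetical. The idea is that the option is optional, so the script could fall back to reading the UUID from the folder name when it is omitted. Validation uses the standard library's uuid.UUID, and argparse turns the resulting ValueError into a normal usage error.

```python
import argparse
import uuid


def uuid_string(value):
    """argparse type: accept any UUID spelling and return the canonical lowercase form."""
    # uuid.UUID raises ValueError on bad input, which argparse reports as a usage error.
    return str(uuid.UUID(value))


def parse_args(args=None):
    """Parse command-line arguments (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Describe the objects in a package as a PREMIS-style CSV."
    )
    parser.add_argument("source", help="path to the SIP/AIP directory")
    parser.add_argument(
        "--representation-uuid",
        type=uuid_string,
        default=None,
        help="representation UUID; if omitted, fall back to the folder name",
    )
    return parser.parse_args(args)


if __name__ == "__main__":
    print(parse_args())
```

Passing an explicit list to parse_args makes the parser easy to exercise without a shell, while calling it with no arguments reads sys.argv as usual.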
FUNCTIONS
file_description(source, manifest, representation_uuid)
Generate PREMIS descriptions for items and write to CSV.
find_representation_uuid(source)
This extracts the representation UUID from a directory name.
This should be moved to ififuncs as it can be used by other scripts.
get_checksum(manifest, filename)
Extracts checksum from manifest, rather than generating a fresh one.
intellectual_entity_description()
Generate PREMIS descriptions for Intellectual Entities and write to CSV.
main()
Launches all the other functions when run from the command line.
make_skeleton_csv()
Generates a CSV with PREMIS-esque headings. Currently it's just called
'cle.csv' but it will probably be called:
UUID_premisobjects.csv
and sit in the metadata directory.
representation_description(representation_uuid, item_ids)
Generate PREMIS descriptions for a representation and write to CSV.
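The two lookup helpers described above, find_representation_uuid and get_checksum, might look roughly like the following sketch. It is not the code in this pull request and makes several assumptions. The UUID is taken to appear in the last component of the source path. The manifest is assumed to be in md5sum text-mode format ("<checksum>  <path>", two spaces) with forward-slash paths. The split into a separate checksum_from_lines function is hypothetical and exists only so the parsing can be tested without a file on disk.

```python
import os
import re

# Matches a hyphenated UUID anywhere in a string, in either case.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def find_representation_uuid(source):
    """Return the first UUID in the last path component of source, or None."""
    # normpath drops a trailing separator so basename returns the folder name.
    dirname = os.path.basename(os.path.normpath(source))
    match = UUID_PATTERN.search(dirname)
    return match.group(0) if match else None


def checksum_from_lines(lines, filename):
    """Return the checksum whose path ends in filename, or None if absent."""
    for line in lines:
        # Assumed md5sum-style line: "<checksum>  <path>" (two spaces).
        checksum, _, path = line.rstrip("\r\n").partition("  ")
        if path and os.path.basename(path) == filename:
            return checksum
    return None


def get_checksum(manifest, filename):
    """Look up filename in a manifest file instead of hashing the file again."""
    with open(manifest, encoding="utf-8") as manifest_file:
        return checksum_from_lines(manifest_file, filename)
```

Matching on the basename returns the first hit, so a package holding two files with the same name in different subfolders would need the full relative path to be compared instead.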
Help on module premiscsv:
NAME
premiscsv
FILE
ifigit/ifiscripts/premiscsv.py
DESCRIPTION
Extracts preservation events from an IFI plain text log file and converts
them to a CSV using the PREMIS data dictionary.
FUNCTIONS
find_events(logfile)
A very hacky attempt to extract the relevant preservation events from our
log files.
main()
Launches all the other functions when run from the command line.
make_events_csv()
Generates a CSV with PREMIS-esque headings. Currently it's just called
'bla.csv' but it will probably be called:
UUID_premisevents.csv
and sit in the metadata directory.
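The find_events and make_events_csv steps could be approximated by the sketch below. Everything about the log here is an assumption, not a description of the real IFI log format. Each line is taken to read "<ISO 8601 timestamp without spaces> - <message>". The keyword-to-eventType mapping is purely illustrative, although its values are real entries in the Library of Congress PREMIS event type vocabulary. The function names find_events and write_events_csv and the choice of a fresh UUID per event are hypothetical.

```python
import csv
import io
import re
import uuid

# Hypothetical log layout: "<timestamp> - <message>" on each line.
LOG_LINE = re.compile(r"^(?P<timestamp>\S+) - (?P<message>.*)$")

# Hypothetical keyword -> PREMIS eventType mapping; the first matching keyword wins.
EVENT_KEYWORDS = {
    "checksum": "message digest calculation",
    "transcode": "migration",
    "copy": "replication",
}

EVENT_HEADINGS = [
    "eventIdentifierType",
    "eventIdentifierValue",
    "eventType",
    "eventDateTime",
    "eventDetail",
]


def find_events(lines):
    """Return one PREMIS-style event dict per log line that matches a keyword."""
    events = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        message = match.group("message")
        for keyword, event_type in EVENT_KEYWORDS.items():
            if keyword in message.lower():
                events.append({
                    "eventIdentifierType": "UUID",
                    "eventIdentifierValue": str(uuid.uuid4()),
                    "eventType": event_type,
                    "eventDateTime": match.group("timestamp"),
                    "eventDetail": message,
                })
                break
    return events


def write_events_csv(csv_file, events):
    """Write the heading row plus one row per event to an open text file."""
    writer = csv.DictWriter(csv_file, fieldnames=EVENT_HEADINGS)
    writer.writeheader()
    writer.writerows(events)


if __name__ == "__main__":
    log = ["2017-05-01T10:00:00 - Generating checksum manifest"]
    buffer = io.StringIO()
    write_events_csv(buffer, find_events(log))
    print(buffer.getvalue())
```

Writing to an open file object rather than a hard-coded name keeps the eventual UUID_premisevents.csv path decision in one place. A real file should be opened with newline="" as the csv module recommends.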