pygeometa
Batch processing of MCF files from a user-specified location
Feature request for the capability to launch pygeometa and process multiple MCF files located in a user-specified directory with a single command.
This 'batch mode' could be invoked directly from pygeometa with the --batch argument. If --batch is specified, pygeometa is run for every MCF file in the directory specified with --mcf=, and each output file is named after its input file but with an .xml extension and saved in the directory specified by --output=. Other pygeometa arguments such as --schema= are applied to all MCF files processed in the batch.
An initial version of this batch mode can simply loop over all MCF files and generate the corresponding XML files. The batch mode should skip the MCF files that are used as base_mcf, since they can't be processed on their own.
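The loop described above could look roughly like the sketch below. It is not pygeometa code: the render callable is a hypothetical stand-in for whatever pygeometa does to turn one MCF file into XML, and the way base files are detected (collecting every file name referenced through a base_mcf key in another MCF) is an assumption about how such files can be recognized.

```python
import re
from pathlib import Path
from typing import Callable, List, Set

# Assumption: an MCF points at its base file with a "base_mcf=..." (or "base_mcf: ...") line.
BASE_MCF_RE = re.compile(r"^\s*base_mcf\s*[=:]\s*(.+?)\s*$", re.MULTILINE)


def find_base_mcf_names(mcf_dir: Path) -> Set[str]:
    """Collect the file names referenced through base_mcf by any MCF in mcf_dir."""
    names = set()
    for mcf in mcf_dir.glob("*.mcf"):
        for match in BASE_MCF_RE.finditer(mcf.read_text(encoding="utf-8")):
            value = match.group(1).strip("'\"")
            names.add(Path(value).name)
    return names


def run_batch(mcf_dir: Path, output_dir: Path,
              render: Callable[[Path], str]) -> List[Path]:
    """Render every non-base MCF in mcf_dir to <name>.xml in output_dir.

    render is a hypothetical callable (MCF path -> XML string).
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    base_names = find_base_mcf_names(mcf_dir)
    written = []
    for mcf in sorted(mcf_dir.glob("*.mcf")):
        if mcf.name in base_names:
            continue  # base files can't be processed on their own
        target = output_dir / (mcf.stem + ".xml")
        target.write_text(render(mcf), encoding="utf-8")
        written.append(target)
    return written
```

Passing the renderer in as a parameter keeps the loop independent of the schema and other options, which would simply be bound once and reused for every file.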
Future versions could:
- Recursively search the --mcf= folder to process .mcf files located in subfolders as well, and reproduce the same directory structure under --output=
- Provide a log with warnings, errors and outputs. The log location and name could be specified by the --log= argument
- Look in the output directory for a corresponding XML file and skip processing if the input MCF file has not changed since the last time the batch mode was run. The time of the last batch run could be read from the log file found at the --log= location
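The recursive search and the skip-if-unchanged idea from the list above could be sketched as follows. Note the substitution: instead of reading the last run time from a log file, this sketch compares file modification times of each MCF and its XML output. The function name plan_batch is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def plan_batch(mcf_root: Path, output_root: Path) -> List[Tuple[Path, Path]]:
    """Return (mcf, xml) pairs that need (re)generation.

    The output path mirrors the MCF's location relative to mcf_root.
    An MCF is skipped when its XML exists and is at least as recent.
    """
    jobs = []
    for mcf in sorted(mcf_root.rglob("*.mcf")):
        relative = mcf.relative_to(mcf_root)
        target = (output_root / relative).with_suffix(".xml")
        if target.exists() and target.stat().st_mtime >= mcf.stat().st_mtime:
            continue  # output is up to date
        jobs.append((mcf, target))
    return jobs
```

Before writing each target, the caller would need to create its parent folder, for example with target.parent.mkdir(parents=True, exist_ok=True).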
Thoughts / comments?
Regarding the possibility of skipping the generation of the XML when the MCF has not changed since the last time the batch mode was run: instead of looking for dates in the log (no logs exist at the moment), we could consider looking at the DateTime value within dateStamp, if present in the XML.
<gmd:dateStamp>
<gco:DateTime>2016-12-22T16:34:15Z</gco:DateTime>
</gmd:dateStamp>
Does this make sense? While there are multiple DateTime values in the output XML, there's only a single one within dateStamp. Not certain if this logic applies to other schemas though.
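As a rough illustration of that check, the sketch below reads the DateTime under dateStamp with the standard library and compares it to the MCF's modification time. It assumes ISO 19139 output using the usual gmd and gco namespace URIs and a UTC timestamp in the format shown above; records that use gco:Date, or other schemas, would need different handling. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

# Assumption: ISO 19139 namespaces, as used by the gmd:/gco: prefixes above.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def read_date_stamp(xml_text: str) -> Optional[datetime]:
    """Return the dateStamp DateTime as an aware UTC datetime, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(".//gmd:dateStamp/gco:DateTime", NS)
    if node is None or not node.text:
        return None
    stamp = datetime.strptime(node.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
    return stamp.replace(tzinfo=timezone.utc)


def needs_regeneration(mcf_mtime: float, xml_text: str) -> bool:
    """True when the MCF was modified after the XML's dateStamp.

    mcf_mtime is a POSIX timestamp, e.g. Path(...).stat().st_mtime.
    A missing dateStamp is treated as "regenerate".
    """
    stamp = read_date_stamp(xml_text)
    if stamp is None:
        return True
    return datetime.fromtimestamp(mcf_mtime, tz=timezone.utc) > stamp
```

One caveat with this approach: the dateStamp may come from the MCF content rather than from the moment the XML was generated, in which case it would not reflect when the batch last ran.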
I wrote a first version of the script with a --batch option. With this option, a folder must be provided for both --mcf and --output. I will update the documentation and let you know once this is done.
First version of the script: https://github.com/RousseauLambertLP/pygeometa/blob/issue-63/pygeometa/core.py
I just added one use case in the README file.
https://github.com/RousseauLambertLP/pygeometa/blob/issue-63/README.md