metafacture-core
Core package of the Metafacture tool suite for metadata processing.
At the moment, only MF workflows embedded in Java can be set to debug mode. It would be great to enable debug mode for the CLI version too. Thinking...
`handle-generic-xml` or `decode-xml` removes line breaks and possibly other characters (?). Before: https://metafacture.org/playground/?flux=%22https%3A//repository.tugraz.at/oai2d%22%0A%7C+open-oaipmh%28metadataPrefix%3D%22lom%22%29%0A%7C+as-lines%0A%7C+print%0A%3B%0A After: https://metafacture.org/playground/?flux=%22https%3A//repository.tugraz.at/oai2d%22%0A%7C+open-oaipmh%28metadataPrefix%3D%22lom%22%29%0A%7C+decode-xml%0A%7C+handle-generic-xml%28emitNamespace%3D%22true%22%2Cattributemarker%3D%22@%22%2Cvaluetagname%3D%22%C2%A7%22%29%0A%7C+encode-xml%28attributemarker%3D%22@%22%2Cvaluetag%3D%22%C2%A7%22%29%0A%7C+print%0A%3B%0A This should be documented.
At the moment, `handle-picaxml` can only handle [ppxml](http://format.gbv.de/pica/ppxml): https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-picaxml%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%0A%3Crecord+xmlns%3D%22http%3A//www.oclcpica.org/xmlns/ppxml-1.0%22%3E%0A++%3Cglobal+opacflag%3D%22%22+status%3D%22%22%3E%0A++++%3Ctag+id%3D%22003@%22+occ%3D%22%22%3E%0A++++++%3Csubf+id%3D%220%22%3E12345X%3C/subf%3E%0A++++%3C/tag%3E%0A++++%3Ctag+id%3D%22021A%22+occ%3D%22%22%3E%0A++++++%3Csubf+id%3D%22a%22%3EEin+Buch%3C/subf%3E%0A++++++%3Csubf+id%3D%22h%22%3Ezum+Lesen%3C/subf%3E%0A++++%3C/tag%3E%0A++++%3Ctag+id%3D%22045B%22+occ%3D%222%22%3E%0A++++++%3Csubf+id%3D%22a%22%3ESpo+1025%3C/subf%3E%0A++++++%3Csubf+id%3D%22a%22%3EBID+200%3C/subf%3E%0A++++%3C/tag%3E%0A++%3C/global%3E%0A%3C/record%3E It cannot handle [pica/xml](http://format.gbv.de/pica/xml): https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-picaxml%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%0A++%3Crecord+xmlns%3D%22info%3Asrw/schema/5/picaXML-v1.0%22+xmlns%3Axsi%3D%22http%3A//www.w3.org/2001/XMLSchema-instance%22+xsi%3AschemaLocation%3D%22info%3Asrw/schema/5/picaXML-v1.0+http%3A//www.oclcpica.org/xml/picaplus.xsd%22%3E%0A++++++%3Cdatafield+tag%3D%22001@%22%3E%0A++++++%3Csubfield+code%3D%220%22%3E0917%3A14-03-05%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22001B%22%3E%0A++++++%3Csubfield+code%3D%220%22%3E0917%3A23-03-05%3C/subfield%3E%0A++++++%3Csubfield+code%3D%22t%22%3E16%3A15%3A13.000%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22001D%22%3E%0A++++++%3Csubfield+code%3D%220%22%3E0917%3A23-03-05%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22001X%22%3E%0A++++++%3Csubfield+code%3D%220%22%3E0%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22002@%22%3E%0A++++++%3Csubfield+code%3D%220%22%3EAau%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22003@%22%3E%0A++++++%3Csubfield+code%3D%220%22%3E481592954%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22004A%22%3E%0A++++++%3Csubfield+code%3D%220%22%3E3774250936%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22011@%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3E2004%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22021A%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3EDer+Hamster%3C/subfield%3E%0A++++++%3Csubfield+code%3D%22d%22%3Eartgerecht+halten%2C+gesund+ern%C3%A4hren%2C+richtig+verstehen%3C/subfield%3E%0A++++++%3Csubfield+code%3D%22h%22%3EPeter+Hollmann%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22028A%22%3E%0A++++++%3Csubfield+code%3D%22d%22%3EPeter%3C/subfield%3E%0A++++++%3Csubfield+code%3D%22a%22%3EHollmann%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22032@%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3E5.+Aufl%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22033A%22%3E%0A++++++%3Csubfield+code%3D%22p%22%3EM%C3%BCnchen%3C/subfield%3E%0A++++++%3Csubfield+code%3D%22n%22%3EGr%C3%A4fe+und+Unzer%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22034D%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3E127+S%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22034M%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3Ezahlr.+Ill%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22036E%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3EMein+Heimtier%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22044K%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3ERatgeber%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22044L%22%3E%0A++++++%3Csubfield+code%3D%22S%22%3E+%3C/subfield%3E%0A++++++%3Csubfield+code%3D%22a%22%3ERatgeber%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22044L%22+occurrence%3D%2201%22%3E%0A++++++%3Csubfield+code%3D%22S%22%3E+%3C/subfield%3E%0A++++++%3Csubfield+code%3D%22a%22%3EHamsterhaltung%3C/subfield%3E%0A++++%3C/datafield%3E%0A++++%3Cdatafield+tag%3D%22045B%22%3E%0A++++++%3Csubfield+code%3D%22a%22%3EXbp+3%3C/subfield%3E%0A++++%3C/datafield%3E%0A%3C/record%3E
https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-lines%0A%7Cdecode-json%0A%7Cencode-xml%28rootTag%3D%22collection%22%29%0A%7Cprint%0A%3B&data=%7B%22name%22%3A+%22Open+Educational+Resources+-+\u000Beine+kritische+Einf%C3%BChrung%22%7D When special Unicode characters (in this example the vertical tab, `\u000B`) are part of the transformed metadata, `encode-xml` and `encode-marcXml` may produce invalid XML. There should be a way to create valid XML...
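To illustrate what "valid XML" would require here, the following is a minimal Java sketch that drops every code point outside the `Char` production of XML 1.0 before a value is serialized. It is not Metafacture code: the class `XmlCharFilter` and its methods are hypothetical names chosen for this example, and dropping the characters is only one possible policy (replacing them with a space or U+FFFD would be another).

```java
public final class XmlCharFilter {

    private XmlCharFilter() {
        // utility class, not meant to be instantiated
    }

    // True if the code point matches the Char production of XML 1.0:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isValidXml10Char(final int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the input without code points that XML 1.0 forbids.
    // Unpaired surrogates are removed as well, since they fall in D800-DFFF.
    public static String stripInvalidXmlChars(final String input) {
        final StringBuilder result = new StringBuilder(input.length());
        input.codePoints()
                .filter(XmlCharFilter::isValidXml10Char)
                .forEach(result::appendCodePoint);
        return result.toString();
    }

    public static void main(final String[] args) {
        // Same value as in the playground example, including the vertical tab.
        final String name = "Open Educational Resources - \u000Beine kritische Einf\u00FChrung";
        System.out.println(stripInvalidXmlChars(name));
    }
}
```

Filtering by code point rather than by `char` keeps supplementary characters intact. Whether such a filter should be opt-in on the encoders or always applied is a design question this sketch leaves open.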
Resolves #495
By configuring the CSVReader with an RFC-compliant parser, the escaping is fixed.
- update opencsv dependency to version 5.9
- add test

Kudos https://stackoverflow.com/questions/6008395/opencsv-in-java-ignores-backslash-in-a-field-value. Resolves #496.
Came up in the MF-Workshop at TH Köln 2024-10-08, inspired by C. Marutschke's idea to let an LLM train using `flux` and `fix`: Provide an "opener" that scans an input...
@dr0i could you help me with this so I could move on?