schema_salad
Spaces etc. in enums not escaped in RDF
As pointed out in common-workflow-language/cwltool#444, using enum symbols with spaces causes problems when generating RDF, as the spaces are not escaped.
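To make the RDF problem concrete, here is a minimal Python sketch contrasting plain concatenation of an enum symbol onto its type IRI with percent-encoding the symbol through urllib.parse.quote. The helper names and the base path are hypothetical and do not reflect schema_salad's actual code. RFC 3986 does not allow a literal space anywhere in a URI, so only the second form is a valid IRI.

```python
from urllib.parse import quote


def naive_symbol_iri(base: str, enum_name: str, symbol: str) -> str:
    """Hypothetical: append the symbol verbatim, with no escaping."""
    return f"{base}#{enum_name}/{symbol}"


def escaped_symbol_iri(base: str, enum_name: str, symbol: str) -> str:
    """Hypothetical: percent-encode the symbol (safe='' also escapes '/')."""
    return f"{base}#{enum_name}/{quote(symbol, safe='')}"


base = "file:///tmp/symbols-with-spaces.yml"  # hypothetical schema location
print(naive_symbol_iri(base, "table_type", "Pathway table"))
# file:///tmp/symbols-with-spaces.yml#table_type/Pathway table
print(escaped_symbol_iri(base, "table_type", "Pathway table"))
# file:///tmp/symbols-with-spaces.yml#table_type/Pathway%20table
```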
There are also problems generating the JSON-LD context if the string after the space is repeated, as it then tries to insert, for instance, table multiple times into @context:
Exception: Predicate collision on table, 'file:///c:/users/stain/src/schema_salad/schema_salad/tests/../tests/test_schema/symbols-with-spaces.yml#table_type/OTU table' != 'file:///c:/users/stain/src/schema_salad/schema_salad/tests/../tests/test_schema/symbols-with-spaces.yml#table_type/Pathway table'
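The collision can be reproduced with a small, hypothetical reconstruction. It assumes the context term is the trailing run of name-like characters of the IRI, roughly the "string after the space" described above; this is a simplified stand-in, not schema_salad's real context-generation code. Because a space is not a name character, both symbols reduce to the term table.

```python
import re

# Simplified, hypothetical stand-in for QName splitting: the local name is the
# longest trailing run of characters from a reduced NCName-like set.
_LOCAL_NAME = re.compile(r"[A-Za-z0-9_.\-]*$")


def local_name(iri: str) -> str:
    """Return the trailing name-like part of an IRI (may be empty)."""
    return _LOCAL_NAME.search(iri).group(0)


def add_to_context(context: dict, iri: str) -> None:
    """Register iri under its local name, refusing to overwrite a different IRI."""
    term = local_name(iri)
    if term in context and context[term] != iri:
        raise Exception(f"Predicate collision on {term}, '{context[term]}' != '{iri}'")
    context[term] = iri


base = "file:///tmp/symbols-with-spaces.yml#table_type/"  # hypothetical path
context = {}
add_to_context(context, base + "OTU table")
try:
    add_to_context(context, base + "Pathway table")
except Exception as exc:
    print(exc)  # Predicate collision on table, '...OTU table' != '...Pathway table'
```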
It is currently not defined anywhere which characters are allowed in valid Schema Salad identifiers, meaning that anything is currently allowed, including newlines, double quotes (") and anything else that can be expressed with YAML escapes.
A real fix would be to restrict the characters allowed in all shorthand identifiers, e.g. by requiring them to be valid IRI fragment identifiers.
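As a sketch of what such a restriction could look like, the check below validates an identifier against the RFC 3986 fragment grammar. This is an assumption for illustration: the function name is hypothetical, and it is stricter than an IRI fragment (RFC 3987 ifragment also permits non-ASCII characters), so a real rule might want to allow those too.

```python
import re

# RFC 3986 fragment grammar (ASCII only):
#   fragment    = *( pchar / "/" / "?" )
#   pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
#   pct-encoded = "%" HEXDIG HEXDIG
_FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_valid_fragment(identifier: str) -> bool:
    """Hypothetical check: True if identifier is a valid RFC 3986 fragment."""
    return _FRAGMENT.fullmatch(identifier) is not None


for ident in ["table_type/OTU_table", "Pathway%20table", "OTU table", 'say "hi"', "line\nbreak"]:
    print(repr(ident), is_valid_fragment(ident))
```

Under this rule "Pathway table" would be rejected as an identifier, while its percent-encoded form would be accepted.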
Some workflows have already started relying on this, e.g. https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/blob/e14f5cd5e240ecb2563d138e9aa112abcdae3295/tools/biom-convert-table.yaml, where constants like "Pathway table" (with a space) are passed in as arguments.
A quick fix, as in commit 8514cb72996175aac8f749d57655dde835f0d195, may seem sufficient, but it does not add the required unquoting. To be able to unquote, we need both a predictable quoting pattern and knowledge of all the places that need the "pure" values. For instance, that commit breaks the workflow above, as it changes the constant to Pathway_table, which the command-line tool would not understand.
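The difference between a lossy replacement and a predictable quoting pattern can be shown in a few lines of Python. This sketch only uses the standard library's urllib.parse.quote and unquote; it illustrates the principle and is not the code from either commit.

```python
from urllib.parse import quote, unquote

symbols = ["Pathway table", "Pathway_table"]

# Quick-fix style: replace spaces with underscores. Two distinct symbols
# collapse to the same string, so the original value cannot be recovered.
underscored = {s.replace(" ", "_") for s in symbols}
print(underscored)  # {'Pathway_table'}

# Percent-encoding is a predictable quoting pattern with an exact inverse,
# so the "pure" value can be restored wherever it is needed.
for s in symbols:
    encoded = quote(s, safe="")
    assert unquote(encoded) == s
    print(s, "->", encoded)
```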
See the https://github.com/common-workflow-language/schema_salad/tree/escape-enums branch for my naive approach (do not merge!).
@stain re: https://github.com/common-workflow-language/schema_salad/compare/escape-enums#diff-ceb29e0b30c435fbee3c927af99c9f35R105 Why not escape spaces as %20 → _20?
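One possible reading of this suggestion is to percent-encode and then use "_" instead of "%" as the escape character. The sketch below assumes that reading; the function names are hypothetical. For the mapping to stay reversible, literal underscores must be escaped as well (here as _5F), otherwise "Pathway_20table" could come from two different inputs.

```python
from urllib.parse import quote, unquote


def escape_symbol(symbol: str) -> str:
    """Hypothetical: percent-encode, then use '_' instead of '%' as escape char.

    Literal underscores are encoded first (as %5F, later _5F) so that every '_'
    in the output marks an escape sequence, which keeps the mapping reversible.
    """
    encoded = quote(symbol, safe="")       # "Pathway table" -> "Pathway%20table"
    encoded = encoded.replace("_", "%5F")  # treat literal "_" as an escape too
    return encoded.replace("%", "_")       # "%20" -> "_20"


def unescape_symbol(escaped: str) -> str:
    """Inverse of escape_symbol: turn '_' back into '%' and percent-decode."""
    return unquote(escaped.replace("_", "%"))


for s in ["Pathway table", "Pathway_table", "100% done"]:
    e = escape_symbol(s)
    assert unescape_symbol(e) == s
    print(f"{s!r} -> {e!r}")
```

The escaped forms contain only letters, digits and characters from -._~, so they would also pass an RFC 3986 fragment check.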