cwl-utils icon indicating copy to clipboard operation
cwl-utils copied to clipboard

schema_salad.exceptions.ValidationExceptions parsing certain data structures

Open leipzig opened this issue 5 years ago • 12 comments

import cwl_utils.parser_v1_2 as parser
parser.load_document("https://raw.githubusercontent.com/common-workflow-library/bio-cwl-tools/release/picard/picard_CreateSequenceDictionary.cwl")
  InitialWorkDirRequirement:
    listing:
      - $(inputs.REFERENCE)
outputs:
  ...
  sequences_with_dictionary:
    type: File
    format: edam:format_2573  # SAM
    secondaryFiles:
      - .dict
      - .fai
parser.load_document("https://raw.githubusercontent.com/common-workflow-library/bio-cwl-tools/release/bamtools/bamtools_stats.cwl")
bamtools_stats.cwl:110:5:  the `outputBinding` field is not valid...
the `outputEval` field is not valid...

leipzig avatar Sep 04 '20 17:09 leipzig

secondaryFiles within outputs is not terribly common but listing should accept that expression https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/tests/test_schema/CommandLineTool.yml#L776-L781

leipzig avatar Sep 06 '20 14:09 leipzig

Probably not related, but you are testing with a v1.2 parser even though document declares cwlVersion: 1.2

stain avatar Sep 08 '20 14:09 stain

@leipzig so what's actually happening here is that codegen was never updated for the schema salad v1.1, which added a few features that are used for CWL 1.1 and 1.2. Specifically:

  1. secondaryFilesDSL
  2. Special treatment of the Expression type
  3. subscopes (used for assigning identifiers)
  4. default values

The first two are causing the problems you're reporting. Some suggestions:

  1. You need to add a case for secondaryFilesDSL, see https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/codegen.py#L113 and https://www.commonwl.org/v1.1/SchemaSalad.html#Domain_Specific_Language_for_secondary_files
  2. You need to add a special case for the Expression type https://github.com/common-workflow-language/schema_salad/blob/af9842edc3e55723bfd0c50562ab44c7daa35226/schema_salad/python_codegen.py#L285 and add an _ExpressionLoader which should pretty much just check if the value is a string
  3. declare_field https://github.com/common-workflow-language/schema_salad/blob/af9842edc3e55723bfd0c50562ab44c7daa35226/schema_salad/python_codegen.py#L368 needs an extra subscope parameter, if the subscope is non-empty, then it gets appended to baseuri when loading the field
  4. begin_class should be extended to so that fields with default or optional are sorted to the end of the __init__ parameters and assigned None, and then in the body of __init__ any fields with default that are None should be assigned the default value.

tetron avatar Sep 08 '20 17:09 tetron

What you should do is make a PR against schema-salad that fixes these issues, and a PR against cwl-utils with updated v1.1 and v1.2 parsers and test cases of previously-failing v1.1 and v1.2 files.

tetron avatar Sep 08 '20 17:09 tetron

@tetron I am a bit confused about number 3. Does it apply only to declare_id_field or declare_field as well. It seems the subscope argument is something that appears as when jsonldPredicate is a list, so it would be more relevant to declare_field

      jsonldPredicate:
        _id: "cwl:run"
        _type: "@id"
        subscope: run

but that when it's jsonldPredicate: "@id" we are talking about workflow id

leipzig avatar Sep 12 '20 21:09 leipzig

@leipzig When the predicate is @id that means that field is the identifier for the object it appears in.

A field with _type: "@id" means it is a reference to an identifier.

A subscope means that jsonldPredicate: "@id" fields get an extra path element to avoid identifier name conflicts with the parent. This is done by adding the subscope to the baseuri that is used to construct identifiers in the nested objects.

Does that help?

tetron avatar Sep 14 '20 15:09 tetron

Should I just concatenate baseuri+subscope?

leipzig avatar Sep 22 '20 19:09 leipzig

Pretty much. Here's what ref_resolver.py does

subscope = ""  # type: str
if key in loader.subscopes:
    subscope = "/" + loader.subscopes[key]
document[key], _ = loader.resolve_all(
    val, base_url + subscope, file_base=file_base, checklinks=False
)

tetron avatar Sep 22 '20 19:09 tetron

(the important part is that it also adds a slash)

tetron avatar Sep 22 '20 19:09 tetron

@leipzig Are you still working on this? Is there a branch of what you've tried already?

mr-c avatar Oct 23 '20 10:10 mr-c

@mr-c i have some progress on this branch https://github.com/leipzig/schema_salad/tree/main can you point me to a 1.2 example on hand with all of these features?

leipzig avatar Oct 23 '20 16:10 leipzig

@leipzig Huzzah!

Here are some CWL v1.2 examples. courtesy @jdidion

https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-sd-secondaryFiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/anon_enum_inside_array_inside_schemadef.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array_basename_and_entryname.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/docker-array-secondaryfiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/optional-output.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-in-secondaryFiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/initialworkdirrequirement-docker-out.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/secondaryfiles/rename-outputs.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array_basename.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_array.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/initialworkdir-glob-fullpath.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-out-secondaryFiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/secondaryfiles/rename-inputs.yml https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/params.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/params2.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/schemadef-tool.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/imported-hint.cwl

mr-c avatar Oct 23 '20 17:10 mr-c