wdl icon indicating copy to clipboard operation
wdl copied to clipboard

Multiple identical Struct declarations

Open heathsc opened this issue 3 years ago • 11 comments

What is the correct behaviour for the parser if multiple identical Struct definitions with the same name are declared either from the same document or different documents? We can see this, for example, with several of the BioWDL workflows where some basic Structs are declared multiple times with identical declarations without using aliasing to disambiguate them. In these cases should the parser (a) raise an error, (b) raise an error only if the definitions are not identical (same member names & types in the same order) and issue a warning otherwise or (c) raise an error if the definitions are not identical and ignore the duplicate definition otherwise. I would favour (b) unless it is clear that this is allowed in which case (c).

heathsc avatar Mar 30 '21 07:03 heathsc

First pasting for reference the spec language:

It is valid to import the same Struct into the global namespace multiple times via different paths; however, if two Structs with the same name but different members are imported into the global namespace there is a name collision resulting in an error. This means that care must be taken not to give identical names to two different Structs that might be imported into the same WDL document tree. Alternatively, Structs can be aliased at import time.

There are two slightly different scenarios to think about. First is the "diamond import" pattern (which I think we do see in BioWDL among others), where W.wdl imports the same struct from two different task libraries T1.wdl and T2.wdl, but both T1.wdl and T2.wdl originally imported it from a common struct library S.wdl. I think this pattern is common (used in BioWDL I believe) and doesn't need a warning let alone an error. Longer paths are clearly possible too, but the point is there's only one "original" struct typedef in this case.

The second scenario is where the source code of the struct typedef really appears in multiple source documents (with identical members in all respects). I do think the spec is ambiguous as to whether such struct types are conceptually identical, compatible, or incompatible. Miniwdl does consider them identical, i.e. the struct type is defined as the members, which may have one or more aliases (one being the name in the original declaration), and does not care which file they were originally declared in. I can see a warning possibly being justified in this case, but miniwdl does not generate one currently.

Also, miniwdl does not care about the order of the members.

One more subtlety to track (which I think argues for miniwdl's current approach) is that structs can contain other structs as members, possibly with aliasing, so you have to take care that struct Car { Person owner } in different files may or may not be compatible, depending on what is Person in each file.

mlin avatar Mar 30 '21 22:03 mlin

Thank you for the reply. I was considering that the order of members could have an effect on struct IO, but it is true that this would not change the way the pipeline works so the order should be ignored.

Your last point raises a issue that I hadn't considered and I will have to consider the best way to deal with this. Does the struct aliasing in an import statement affect only Struct declarations in the imported file or also Struct usages within the file? It is not clear (to me!) in the spec, but I think that both delarations and usages should be aliased.

Last (for now) question on Struct aliasing: I was assuming that the aliasing only applies to the document being imported and not to daughter documents. i.e., if A.wdl imports B.wdl that imports C.wdl, any struct aliasing in the import statement in A.wdl would only apply to B.wdl and not to C.wdl. Is that correct?

On another issue, is there a set of reference documents that can be used to test the conformance of a parser?

Thanks a lot for your help.

heathsc avatar Mar 31 '21 04:03 heathsc

(To be clear, I agree the spec is ambiguous on some of these points -- I can describe what miniwdl does now, but I don't claim that's normative and I don't know how other WDL frontends handle these cases.)

Does the struct aliasing in an import statement affect only Struct declarations in the imported file or also Struct usages within the file? It is not clear (to me!) in the spec, but I think that both delarations and usages should be aliased.

If I import "mapper.wdl" alias Index as BamIndex, then I would not expect Index to be available for me to use in the document (at least not this one).

I was assuming that the aliasing only applies to the document being imported and not to daughter documents. i.e., if A.wdl imports B.wdl that imports C.wdl, any struct aliasing in the import statement in A.wdl would only apply to B.wdl and not to C.wdl. Is that correct?

I was expecting a slightly different question like "any struct aliasing in the import statement in B.wdl would only apply to B.wdl and not to A.wdl`. Was that what you meant? If so, miniwdl does import the aliases defined in B.wdl into A.wdl, but I don't have a strong conviction that it's right to do so.

mlin avatar Mar 31 '21 11:03 mlin

I don't think these cases should occur very often, but in the interests of reproducibility there should be a clear 'right answer', so it would be nice if future versions of the spec could clarify this.

For my questions I think some examples might make them clearer

Scenario 1: If I have the following declarations in a single document B.wdl that is imported with the statement import B.wdl alias Car as Car1

Struct Car { String Make } Struct CarRental { Car CarType }

I would then import Car as Car1 into the global namespace, and I should make sure that when we evaluate CarRental we will be able to find Car even though the name has changed. That is what I meant by 'both declarations and usages should be aliased'

Scenario 2: We have 3 documents, A.wdl, B.wdl and C.wdl.

A.wdl import B.wdl alias Car as Car1

B.wdl import C.wdl

C.wdl Struct Car { String Make }

In this case would the importing of Struct Car from C.wdl be affected by the alias clause in A.wdl so that Car gets imported into the global namespace as Car1? I imagine not, but I could imagine scenarios where this might be wanted.

Again, thank you for taking the time to answer my endless questions. I am working on implementing a frontend and I want to nail down as many of these ambiguous cases as possible.

heathsc avatar Mar 31 '21 11:03 heathsc

Maybe a bit of a side track this, but perhaps implementing an import X from Y type of syntax would aid in avoiding these types of scenarios.


Personally, I feel the way struct imports work is one of the most (if not the most) convoluted parts of WDL, due to them getting added to a "global namespace" instead of being namespaced like everything else. Another bit of relevant spec:

https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#global-scope

Global Scope

A WDL document is the top-level (or "outermost") scope. All elements defined within a document that are not nested inside other elements are in the global scope and accessible from anywhere in the document. The elements that may be in a global scope are:

  • Namespaces (via imports)
  • structs (including all structs defined in the document and in any imported documents)
  • tasks
  • A workflow

Regarding scenario 1, I don't think technically aliasing the usage should be necessary. The structs, when imported, get added to the document's global scope/namespace under the aliased name, but the imported document shouldn't have access to that namespace, I don't think. It has it's own namespace within in which the struct still has the original name. The second Car struct, therefore, also does not exist with the global namescape of B.wdl (since it also is not imported by B.wdl and can't be imported, since that would cause a cyclical dependency).

And for scenario 2, I never even thought about imported structs getting propagated when they are imported in an imported file... I would actually say that in scenario 2 A.wdl should be invalid, since there is no Car struct defined in B.wdl, even if it is part of it's global scope following the import. But I'm being quite pedantic here (see the exact wording in the quoted bit of spec above), I suppose.

DavyCats avatar Mar 31 '21 13:03 DavyCats

Thanks for your comments - I think I am overcomplicating the matter unnecessarily. I agree that scenario 1 should work without explicitly requiring aliasing the usage, and for scenario 2 I will only apply the aliasing to the file that is imported (so B.wdl in this case and not C.wdl). I also agree that in the case where the aliased struct is not present in the imported document (as in scenario 2) this should probably get an error or at least a warning as the user's intention of aliasing can not be carried out if the struct is not present.

heathsc avatar Mar 31 '21 13:03 heathsc

@mlin FWIW wdlTools has exactly the same behavior for structs as you describe in miniwdl.

According to the v1.1 spec: "When a Struct is imported with an alias, it is added to the global namespace only under that alias." So, when importing a document, all of its structs are imported, either under their original name or their alias.

jdidion avatar May 26 '21 21:05 jdidion

@mlin @heathsc do you think there is any clarification that is necessary in the spec? If so, let's discuss here and one of us can make a PR.

jdidion avatar May 26 '21 21:05 jdidion

I think it remains ambiguous whether two struct types with distinct declarations but identical members are compatible or not, ie

struct A {
  Int x
  String s
}

struct B {
  Int x
  String s
}

workflow w {
  input {
    A a
  }
  B b = a  # allowed???
}

This single-file case isn't very interesting, but consider if we have numerous struct types imported through a hierarchy of files, with different aliases in them, it can get very confusing to keep track of where the 'original' definitions were...giving a certain appeal to deciding just based on whether the struct members are the same.

mlin avatar May 26 '21 22:05 mlin

I honestly think that aliasing was one of my poorer decisions. If we could retcon the struct change I would probably allow proper namespacing in the identifiers. This would completely resolve any ambiguity that we are running into with this whole "global" namespace. Also, I totally forget why that was the case... I believe we were trying to avoid dot identifiers in types for some reason?

So this is probably not a helpful comment, but I would be happy to sponsor a change for 2.0 that brings namespacing to structs. IF we could do it in a graceful non breaking way that would be amazing too

patmagee avatar Jul 20 '21 13:07 patmagee

I propose to add an allowed Struct to Struct coercion, where any struct can be coerced to any other as long as the following are true:

  1. They have the same numbers of members
  2. Their members' names are identical
  3. The type of each member in the source struct is coercible to the type of the member with the same name in the target struct

Will open a PR with this change for review.

jdidion avatar Apr 04 '24 22:04 jdidion