TEP-001: Reference resolving support in the textX language
This is an initial proposal for extending the textX language to support declarative definition of reference resolving/scoping in the grammar.
textX supports automatic reference resolving by referencing a grammar rule inside []. Up until version ... the reference mechanism was very simplistic: the object was searched for in the global model scope by the matched name and the referenced type.
Starting from version 1.7.0, a new elaborate and flexible way to specify scoping rules was introduced thanks to the work done by @goto40. I think scoping is a very important part of a language specification, so I propose a change to the textX grammar language to enable specification of scoping/reference resolving rules in the grammar. The in-code mechanism should remain, to support more advanced specifications that can't be covered by the grammar-based approach. The idea is to make common things simple and specific/advanced things possible.
I propose to extend the [] reference style with an optional additional expression that represents the search path.
For example:
myref = [OtherRule|FQN|a.b.c]
Where a.b.c would be a path to search. In this case it would be treated as an absolute path starting from the model root. Relative paths could be supported by a dot prefix, .a.b.c, and the semantics would be to start from the object where the name has been matched and follow the path. A special keyword parent could be used to navigate to the parent object or (as already introduced in scoping) parent(TYPE) would search up the parent chain until a particular type is found.
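A minimal sketch of what parent(TYPE) could mean, assuming every contained object carries a parent attribute pointing at its container. The classes Model, Package and Entity and the helper parent_of_type are hypothetical names used only for illustration; they are not part of the proposal.

```python
class Model:
    parent = None  # the root has no container


class Package:
    def __init__(self, parent):
        self.parent = parent


class Entity:
    def __init__(self, parent):
        self.parent = parent


def parent_of_type(obj, type_name):
    """Walk up the parent chain until an object of the given type is found."""
    current = getattr(obj, "parent", None)
    while current is not None:
        if type(current).__name__ == type_name:
            return current
        current = getattr(current, "parent", None)
    return None  # reached the root without a match


model = Model()
outer = Package(model)
inner = Package(outer)
entity = Entity(inner)

nearest_package = parent_of_type(entity, "Package")  # the closest enclosing Package
root = parent_of_type(entity, "Model")               # skips both packages
```

The search stops at the first match, so nested containers of the same type resolve to the innermost one.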
Resolving semantics could go like this.
Take this example from #107:
Model:
kinds += GroupKind
values += [LiteralKind|FQN|.kinds.vars]
;
LiteralKind : name=ID;
GroupKind:
kindName=ID name=ID "{"
vars *= LiteralKind
"}"
;
FQN : ID'.'ID;
and the model:
Kind1 kind1 {
a b c
}
Kind2 kind2 {
a b c
}
kind1.a kind1.b kind2.a
To resolve kind1.a, which is matched by FQN, we take the resolving specification .kinds.vars and start searching relative to the Model instance (in this case the absolute search expression kinds.vars would be equivalent). We split the expression and evaluate kinds in the scope of the Model instance. We see that it is an attribute of type list, so we use the first part of the matched FQN name to search by name in the list. This yields the GroupKind object kind1. We then evaluate the rest of the search expression in the scope of kind1. That is vars, which is again a list, so we use the rest of the FQN to search in this list, yielding the object a of type LiteralKind found in the kind1 instance.
Notice that a part of the FQN name is consumed only if a list is found (we could extend that to dicts as well). If a regular object is found we just use it as the new scope for the rest of the expression without consuming part of the name.
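The walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is built by hand from SimpleNamespace objects rather than parsed by textX, the relative path starts at the Model instance, and resolve and kind are hypothetical helper names.

```python
from types import SimpleNamespace as Obj


def resolve(root, path, name):
    """Follow a dotted search path, consuming one name part per list."""
    parts = name.split(".")  # "kind1.a" -> ["kind1", "a"]
    scope = root
    for attr in path.strip(".").split("."):
        value = getattr(scope, attr)
        if isinstance(value, list):
            # A list consumes the next name part: pick the element by name.
            if not parts:
                return None
            wanted = parts.pop(0)
            scope = next((o for o in value if o.name == wanted), None)
            if scope is None:
                return None
        else:
            # A regular object becomes the new scope; no name part is consumed.
            scope = value
    # Every name part must have been used up for the match to be valid.
    return scope if not parts else None


def kind(kind_name, name):
    # Each group holds its own LiteralKind-like objects a, b and c.
    return Obj(kindName=kind_name, name=name,
               vars=[Obj(name=n) for n in "abc"])


model = Obj(kinds=[kind("Kind1", "kind1"), kind("Kind2", "kind2")])

found = resolve(model, ".kinds.vars", "kind1.a")
print(found is model.kinds[0].vars[0])  # True: the "a" inside kind1
```

Both groups contain an object named a, so the first name part is what disambiguates them.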
For backward compatibility, if a search expression is not defined in [] and no in-code defined scoping is configured, we would search by name globally.
These are just some initial thoughts, so I would like to hear what others think about this. Would you like to see something like this in textX? Do you have ideas for making the search expression language more expressive?
( I need some time to read )
We should think about renaming "RelativeName" and "ExtRelativeName" scope providers (as suggested by @igordejanovic in #109).
( I need some time to read )
Sure. This is important stuff, so we should take our time to think it through thoroughly before jumping into implementation.
We should maybe rework the examples in scoping.m for RelativeName (independent of the renaming activities).
If I understand your proposal correctly,
- you want to embed scoping into the grammar, and
- you would like to restrict the path to be searched for FQN scoping.
The current FQN implementation has no restrictions and uses nested named structures to search for objects (of a certain type). In the following I would like to outline the existing scoping implementations to have a better basis for discussion:
Global scoping
Both FQN and PlainName scoping implementation from textx.scoping use the natural structure of the model based on nested named elements to identify objects (one using nested names separated by dots, the other plain names).
--> I think this matches quite well what one expects from generic name lookup.
Custom scoping
Here, the reference name (typically an ID) is searched for by a special function (the local scope provider). This allows, e.g., looking up modelled methods of a modelled object of a certain modelled type. It also allows any other logic (like in https://github.com/igordejanovic/textX/blob/master/tests/functional/test_scoping/test_reference_to_nontextx_attribute.py).
The RelativeName and ExtRelativeName providers are shortcuts for common cases where you follow the internal model structure (not the model structure based on nested named elements, but the one based on model element compositions via grammar attribute names) to find a list of named candidates.
--> Any idea for a better name? Maybe FollowModelPath instead of RelativeName.
Current integration into the meta model
We have at the moment an integration based on Rule.attribute links. It could be nice to include this into the grammar (as your proposal suggests).
Therefore, we need to clarify:
- How to define the default scoping (plain/FQN)
- How to distinguish, e.g., "scoping to allow methods of an entity" from "scoping to allow methods of an entity plus inherited methods from super-entities" e.g. like in this grammar: https://github.com/igordejanovic/textX/blob/master/tests/functional/test_scoping/components_model1/Components.tx
- slots w/o inheritance ("extends" keyword not used): test_model_with_local_scope in https://github.com/igordejanovic/textX/blob/master/tests/functional/test_scoping/test_local_scope.py
- slots with inheritance: test_inheritance_processor in https://github.com/igordejanovic/textX/blob/master/tests/functional/test_scoping/test_inheritance.py
Since this scoping configuration is quite flexible (see, e.g., https://github.com/igordejanovic/textX/blob/master/tests/functional/test_scoping/test_reference_to_nontextx_attribute.py), it seems difficult to describe it without Python code. Some special cases could conceivably become part of the grammar description. Alternatively, we could allow specifying the Python class names of the scope providers in the grammar...
I am looking forward to more details to discuss!
(sorry, closed by accident)
Note: why not use Python code directly to navigate through the model (in the RelativeName provider)? --> because we need to handle the postponed logic.
@goto40 Thanks a lot for the detailed analysis. Sorry that I'm not so quick to respond these days; I've been quite busy. I've started to work out the details of the reference search/resolve DSL I proposed in this issue. Please give me a few days to sort things out and make some concrete examples.
Take your time.
I have put together an initial version of a TEP (textX enhancement proposal :) ) for this in the form of a wiki page, as I think the issue is more for discussion, but the final description should be a single consistent document.
Great! Let's refer to it as TEP-001 :)
Can I comment it directly in the Wiki. Or do I comment it here. Or do I modify the wiki?
I like the idea of the clear RREL and I think I do better understand now. The example is a good starting point. I have some questions and would like to add fictive model files...
Great! Let's refer to it as TEP-001 :)
Agreed :)
Can I comment it directly in the Wiki. Or do I comment it here. Or do I modify the wiki?
I don't see a way to comment on the wiki. I guess it is best to discuss here and update the wiki when we come to a conclusion/improvement.
I like the idea of the clear RREL and I think I do better understand now. The example is a good starting point. I have some questions and would like to add fictive model files...
Great! I think we should go through all unit tests, try to find interesting references, and specify RREL for them. That way we can see whether we cover all interesting cases with the language, and use this process to gradually refine the language syntax and semantics. Maybe we should add these examples of grammar rules with RREL expressions at the end of the wiki page.
Some thoughts about TEP-001: For me there are three "main aspects" in a text based DSL:
- The grammar (for textx: the grammar file/string),
- scoping (for textx before TEP-001: python code / meta model code),
- validation to yield domain specific errors (for textx: python code / meta model code)
TEP-001 proposes to move parts or all of the scoping aspects from python code to the grammar file/string.
I am thinking if the RREL could also be a separate source: e.g. a string instead of a scope provider,
my_meta_model.register_scope_providers({
    "*.*": scoping_providers.FQN(),
    "Connection.from_port": "from_inst.component.slots",  # RREL
    "Connection.to_port": "from_inst.component.slots",  # RREL
})
This would allow to mix RREL and custom/default scope providers seamlessly (there is one place to specify scoping). Alternative 1: if both places are allowed (grammar/code), we need to make sure there are no contradicting information sources. Alternative 2: if the grammar is the only place, where scoping is specified we have again a single place where scoping is specified (good), but we need to manage how custom classes can be introduced...
Note: FQN() differs from RREL (and RelativeName) in that way, that references are not meant to be followed normally: (they could be followed, if they are resolved so far). There is no postponed logic in the FQN. If we wish to merge both, we need to be able to specify "need postponed logic" or "no following of references" in RREL strings.
- I do not understand completely "~".
- How can I say "^{follow-named-object}*" --> this would be FQN (open point postponed logic)
- How can I say "..{follow-attribute-type-without-consuming-the-name}.{follow-named-object}" --> is this "~"?
I am thinking if the RREL could also be a separate source: e.g. a string instead of a scope provider,
Fully agree. We should allow user to override what is specified in the grammar either by RREL or by custom scoping provider. The logic could be to use the most specific match (e.g. direct attribute match over wildcards) and in the case of a conflict obey the provider specified in the source code (as we assume the user would like to override the grammar).
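The "most specific match, code overrides grammar" rule could look roughly like the sketch below. Everything here is an assumption made for illustration: pick_provider is a hypothetical helper, the ranking of "Rule.*" above "*.attr" is one possible choice, and the string "FQN" merely stands in for a real provider object.

```python
def pick_provider(rule, attr, grammar_rrel, code_providers):
    """Return the provider for rule.attr, or None if nothing matches.

    Both mappings use keys such as "Connection.from_port", "Connection.*",
    "*.from_port" or "*.*". The most specific key wins; at equal
    specificity the provider registered in code overrides the grammar.
    """
    # Candidate keys ordered from most to least specific (assumed ranking).
    keys = [f"{rule}.{attr}", f"{rule}.*", f"*.{attr}", "*.*"]
    for key in keys:
        # Code is consulted first so that it wins a conflict on the same key.
        for source in (code_providers, grammar_rrel):
            if key in source:
                return source[key]
    return None


def custom_provider(obj, attr, obj_ref):
    """Placeholder for a user-written scope provider."""
    return None


grammar_rrel = {"Connection.from_port": "from_inst.component.slots"}

# Only a wildcard default in code: the more specific grammar RREL is used.
from_grammar = pick_provider(
    "Connection", "from_port", grammar_rrel, {"*.*": "FQN"})

# Same key in both places: the provider from code overrides the grammar.
from_code = pick_provider(
    "Connection", "from_port", grammar_rrel,
    {"*.*": "FQN", "Connection.from_port": custom_provider})
```

With this ordering a wildcard registered in code never shadows a specific RREL in the grammar, while a direct attribute registration in code always does.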
but we need to manage how custom classes can be introduced...
I don't see that custom classes would be affected by this. Could you explain?
Note: FQN() differs from RREL (and RelativeName) in that way, that references are not meant to be followed normally: (they could be followed, if they are resolved so far). There is no postponed logic in the FQN. If we wish to merge both, we need to be able to specify "need postponed logic" or "no following of references" in RREL strings.
Isn't that implementation specific? I would like for RREL to be fully declarative and let the resolver (who interprets the RREL language) take care of the proper order of resolution and postponing of resolving where necessary. Don't see why we should put that in the language itself. Could we do multi-pass resolving postponing what can't be resolved in this pass and repeating until all references are resolved?
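A multi-pass resolver of that kind could be sketched as follows. This is a generic illustration, not the textX implementation: resolve_all and try_resolve are hypothetical names, and a reference counts as "not yet resolvable" when try_resolve returns None.

```python
def resolve_all(references, try_resolve):
    """Repeat passes until nothing is pending or a pass makes no progress."""
    pending = list(references)
    resolved = {}
    while pending:
        still_pending = []
        for ref in pending:
            target = try_resolve(ref, resolved)
            if target is None:
                still_pending.append(ref)  # postpone to the next pass
            else:
                resolved[ref] = target
        if len(still_pending) == len(pending):
            # A full pass resolved nothing, so more passes cannot help.
            raise LookupError(f"unresolvable references: {still_pending}")
        pending = still_pending
    return resolved


# Toy dependency chain: "a" needs "b" resolved first, "b" needs "c".
needs = {"a": "b", "b": "c", "c": None}


def try_resolve(ref, resolved):
    dependency = needs[ref]
    if dependency is None or dependency in resolved:
        return ref.upper()  # stand-in for the resolved target object
    return None


result = resolve_all(["a", "b", "c"], try_resolve)
```

The no-progress check is what keeps cyclic or dangling references from looping forever.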
How can I say "^{follow-named-object}*" --> this would be FQN (open point postponed logic)
See above. For example, if we have ^packages* as the expression and the name 'pac1.pac2', we would evaluate by expanding packages* to packages.packages (because we have a two-part name), going up the parent chain one level (because of ^), and matching pac1 in the parent's packages and then pac2 in the packages of the found package. If something could not be resolved in this run, we postpone it to the next run.
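A rough sketch of that evaluation, assuming ^ starts at the parent of the object holding the reference and keeps going up until a match is found. The helpers package and find_up are hypothetical, and the model is hand-built rather than parsed.

```python
from types import SimpleNamespace as Obj


def package(name, *children):
    """Build a package and set the parent link on each child."""
    pkg = Obj(name=name, packages=list(children), parent=None)
    for child in children:
        child.parent = pkg
    return pkg


def find_up(start, attr, name):
    """Evaluate ^attr* : one step through attr per name part, tried from
    each ancestor of start in turn."""
    parts = name.split(".")
    scope = start.parent
    while scope is not None:
        found = scope
        for part in parts:
            candidates = getattr(found, attr, [])
            found = next((c for c in candidates if c.name == part), None)
            if found is None:
                break  # this ancestor does not match; try the next one up
        if found is not None:
            return found
        scope = scope.parent
    return None


pac2 = package("pac2")
pac3 = package("pac3")
root = package("root", package("pac1", pac2, pac3))

# A reference made inside pac3 to "pac1.pac2": pac1 itself has no child
# named pac1, so the search moves up to root, where pac1 and then pac2 match.
target = find_up(pac3, "packages", "pac1.pac2")
```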
How can I say "..{follow-attribute-type-without-consuming-the-name}.{follow-named-object}" --> is this "~"?
Exactly. ~ is for the situation where we navigate over a collection but don't want to consume a name part to get a specific element; instead we iterate over all elements of the collection without consuming a name part. The best example is searching for a method or attribute up the inheritance hierarchy (it seems I planned but forgot to exercise this in the TEP's Component example). For example, extends~*.methods would search the methods collection and then try the methods of every object in the extends collection. With *, it would follow the extends chain, searching first extends.methods, then extends.extends.methods, etc., until finding the object or exhausting all paths without finding anything. ~ is really not an operator but a marker applicable to a collection part of the RREL expression. Similarly, I see ^ as a marker operating over an RREL path expression. Both the ~ and ^ markers influence how the expression they mark is evaluated. Maybe we should always put markers on the left side of the subexpression they mark, e.g. ~extends*.methods?
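The intended behaviour of ~extends*.methods could be sketched like this, assuming each object has an extends list and a methods list. find_inherited is a hypothetical helper; the level-by-level order mirrors "first methods, then extends.methods, then extends.extends.methods".

```python
from types import SimpleNamespace as Obj


def find_inherited(cls, name):
    """Search methods level by level along the extends chain."""
    level = [cls]
    seen = set()  # guards against cycles in the extends graph
    while level:
        next_level = []
        for current in level:
            if id(current) in seen:
                continue
            seen.add(id(current))
            for method in current.methods:
                if method.name == name:
                    return method
            # The ~ marker: visit every element, consume no name part.
            next_level.extend(current.extends)
        level = next_level
    return None  # all paths exhausted


base = Obj(name="Base", extends=[], methods=[Obj(name="run")])
middle = Obj(name="Middle", extends=[base], methods=[Obj(name="stop")])
leaf = Obj(name="Leaf", extends=[middle], methods=[])

inherited = find_inherited(leaf, "run")  # found two levels up, in Base
```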
but we need to manage how custom classes can be introduced...
I don't see that custom classes would be affected by this. Could you explain?
Sorry: I mean custom provider classes.
Ah, I think the current mechanism is good: we just provide a callable with the right signature and register it on the metamodel like we do now. We won't do anything to support custom providers in the RREL language. We provide support for the most common resolving. For all very specific cases users must write their own scope provider and register it as they do now. In my view we should end up with only RREL and custom providers from the user's point of view, where custom providers are used only in very specific situations (like resolving in non-textX models).
Note: FQN() differs from RREL (and RelativeName) in that way, that references are not meant to be followed normally: (they could be followed, if they are resolved so far). There is no postponed logic in the FQN. If we wish to merge both, we need to be able to specify "need postponed logic" or "no following of references" in RREL strings.
Isn't that implementation specific? I would like for RREL to be fully declarative and let the resolver (who interprets the RREL language) take care of the proper order of resolution and postponing of resolving where necessary. Don't see why we should put that in the language itself. Could we do multi-pass resolving postponing what can't be resolved in this pass and repeating until all references are resolved?
Hm... Yes it is specific to the current FQN implementation. We may drop it...
How can I say "^{follow-named-object}*" --> this would be FQN (open point postponed logic)
See above. For example, if we have ^packages* as the expression and the name 'pac1.pac2', we would evaluate by expanding packages* to packages.packages (because we have a two-part name), going up the parent chain one level (because of ^), and matching pac1 in the parent's packages and then pac2 in the packages of the found package. If something could not be resolved in this run, we postpone it to the next run.
To differentiate between different attributes we could introduce something like a "{..}" syntax, as follows
- Syntax: ('{' attribute=ID '|' (rule=FQN '.')? identifier_attribute=ID '}') | (attribute=ID)
- Semantic:
- attribute = Attribute to be followed (may be "*" to use any attribute)
- rule = attribute to be followed, must be an instance of that rule (may be None)
- identifier_attribute = attribute of the name of the attribute to be followed. Default = "name".
Examples:
- package is a shortcut for {package|name}
- "{package|name}" is the same as "package". Here we look for attributes called "package" of rules, where the package must have an attribute "name" to match the current part of the reference name to be consumed.
- "{package|id}" is the same, but the attribute "id" is used instead of "name". Thus, we look for package attributes, which in turn have an attribute "id" matching the current part of the reference to be consumed.
- "{*|name}" here we look for any ("*") named object (with attribute "name").
- "{*|Package.name}" or "{*|Person.id}" to follow all named Packages, or Persons having an id as name.
- To follow all Packages without consuming the name could be "~{*|Package.*}"... (?)
--> The standard FQN would then be: ^( {*|name}, ~__tx_loaded_models )* {*|name}
Other 'point' :
- Syntax: '.' ( separator=STRING )?
- Semantic:
- separator is used as separator in the reference text
- ("." is a shortcut for .'.')
Example: "({*|name}.'::')*{*|name}" will match C++ like FQNs ("namespace1::namespace2::variable")
I would like to set up a branch of textX where I add a unit test that implements an RREL grammar to parse RREL strings... Then we could both commit there...
object methods. By giving * it would follow the extends chain, searching first extends.methods, then extends.extends.methods, etc., until finding the object or exhausting all paths and not finding anything. ~ is really not an operator but a marker applicable to a collection part of the RREL expression. Similarly I see ^ as a marker operating over an RREL path expression. Both the ~ and ^ markers influence how the expression they are marking is evaluated. Maybe we should always put markers on the left side of the subexpression they are marking, e.g. ~extends*.methods?
Is ~ only applying to "extends" or also to "methods"? I understood that "~" is applicable to individual parts of the RREL string...? E.g. if I would like to "jump over a bridge without consuming parts of the name", like in the FQN scoping, where I jump over "importURI.__tx_loaded_models"...
Is ~ only applying to "extends" or also to "methods"?
I think ~ should apply only to a single path part, thus to extends in the previous case. To mark multiple path parts with ~ you must do so explicitly, e.g. ~extends*.~compartments.methods, where neither extends nor compartments would consume a name part. Sorry for not replying to your previous comments. I need to find some time to think about it.
On the other hand ^ marks the whole path subexpression. You could have multiple ^ in a single RREL expression if sequence (,) is used as in ^some.path,^some.other.path.
Take your time. The central question is how powerful the RREL should be.
GitHub uses git for versioning wiki pages, which is really nice. You can see here. So to clone the wiki for textX you should do:
git clone git@github.com:igordejanovic/textX.wiki
I'm making changes to TEP-001 as we agree on things, so you can follow commits in this repo.
I was thinking about your proposal above.
I like the idea of being able to specify the attribute used for the name (name by default, with an option to change it), but I don't like the proposed syntax: I think it is too complex and it collides with the outer syntax of the link rule reference definition, which uses |. I think we don't need {} and we should not use |.
I propose to use : as in e.g. packages:id.methods:name.
"{*|name}" here we look for any ("*") named object (with attribute "name").
If I understand correctly, using * here would search any collection in the current context for the given name. I don't think that is a good idea, as we lose typing. I think we should always be strict about which collection we are searching, and we should check types along the way. Otherwise we could end up with subtle bugs that are hard to discover and debug.
The standard FQN would then be: ^( {*|name}, ~__tx_loaded_models )* {*|name}
I would avoid referencing implementation specific details (like __tx_loaded_models). I would like for RREL expressions to reference only what is defined in the grammar. I think RREL could be generally useful outside of textX (for other projects). In this case if you want to search all loaded models, we should consider that as implicitly supported.
Other 'point' :
This is indeed a good idea, but instead of defining it in RREL I have another proposal. As we assume that we have name parts from the name match, why not make that explicit in the match itself?
For example, let's have an FQN match rule in the grammar that matches dot-separated names:
FQN[noskipws]:
parts+=ID['.']
;
Now we expect the match to be either a plain string (a match rule, e.g. plain ID) or a common rule with a parts attribute. This gives full flexibility to specify any separator at all (maybe even to use object processors to construct the parts collection) without making RREL more complex. This way we abstract away how name parts are constructed; we just don't care.
E.g. C++ like namespaces would be:
Namespace[noskipws]:
parts+=ID['::']
;
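On the resolver side, this idea means the separator never has to be known. A minimal sketch, assuming the match arrives either as a plain string or as an object with a parts list; name_parts is a hypothetical helper name.

```python
from types import SimpleNamespace as Obj


def name_parts(match):
    """Return the name parts of a reference match, whatever the separator."""
    if isinstance(match, str):
        return [match]        # match rule such as plain ID: a single part
    return list(match.parts)  # common rule: the parts were already split


plain = name_parts("variable")
# What an FQN or Namespace match is assumed to look like after parsing.
dotted = name_parts(Obj(parts=["pac1", "pac2"]))
cpp_like = name_parts(Obj(parts=["namespace1", "namespace2", "variable"]))
```

The dot and the double colon never reach the resolver; only the grammar rule knows about them.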
I agree. I have some points I am not sure with the current state, but we can change details later.
We should start - as you proposed - with collecting examples from the unittests and sketch if/how the RREL applies. If this is okay for you I can start making a short table or list next week...
If this is okay for you I can start making a short table or list next week...
That would be great. Thanks! You can start a new section with examples at the end of the TEP-001 wiki page and we can polish it as we go. At the end of the process TEP wiki page will be the final accepted version we all agree on, and then we can start with the implementation.
I started an example grammar which is (I think) in line with the wiki: https://github.com/goto40/textX/tree/TEP-001-examples/examples/RREL
Have a look there...
- Once we discussed #133, I would also add such an example (which I think should be elegant with RREL).
- The only open point is possibly multi-file models...
- I could move the branch to the original textx repo, once we decide to take the examples as a base...
Looks good. I suggest to start putting these examples at the section Examples at the end of TEP-001 document itself and polish them until we are satisfied.
One correction in Call rule. Instead of method=[Method|ID|~obj.~ref.~methods, ~obj.~ref.~extension*.~methods] you should write method=[Method|ID|obj.ref.~extension*.methods].
Explanation: obj and ref are not collections, so ~ doesn't make much sense there, as a name part won't be consumed anyway. extension* would first expand to nothing and thus cover obj.ref.methods as well, so there is no need for two path expressions in sequence.
Also FQN rule should be written as:
FQN[noskipws]: parts+=ID['.'];
to be aligned with the idea from the above comment.
I'll take a look at #133.
Thank you for the comments. I updated the wiki.
Thanks. Looks good. I've fixed RREL expression in example 2, Reference.ref.
We need dot after instance and this example reveals that we would need grouping of subpath elements for * operator.
Hi @goto40 FYI, I'll be working on the first stage of implementing RREL parsing and the base infrastructure for its interpretation. After that we should see how to integrate with the current scoping.