strictyaml
Support request: How to formulate a schema containing partially exclusive or without Or?
I've a config which is partially exclusive OR. It may be either like this
common:
- ...
variant_a:
- ...
variant_a_again:
- ...
or
common:
- ...
variant_b:
- ...
variant_b_again:
- ...
Using a schema like schema = Map({"common": Seq(...), "variant_a": Seq(...), "variant_a_again": Seq(...)}) | Map({"common": Seq(...), "variant_b": Seq(...), "variant_b_again": Seq(...)})
should work. However I get this error strictyaml.exceptions.InvalidValidatorError: You tried to Or ('|') together 2 Map validators. Try using revalidation instead.
which prevents me from using it. To me it's not clear how to do revalidation here. How do I have to use revalidation in this case?
It should be possible. I'll try to write some code that will do this in about an hour or so.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/crdoconnor/strictyaml/issues/99, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABOJKNPUNLLEVKX2O4FUAF3RPA4R5ANCNFSM4MTZDSCQ .
@crdoconnor It's not time critical cause I have a (super ugly) workaround. Thx a lot.
I've just stumbled upon this use case as well.
@crdoconnor Did you come up with a solution for this?
Yes, sorry I didn't get around to writing it and I got distracted. I'll do so today.
It is possible with .revalidate I think.
@chrisburr I use 2 schemas with a nested try except. As I said... really ugly. But it works :smile:
Thanks @crdoconnor, it happens to us all 😆 (btw I'm loving strictyaml)
@fkromer Beautiful! 😉 Unfortunately mine's a little messier, as the map sits under a key of another map:
first_thing:
  input:
    url: http://example.com # Str()
second_thing:
  input:
    id: 1234 # Int()
third_thing:
  input:
    job: first_thing # Str() corresponding to one of the other top-level keys
I can work around it by setting input to be a Map() and then looping over the keys to check which key exists, so I can revalidate with the corresponding schema, but it feels overly complicated.
@fkromer if I'm understanding you correctly, you can simply use a MapPattern:
semi_validated = load(my_yaml, MapPattern(Str(), Seq(Str())))
and then
if "variant_a" in semi_validated:
    semi_validated.revalidate(Map({
        "common": Seq(Str()), "variant_a": Seq(Str()), "variant_a_again": Seq(Str())
    }))
if "variant_b" in semi_validated:
    semi_validated.revalidate(Map({
        "common": Seq(Str()), "variant_b": Seq(Str()), "variant_b_again": Seq(Str())
    }))
I can workaround it by setting input to be a Map() and then looping over the keys to check which key exists so I can revalidate with the corresponding schema but it feels overly complicated.
@chrisburr Is that bad? I have a couple of situations like this and that's the way I do it and I kind of like it. I can't really think of another way to do it that's simpler.
The whole "Map({...}) | Map({...}) | Map({...})" approach I'm not really keen on since there's no way to know which one the user was necessarily intending to match, so what error should it display? If the 4th out of 6 would have matched were it not for a mis-spelled key, then showing the error message that it didn't match the last map (current behavior) isn't very useful.
It's possible that there's a new kind of Map that I could create that would solve your problem. If you can dream up one and can imagine how to construct it (i.e. what parameters it might take), I'd like to see it. I think for a lot of scenarios revalidate is going to be necessary, but there might be validators I could bundle that could make it unnecessary.
Thanks for the kudos btw :)
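The error-message concern above can be illustrated without strictyaml at all. The following is only a sketch on plain key sets: `ALTERNATIVES`, `try_all` and `closest` are hypothetical names, and the "fewest differing keys" ranking is an assumed heuristic, not something strictyaml does.

```python
# Hypothetical key sets for three alternative mappings (plain sets, not strictyaml validators).
ALTERNATIVES = {
    "variant_a": {"common", "variant_a", "variant_a_again"},
    "variant_b": {"common", "variant_b", "variant_b_again"},
    "variant_c": {"common", "variant_c"},
}


def try_all(keys):
    """Return one error string per alternative that does not match `keys`."""
    errors = {}
    for name, expected in ALTERNATIVES.items():
        missing = sorted(expected - keys)
        unexpected = sorted(keys - expected)
        if missing or unexpected:
            errors[name] = f"missing {missing}, unexpected {unexpected}"
    return errors


def closest(keys):
    """Guess the intended alternative: the one with the fewest differing keys (a heuristic)."""
    return min(ALTERNATIVES, key=lambda name: len(ALTERNATIVES[name] ^ keys))


# The user meant variant_a but misspelled one key.
document_keys = {"common", "variant_a", "variant_a_agian"}
errors = try_all(document_keys)
last_tried = list(errors)[-1]  # the alternative whose error a "last one wins" Or would report
```

Here every alternative fails, the last one tried is `variant_c`, and only a guess such as `closest` points back at `variant_a`. With many alternatives and deeper nesting, that guess becomes much less reliable, which is the difficulty described above.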
@fkromer if I'm understanding you correctly, you can simply use a MapPattern:
I'm using something like this
schema_var_a = Map(...)
schema_var_b = Map(...)
try:
    config = load(..., schema_var_a).data
except YAMLError as error:
    try:
        config = load(..., schema_var_b).data
    except YAMLError as error:
        ...
right now. Of course this is really ugly, and I'm going to introduce a key that identifies which schema I actually want to use for validation. Could you point me to the reference for MapPattern, please? Because there is no search functionality in the docs, I could not find it there.
Ah ok. Does my example translate well to your use case?
@fkromer I think the suggestion is something like
schema_var_a = Map(...)
schema_var_b = Map(...)
config = load(my_yaml, MapPattern(Str(), Any()))
# revalidate updates config in place, so its return value is not assigned
if "variant_a" in config:
    config.revalidate(schema_var_a)
elif "variant_b" in config:
    config.revalidate(schema_var_b)
else:
    raise ValueError(my_yaml)
config = config.data
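The dispatch step in that suggestion can be looked at in isolation. This is a minimal sketch on plain dicts with no strictyaml calls; `VARIANTS` and `select_variant` are hypothetical names. Unlike the if/elif chain, it also rejects a document that contains both discriminating keys, which is an assumption about what "exclusive" should mean here.

```python
# Hypothetical table: discriminating key -> full set of keys that variant requires.
VARIANTS = {
    "variant_a": {"common", "variant_a", "variant_a_again"},
    "variant_b": {"common", "variant_b", "variant_b_again"},
}


def select_variant(config):
    """Pick the variant whose discriminating key is present, then check its key set."""
    present = [name for name in VARIANTS if name in config]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {sorted(VARIANTS)}, found {present}")
    name = present[0]
    if set(config) != VARIANTS[name]:
        raise ValueError(f"{name}: keys {sorted(config)} do not match {sorted(VARIANTS[name])}")
    return name
```

In the strictyaml version, the table would presumably hold the two Map schemas instead of key sets, and the selected one would be passed to revalidate.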
The whole "Map({...}) | Map({...}) | Map({...})" approach I'm not really keen on since there's no way to know which one the user was necessarily intending to match, so what error should it display? If the 4th out of 6 would have matched were it not for a mis-spelled key, then showing the error message that it didn't match the last map (current behavior) isn't very useful.
I think I would expect the error to come from the OrValidator with something like: failed to find a valid schema for the value starting at....
Example:
In my case the schema could be one of these three:
1. Currently possible solution
import re
from strictyaml import Any, Int, Map, MapPattern, Regex, Str, load

schema = MapPattern(Str(), Map({
    "name": Str(),
    "input": MapPattern(Str(), Any()),
}))
data = load(my_yaml, schema)
for job_name, job_data in data.items():
    # Revalidate each input against the schema matching the key it contains
    if "url" in job_data["input"]:
        job_data["input"].revalidate(Map({
            "url": Str(),
        }))
    elif "id" in job_data["input"]:
        job_data["input"].revalidate(Map({
            "id": Int(),
        }))
    elif "job" in job_data["input"]:
        job_data["input"].revalidate(Map({
            "job": Regex("(" + "|".join(map(re.escape, data.data.keys())) + ")"),
        }))
    else:
        raise ValueError(f"No valid input type found for {job_name}")
2. Using an OrValidator with Map
data = load(my_yaml, MapPattern(Str(), Any()))
input_data_schema = Map({
    "url": Str(),
}) | Map({
    "id": Int(),
}) | Map({
    "job": Regex("(" + "|".join(map(re.escape, data.data.keys())) + ")"),
})
full_schema = MapPattern(Str(), Map({
    "name": Str(),
    "input": input_data_schema,
}))
data.revalidate(full_schema)
3. After writing the first two I realised this could also be done using a custom validator class
I started writing a custom validator here but deleted it, as it became quite long to do correctly for both validate and to_yaml.
Summary thoughts
I still prefer option (2). I can easily imagine option (1) becoming much more complicated if input_data_schema could appear in more places or at multiple levels in a deeper structure. If there are problems with (2) then (3) is an acceptable fallback.
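One way to keep option (1) manageable when the input mapping shows up at several levels is to put the choice in a single helper and walk the structure. The sketch below works on plain dicts rather than strictyaml objects; `check_input` and `check_all_inputs` are hypothetical names, and the rule that any key called `input` holds such a mapping is an assumption made for illustration.

```python
def check_input(input_map, job_names):
    """Validate one `input` mapping: exactly one of url / id / job, with a suitable value."""
    if set(input_map) == {"url"} and isinstance(input_map["url"], str):
        return "url"
    if set(input_map) == {"id"} and isinstance(input_map["id"], int):
        return "id"
    if set(input_map) == {"job"} and input_map["job"] in job_names:
        return "job"
    raise ValueError(f"No valid input type found in {input_map!r}")


def check_all_inputs(node, job_names, path=()):
    """Walk nested dicts and apply check_input to every value stored under 'input'."""
    found = {}
    for key, value in node.items():
        if key == "input":
            found[path] = check_input(value, job_names)
        elif isinstance(value, dict):
            found.update(check_all_inputs(value, job_names, path + (key,)))
    return found


jobs = {
    "first_thing": {"input": {"url": "http://example.com"}},
    "second_thing": {"input": {"id": 1234}},
    "third_thing": {"nested": {"input": {"job": "first_thing"}}},
}
result = check_all_inputs(jobs, set(jobs))
```

The per-variant logic lives in one place, so adding a nesting level changes only the walk, not the checks. A strictyaml equivalent would call revalidate at the point where `check_input` is called here.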
First of all, thanks a lot for StrictYAML; it's great!
An OrValidator with Map would help in situations where a single schema is needed, and one cannot have additional code invoking revalidate.
For example, in a project I need to convert simple Pydantic models to StrictYAML schemas (automatically); when the Pydantic model uses a union type Union[X,Y] I would like to output yaml_schema(X) | yaml_schema(Y) (yaml_schema generates a schema recursively).
When both X and Y are maps, this approach doesn't work. I read in #51 that map alternatives would create problems due to stateful parsing, but they would be convenient.
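To make the shape of such a recursive generator concrete, here is a rough sketch. It uses stdlib dataclasses as a stand-in for Pydantic models and returns the text of a schema expression rather than real validator objects (since, per this thread, OR-ing two Map validators is rejected). `schema_expr`, `SCALARS` and the example classes are hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List, Union, get_args, get_origin

# Scalar annotations mapped to the text of the matching strictyaml validator.
SCALARS = {str: "Str()", int: "Int()", float: "Float()", bool: "Bool()"}


def schema_expr(tp):
    """Return the text of a schema expression for a type annotation."""
    if tp in SCALARS:
        return SCALARS[tp]
    if is_dataclass(tp):
        items = ", ".join(f'"{f.name}": {schema_expr(f.type)}' for f in fields(tp))
        return "Map({" + items + "})"
    if get_origin(tp) is list:
        (item,) = get_args(tp)
        return f"Seq({schema_expr(item)})"
    if get_origin(tp) is Union:
        # A union becomes an "|" chain; this is where Map | Map arises.
        return " | ".join(schema_expr(arg) for arg in get_args(tp))
    raise TypeError(f"unsupported annotation: {tp!r}")


@dataclass
class UrlInput:
    url: str


@dataclass
class IdInput:
    id: int


@dataclass
class Job:
    name: str
    input: Union[UrlInput, IdInput]


print(schema_expr(Job))
# prints: Map({"name": Str(), "input": Map({"url": Str()}) | Map({"id": Int()})})
```

The union of two model types is exactly the point where the generated expression contains `Map(...) | Map(...)`, so a generator along these lines would need either support for map alternatives or a separate revalidation step.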
Thanks @paolieri
I understand your use case and I'll bear it in mind. Unfortunately I don't see any clear and obvious way of implementing this given the way that strictyaml is currently parsed. I tried this a year ago and all I ended up doing was making the parser dog slow.
I'm also concerned that orring together a bunch of mapping schemas might create unreadable error messages. If you had, say, 11 different mappings each with 10 keys, displaying an error message that explains that the intended mapping number 4 would have been matched were it not for the key 'xyz' being a boolean would be a challenge, to say the least. This is another reason why I figured revalidation was a cleaner approach.
While generating a strictyaml validator directly from pydantic would probably not be possible given this issue, there might be a way to work around it and maybe create a strictyaml parser function (that does revalidation). I'd be happy to try and help solve your specific problem if you can share details in a new ticket.