glom
glom copied to clipboard
Inconsistent behaviour between attributes/keys after release 22.1.0
I use glom
with a library called frictionless
. It's a library that manages metadata for datasets based on the frictionless specification.
When using release 22.1.0, I see inconsistent behaviour between attributes/keys. Most (all?) of the metadata classes in frictionless
are subclasses of dict
. I have a hunch, the inconsistent behaviour is because of that.
Here's a minimal example using the Schema
class from frictionless
. I have put comments next to indicate expected behaviour, or to comment on the error.
from frictionless import Schema
from glom import glom, Assign
fields = [
{"name": "id", "type": "number"},
{"name": "foo", "type": "datetime"},
{"name": "bar", "type": "string"},
{"name": "baz", "type": "boolean"},
]
schema = Schema(fields=fields, missing_values="NA")
# access
assert schema.primary_key == []
assert glom(schema, "primary_key") == [] # fails with KeyError
# expected
try:
assert schema["primaryKey"]
except KeyError:
pass
# expected
try:
assert glom(schema, "primaryKey")
except KeyError:
pass
# assign
glom(schema, Assign("primary_key", "foo"))
assert schema.primary_key == ["foo"]
assert schema["primaryKey"] == "foo"
glom(schema, Assign("primaryKey", "id")) # fails with AttributeError
assert schema["primaryKey"] == "id"
Edit: I simplified the examples
Thanks for finding that, and sorry we broke your library!
This corner case wasn't covered in our tests (Path behavior on an object where both [] and . are valid).
There isn't any obvious reason for the change in behavior in the changes for this release.
I think your hunch is correct -- the behavior of strings is that they will try ["val"]
, .val
, [int("val")]
depending on the type. Before, .
probably had higher precedence than []
.
There's two fixes if you want to keep using the new version without waiting for us:
1- Use T.val
to disambiguate in the spec.
2- Use register() to disambiguate globally that frictionless types use .
not []
https://glom.readthedocs.io/en/latest/api.html#glom.register
Just for additional information, do you have any of the stack traces handy? (If the underlying error is KeyError "primarykey" that confirms the root cause is []
vs .
)
Hi @kurtbrose, sorry about my delayed response. I was trying to get more details. Thanks to your comments, I have simplified my example above. You are correct, when accessing the error is KeyError
:
Traceback (most recent call last):
File "/home/user/testing/glom-frictionless-bug/test_bug.py", line 14, in <module>
assert glom(schema, "primary_key") == [] # fails with KeyError
File "/home/user/.virtualenvs/tmp-33f0ead521b8cd1/lib/python3.9/site-packages/glom/core.py", line 2294, in glom
raise err
glom.core.PathAccessError: error raised while processing, details below.
Target-spec trace (most recent last):
- Target: {'fields': [{'name': 'id', 'type': 'number'}, {'name': 'foo', 'type': 'datetime'}, {'n... (len=2)
- Spec: 'primary_key'
glom.core.PathAccessError: could not access 'primary_key', part 0 of Path('primary_key'), got error: KeyError('primary_key')
However I don't think that is the problem. My code started failing because the precedence between key and attribute has changed for assign (as shown by the following traceback).
Traceback (most recent call last):
File "/home/user/testing/glom-frictionless-bug/test_bug.py", line 34, in <module>
glom(schema, Assign("primaryKey", "id")) # fails with AttributeError
File "/home/user/.virtualenvs/tmp-33f0ead521b8cd1/lib/python3.9/site-packages/glom/core.py", line 2294, in glom
raise err
glom.core.PathAssignError: error raised while processing, details below.
Target-spec trace (most recent last):
- Target: {'fields': [{'name': 'id', 'type': 'number'}, {'name': 'foo', 'type': 'datetime'}, {'n... (len=3)
- Spec: Assign(Path('primaryKey'), 'id')
- Spec: Path()
glom.core.PathAssignError: could not assign 'primaryKey' on object at Path(), got error: AttributeError("'Schema' object has no attribute 'primaryKey'")
In my code, I was consistently using key based access (so ["primaryKey"]
instead of .primary_key
). As Schema
is a subclass of dictionary, this seemed natural to me. The code broke (only my assigns failed), because the precedence changed to favour attributes.
And thank you so much for suggesting the two possible fixes in case it takes a while to resolve this in glom
. Much appreciated :)
Thanks, I missed that it was the assignment target that was failing.
That makes a lot more sense because we DID touch that code this release.
(we added *
and **
to paths, and you can assign to a glob/multi-path: Assign('*.primary_key', None)
instead of [Assign('primary_key', None)]
or (T.values(), [Assign('primary_key', None)]
)
Thanks for the simplified reproduction, I was able to grab it pretty directly into a test.
I've tried running the test against different releases, all the way back to 18.4.0 and they all fail in the same place.
# access
assert schema.primary_key == []
> assert glom(schema, "primary_key") == [] # fails with KeyError
glom\test\test_regressions.py:26:
# ... snip out a bunch of glom internal lines ...
except Exception as e:
> raise PathAccessError(e, Path(_t), i // 2)
E glom.core.PathAccessError: could not access 'primary_key', part 0 of Path('primary_key'), got error: KeyError('primary_key')
glom\core.py:894: PathAccessError
Maybe the version you were using was older than that?
I think my description of the issue hasn't been very clear, my fault. I'll try to be clearer.
Since a dict
subclass is primarily meant for key-based lookup, I have been using string paths with glom
only for keys.
from frictionless import Schema
from glom import glom, Assign
fields = [
{"name": "id", "type": "number"},
{"name": "foo", "type": "datetime"},
{"name": "bar", "type": "string"},
{"name": "baz", "type": "boolean"},
]
schema = Schema(fields=fields, missing_values="NA")
# expected
try:
assert schema["primaryKey"]
except KeyError:
pass
# expected
try:
assert glom(schema, "primaryKey")
except KeyError:
pass
The above works as expected (older versions, and latest release).
However assignment used to work with 20.11, but doesn't work with 22.1.
schema["primaryKey"] = "foo"
assert schema["primaryKey"] == "foo"
glom(schema, Assign("primaryKey", "id")) # AttributeError with 22.1
assert schema["primaryKey"] == "id"
While investigating the above, I realised the following also fails (for all versions) and included it in my minimal example.
glom(schema, "primary_key") # KeyError for both versions
However thanks to your first comment I see it is unsurprising, and maybe even expected, as Schema
is a subclass of dict
. I don't know the glom
internals, but I would guess while it might be possible to make string paths work for attribute access for a dict
subclass, it isn't possible to implement assignment. Either way, I wouldn't consider support for this important, as there are alternatives (as suggested in your first response, thanks!)