glom
glom copied to clipboard
How is Fill supposed to operate?
This is related to #207. Basically, I don't think Fill does what I expected it to do after reading the docstring, and I also don't think it's the most useful mode in its current incarnation. Specifically, the mode it introduces is too literal. In the example in #207, I'd expect the following to be the case:
target = [{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
glom(target, [Fill(('a', 'b'))]) == [(1, 1), (2, 2)]
>>> False
In reality though, I have to mark 'a' and 'b' with Path explicitly, which is not at all obvious from the description of Fill: "A variant of the default transformation behavior; preferring to “fill” containers instead of iterating, chaining, etc.". There are no containers here, so I'd expect 'a' and 'b' to be interpreted as paths still, whilst "structural" specs get intepreted as templates for containers to create. But it seems that what it really means is "everything except Path and Spec becomes a literal". This makes the resulting spec so verbose and hard to read for what it does, I might as well not bother with Fill:
target = [{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
glom(target, [Fill((Path('a'), Path('b')))]) == [(1, 1), (2, 2)]
>>> True
Ah, Fill. The simultaneously more minimalist and more advanced templating mode.
Fill's intended solution to clean that up is to use T:
target = [{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
glom(target, [Fill((T['a'], T['b']))]) == [(1, 1), (2, 2)]
>>> True
Fill is meant to be more precise, so it pairs better with T. You only need to use Path if you want to retain the automatic transparent access to both object attributes, dict keys, and list indexes.
So, for context, glom started out as a pretty opinionated library, before growing the modes layer that allows different opinions to coexist, e.g., Fill vs Auto. So I think one design goal of modes is that it should be straightforward to derive a Fill mode that interprets strings as paths, as you've suggested, above.
There's a short doc on the topic. Quickest way I can think to do it would be to inherit (or compose a new object based on) Fill and override _fill to have the string behavior you desire, and call through to super() for the rest. What do you think?
There's a short doc on the topic. Quickest way I can think to do it would be to inherit (or compose a new object based on) Fill and override _fill to have the string behavior you desire, and call through to super() for the rest. What do you think?
Right, the doc and the changelog is where I read about Fill at first, and my understanding of "prefers to fill containers instead of iterating / chaining" did not also include changing how strings are interpreted :). I do think it'd be possible to implement a version that works more like what I expected, but I also wanted to discuss whether Fill's current behaviour is truly the best default.
Whilst T['a'] is a solution (and I did actually mention it in #207), I think treating bare strings as paths, just as in Auto, would be more useful. The primary need that isn't cleanly addressed in Auto is "how do I create output values of container types that glom interprets as chained specs?". That's what I expected Fill to do. But the question of "how do I create a literal string output value, instead of having it interpreted as a path?" is already answered by Val(), so there's no need to have Fill do that, especially when it arguably reduces the expressiveness of the container-value creation.
Of course, since Fill has been released in its current form, its behaviour will need to stay as is for backwards compatibility. But I think it'd be useful to have another stock mode that is more selective in what it changes. It could either go back to being called Template, or maybe something like Struct. Would you like me to prepare a PR?
To give some more context on Fill -- it was built around the use case of filling in configuration data structures
here's the (slightly sanitized) prototypical example
Fill({
'debug': T['debug'],
'xmlsec_binary': saml2.sigver.get_xmlsec_binary([
'/opt/local/bin', '/usr/bin/xmlsec1', '/app/vendor/xmlsec1/bin/']),
'entityid': _Format('{base_url}/metadata/'),
'description': 'IdP',
'service': {
'idp': {
'name': 'IdP',
'endpoints': {
'single_sign_on_service': [
(_Format('{base_url}/sso/post/'), saml2.BINDING_HTTP_POST),
(_Format('{base_url}/sso/redirect/'),
saml2.BINDING_HTTP_REDIRECT),
],
'single_logout_service': [
(_Format('{base_url}/slo/post/'), saml2.BINDING_HTTP_POST),
(_Format('{base_url}/sso/redirect/'),
saml2.BINDING_HTTP_REDIRECT)
]
},
'name_id_format': [
saml2.saml.NAMEID_FORMAT_EMAILADDRESS,
saml2.saml.NAMEID_FORMAT_UNSPECIFIED],
'sign_response': False,
'sign_assertion': True,
"signing_algorithm": SIG_RSA_SHA256,
"digest_algorithm": SIG_RSA_SHA256,
},
},
'metadata': {
'remote': lambda t: [ # not safe to do DB @ import so defer till format
{'url': entity} for entity in db_results()]
},
# Signing
'key_file': _Format('{cert_dir}/private.key'),
'cert_file': _Format('{cert_dir}/public.cert'),
# Encryption
'encryption_keypairs': [{
'key_file': _Format('{cert_dir}/private.key'),
'cert_file': _Format('{cert_dir}/public.cert'),
}],
'valid_for': 365 * 24,
})
in addition to strings, other non-data-structure objects like int and bool which would be an error in Auto mode will pass through to the result
just to see how it looks, here's the example with Val wrapped around all the plain values
Fill({
'debug': T['debug'],
# xml_sec needs a buildpack on HEROKU to work
'xmlsec_binary': Val(saml2.sigver.get_xmlsec_binary([
'/opt/local/bin', '/usr/bin/xmlsec1', '/app/vendor/xmlsec1/bin/'])),
'entityid': _Format('{base_url}/metadata/'),
'description': Val('EBill/Tableau IdP'),
'service': {
'idp': {
'name': Val('Ebill IdP'),
'endpoints': {
'single_sign_on_service': [
(_Format('{base_url}/sso/post/'), Val(saml2.BINDING_HTTP_POST)),
(_Format('{base_url}/sso/redirect/'),
Val(saml2.BINDING_HTTP_REDIRECT)),
],
'single_logout_service': [
(_Format('{base_url}/slo/post/'), Val(saml2.BINDING_HTTP_POST)),
(_Format('{base_url}/sso/redirect/'),
Val(saml2.BINDING_HTTP_REDIRECT))
]
},
'name_id_format': [
Val(saml2.saml.NAMEID_FORMAT_EMAILADDRESS),
Val(saml2.saml.NAMEID_FORMAT_UNSPECIFIED)],
'sign_response': Val(False),
'sign_assertion': Val(True),
"signing_algorithm": Val(SIG_RSA_SHA256),
"digest_algorithm": Val(SIG_RSA_SHA256),
},
},
'metadata': {
'remote': lambda t: [ # not safe to do DB @ import so defer till format
{'url':entity} for entity in db_results()]
},
# Signing
'key_file': _Format('{cert_dir}/private.key'),
'cert_file': _Format('{cert_dir}/public.cert'),
# Encryption
'encryption_keypairs': [{
'key_file': _Format('{cert_dir}/private.key'),
'cert_file': _Format('{cert_dir}/public.cert'),
}],
'valid_for': Val(365 * 24),
})
this is totally a matter of subjective perception, but to me when I flip the screen back and forth, that extra level of ( ... ) around long-ish values like saml2.saml.NAMEID_FORMAT_EMAILADDRESS really slows down comprehending the structure
I see. Looking at it in a long structure that's heavily nested and mostly literal, I can see why it was done this way.
One thing that occurs is maybe a way to make this behavior transparent and overridable.
Maybe we could have a "paint-by-number" Mode where you simply override the types that you want to. Fill could be a good candidate for the basis of that template -- "for low level built in python data structures, recurse; for others pass through".
I'm imagining an API something like:
Fill(overrides={str: Path}, ...)
There's this idea hanging out that, EVENTUALLY, we will need to switch to "compile" glom-specs. The compilation (glompilation heh) would convert all the specs to more concrete primitives. This would improve performance by getting rid of a layer of dispatch logic, and eventually we could push the primitive callbacks down into Cython. Once we get there, the idea would be for glom to dogfood / spiral itself up by using gloms to transform specs.
In that general direction, you could imagine a transformer (glom-macro or glomacro if you will):
def str2path(spec)
return glom(spec,
Ref('spec',
Match(Switch({
dict: {T: Ref('spec')},
list: [Ref('spec')],
str: Invoke(Path).specs(T)}))))
some challenges with this approach as-is:
- pain in the butt to handle tuple, set, frozenset
- ideally there would be a way to express this within the glom
Maybe there should be a "mirror image" of Val() -- whereas Val says "treat this as a plain value, do not glom" this other mechanism would say "run this through an extra layer of glomming"
class Macro:
def __init__(self, macro_spec, spec):
self.macro_spec, self.spec = macro_spec, spec
def glomit(self, target, scope):
spec = scope[glom](self.macro_spec, self.spec, scope)
return scope[glom](spec, target, scope)
Hmm.... it gets super ugly when I try to use it thought
Fill(Macro(Ref('macro', Match(Switch({str: Invoice(Path).specs(T), tuple: [Ref('macro')])))
expanding out the tuple and making it work properly is way too complicated -- tuple-to-list, apply macro, list-to-tuple