edn_format
edn_format copied to clipboard
Load edn as a Python dict with string keys
Would you consider a PR to load edn form just as standard Python dict with string keys?
I’ll let @swaroopch weight in but personnally I’d be interested in some option to edn_format.loads
such as string_keys=True
or string_keyword_keys=True
that would parse all keyword keys as strings.
Indeed yes that sounds reasonable.
Agree with @bfontaine ! Thanks @piotr-yuxuan :-)
I see two issues here:
- Currently
Keyword('a/b').name => 'a/b'
. I'm sure it's kind of a Chersterton's fence and I'm not yet aware of why it has been built that way, but in Clojure(name :a/b) => "b"
. This precludes us to use a 'stringify' strategy which would return only simple names. - The way namespaced maps are done in the parser, and not in the lexer, further complicates any attempt to stringify keys. It's entirely possible that I'm wrong, but perhaps another way is possible by maintaining the map namespace in the lexer state (cf. here). This would seem more general to me, and less hackish. However, as I don't have yet the bigger picture, it's entirely possible that I miss a more important point.
These two points are very opinionated, and perhaps you will disagree with me. I would prefer to know the opinion of the maintainers about them before refactoring everything and making breaking changes ;-) If we were to move forward, I would see four possible different ways to parse an edn keyword, and two possible ways to parse a symbol:
- Keyword
- Default: Keyword object
- String,
:a/b
->a/b
(clj namespace + clj name) - String,
:a/b
->b
(only the name, with possible shadowing) - String,
:a/b
->:a/b
(no-op)
- Symbol
- Default: Symbol object
- String,
'(fn [a] a)
->"(fn [a] a)"
Methinks a naive implementation for Keyword which store the parsing strategy as a lexer state would be:
@ply.lex.TOKEN(KEYWORD)
def t_KEYWORD(t):
if t.lexer.keyword_parsing == KeywordParsing.keyword_object:
t.value = Keyword(t.value[1:])
elif t.lexer.keyword_parsing == KeywordParsing.name_string:
t.value = Keyword(t.value[1:]).name
elif t.lexer.keyword_parsing == KeywordParsing.ns_slash_name_string:
t.value = t.value[1:]
elif t.lexer.keyword_parsing == KeywordParsing.leading_colon_string:
pass
return t
…which leads to the remarks above.
For the impatient, here how I would iterate through what is returned by edn_format
:
keyword_name = lambda keyword: \
keyword.name.split("/", 1)[1] if "/" in keyword.name else keyword.name
strinfigy_keyword = lambda x: \
keyword_name(x) if isinstance(x, edn_format.Keyword) else x
def process_edn(x):
if isinstance(x, edn_format.ImmutableDict):
return {strinfigy_keyword(k): process_edn(v) for k, v in x.items()}
elif isinstance(x, edn_format.ImmutableList):
return [process_edn(v) for v in x]
else:
return strinfigy_keyword(x)
Obviously keyword_name can be changed for any other callable.
Hello, sorry for the late response.
That’s interesting; I was thinking of post-processing the result of the parsing like you did in your second comment; not of doing that during the lexing/parsing.
Currently
Keyword('a/b').name => 'a/b'
Yes, it has bit me before. I don’t know the reason behind this. Changing this would be a breaking change, though.
The way namespaced maps are done in the parser, and not in the lexer, further complicates any attempt to stringify keys.
I don’t understand how you would do that in the lexer; its job is to tokenize the input, not to make sense of those tokens. The lexer doesn’t know what a map looks like: in #:foo {:a 1}
, #foo
, {
, :a
, 1
, and }
are different tokens.
Just thought it was worth mentioning that you can create a tag with @piotr-yuxuan solution. Which you can add at the top of your edn file if you don't want to actively call process_edn
after parsing. Something like:
edn.add_tag('python', process_edn)
and then you can:
#python
{:tags
[{:name "foo"},
{:name "bar"} ]}
Starting from @piotr-yuxuan 's suggestion I used
def edn_to_map(x):
if isinstance(x, edn_format.ImmutableDict):
return {edn_to_map(k): edn_to_map(v) for k, v in x.items()}
elif isinstance(x, edn_format.ImmutableList):
return [edn_to_map(v) for v in x]
elif isinstance(x, edn_format.Keyword):
return x.name
else:
return x
it's incomplete but it seems ok so far in my case (parsing simple edns to python dicts without keywords), so posting it here in case that's helpful for someone else.