edn_format icon indicating copy to clipboard operation
edn_format copied to clipboard

Load edn as a Python dict with string keys

Open piotr-yuxuan opened this issue 3 years ago • 8 comments

Would you consider a PR to load edn form just as standard Python dict with string keys?

piotr-yuxuan avatar Jul 08 '20 17:07 piotr-yuxuan

I’ll let @swaroopch weight in but personnally I’d be interested in some option to edn_format.loads such as string_keys=True or string_keyword_keys=True that would parse all keyword keys as strings.

bfontaine avatar Jul 08 '20 18:07 bfontaine

Indeed yes that sounds reasonable.

piotr-yuxuan avatar Jul 09 '20 12:07 piotr-yuxuan

Agree with @bfontaine ! Thanks @piotr-yuxuan :-)

swaroopch avatar Jul 16 '20 01:07 swaroopch

I see two issues here:

  • Currently Keyword('a/b').name => 'a/b'. I'm sure it's kind of a Chersterton's fence and I'm not yet aware of why it has been built that way, but in Clojure (name :a/b) => "b". This precludes us to use a 'stringify' strategy which would return only simple names.
  • The way namespaced maps are done in the parser, and not in the lexer, further complicates any attempt to stringify keys. It's entirely possible that I'm wrong, but perhaps another way is possible by maintaining the map namespace in the lexer state (cf. here). This would seem more general to me, and less hackish. However, as I don't have yet the bigger picture, it's entirely possible that I miss a more important point.

These two points are very opinionated, and perhaps you will disagree with me. I would prefer to know the opinion of the maintainers about them before refactoring everything and making breaking changes ;-) If we were to move forward, I would see four possible different ways to parse an edn keyword, and two possible ways to parse a symbol:

  • Keyword
    • Default: Keyword object
    • String, :a/b -> a/b (clj namespace + clj name)
    • String, :a/b -> b (only the name, with possible shadowing)
    • String, :a/b -> :a/b (no-op)
  • Symbol
    • Default: Symbol object
    • String, '(fn [a] a) -> "(fn [a] a)"

Methinks a naive implementation for Keyword which store the parsing strategy as a lexer state would be:

@ply.lex.TOKEN(KEYWORD)
def t_KEYWORD(t):
    if t.lexer.keyword_parsing == KeywordParsing.keyword_object:
        t.value = Keyword(t.value[1:])
    elif t.lexer.keyword_parsing == KeywordParsing.name_string:
        t.value = Keyword(t.value[1:]).name
    elif t.lexer.keyword_parsing == KeywordParsing.ns_slash_name_string:
        t.value = t.value[1:]
    elif t.lexer.keyword_parsing == KeywordParsing.leading_colon_string:
        pass
    return t

…which leads to the remarks above.

piotr-yuxuan avatar Jul 28 '20 18:07 piotr-yuxuan

For the impatient, here how I would iterate through what is returned by edn_format:

keyword_name = lambda keyword: \
    keyword.name.split("/", 1)[1] if "/" in keyword.name else keyword.name
strinfigy_keyword = lambda x: \
    keyword_name(x) if isinstance(x, edn_format.Keyword) else x


def process_edn(x):
    if isinstance(x, edn_format.ImmutableDict):
        return {strinfigy_keyword(k): process_edn(v) for k, v in x.items()}
    elif isinstance(x, edn_format.ImmutableList):
        return [process_edn(v) for v in x]
    else:
        return strinfigy_keyword(x)

Obviously keyword_name can be changed for any other callable.

piotr-yuxuan avatar Jul 28 '20 22:07 piotr-yuxuan

Hello, sorry for the late response.

That’s interesting; I was thinking of post-processing the result of the parsing like you did in your second comment; not of doing that during the lexing/parsing.

Currently Keyword('a/b').name => 'a/b'

Yes, it has bit me before. I don’t know the reason behind this. Changing this would be a breaking change, though.

The way namespaced maps are done in the parser, and not in the lexer, further complicates any attempt to stringify keys.

I don’t understand how you would do that in the lexer; its job is to tokenize the input, not to make sense of those tokens. The lexer doesn’t know what a map looks like: in #:foo {:a 1}, #foo, {, :a, 1, and } are different tokens.

bfontaine avatar Aug 05 '20 08:08 bfontaine

Just thought it was worth mentioning that you can create a tag with @piotr-yuxuan solution. Which you can add at the top of your edn file if you don't want to actively call process_edn after parsing. Something like:

edn.add_tag('python', process_edn)

and then you can:

#python
{:tags
 [{:name "foo"},
  {:name "bar"} ]}

mohkale avatar Aug 24 '20 14:08 mohkale

Starting from @piotr-yuxuan 's suggestion I used

def edn_to_map(x):
    if isinstance(x, edn_format.ImmutableDict):
        return {edn_to_map(k): edn_to_map(v) for k, v in x.items()}
    elif isinstance(x, edn_format.ImmutableList):
        return [edn_to_map(v) for v in x]
    elif isinstance(x, edn_format.Keyword):
        return x.name
    else:
        return x

it's incomplete but it seems ok so far in my case (parsing simple edns to python dicts without keywords), so posting it here in case that's helpful for someone else.

tzzh avatar Dec 22 '20 16:12 tzzh