hyrule
hyrule copied to clipboard
Flattening list of lists of dicts leads to surprising results
Instead of
=> (flatten [[{:one 1 :two 2}] [{:three 3 :four 4}]])
['\ufdd0:one', '\ufdd0:two', '\ufdd0:three', '\ufdd0:four']
I was expecting
=> (flatten [[{:one 1 :two 2}] [{:three 3 :four 4}]])
[{:one 1 :two 2} {:three 3 :four 4}]
Currently, flatten
uses coll?
to decide whether to flatten something. This example constitutes a good argument that coll?
isn't the right criterion. But it's not clear what we should use instead. Checking for the existence of __getitem__
is perhaps the most obvious choice, but this wouldn't flatten sets and frozensets, which we probably do want to flatten.
By the way, if you just want to concatenate one level of iterables in x
, without recursive flattening, you can use (reduce + x)
or (sum x [])
.
To clarify, @tuturto, did you expect [{:one 1 :two 2} {:three 3 :four 4}]
because you forgot that flatten
was recursive, or because you thought it would leave dictionaries in particular untouched?
The latter one, I expected dictionaries be left as they were. And even if dictionaries were flattened, we shouldn't be discarding half of the information (keeping only keys and discarding values) while doing so.
Don't use sum
like that.
This function is intended specifically for use with numeric values and may reject non-numeric types.
It's a mere implementation detail that this works at all. A type checker has every right to flag that use, and other Python implementations (even later CPython implementations) might not support it.
Yes, it's allegedly an implementation detail, but it's an obvious desideratum. I doubt it will ever be removed.
In Python, dicts are iterables. Their iterator returns the keys. In a for loop, this is more useful, since you can still use the keys to get the values via lookup. If you want the pairs, you have to use .items
.
Clojure arguably handles this more consistently.
user=> (seq {1 2, 3 4})
([1 2] [3 4])
The default iterator is like Python's .items
.
So how does Clojure flatten maps?
user=> (flatten {1 2, 3 4})
()
user=> (doc flatten)
-------------------------
clojure.core/flatten
([x])
Takes any nested combination of sequential things (lists, vectors,
etc.) and returns their contents as a single, flat sequence.
(flatten nil) returns an empty sequence.
nil
user=> (flatten (seq {1 2, 3 4}))
(1 2 3 4)
It doesn't.
Flatten should support third-party iterables too. How can you treat them consistently if you only make dicts a special case? This is how Python is. How do you want this to work?
It looks like Clojure keeps maps in one piece when they're not the top level.
user=> (flatten [10 20 [{1 2, 5 6} {3 4}]])
(10 20 {1 2, 5 6} {3 4})
How do you want this to work?
Right; that's the hard part about choosing the right criterion, I think.
Perhaps, using the types of collections.abc
, we could flatten all Iterable
s that aren't Mapping
s, ByteString
s, or strings.
Clojure also keeps sets in one piece.
user=> (flatten [10 #{1 2 3}])
(10 #{1 3 2})
I don't think "iterable" is the right criterion. The appropriate abc is probably Sequence
, which is iterable with a specific ordering.
Except strings or bytestrings.
Is an OrderedDict
a sequence?
>>> isinstance(OrderedDict(), abc.Sequence)
False
Python says no.