hyrule icon indicating copy to clipboard operation
hyrule copied to clipboard

Flattening list of lists of dicts leads to surprising results

Open tuturto opened this issue 7 years ago • 11 comments

Instead of

=> (flatten [[{:one 1 :two 2}] [{:three 3 :four 4}]])
['\ufdd0:one', '\ufdd0:two', '\ufdd0:three', '\ufdd0:four']

I was expecting

=> (flatten [[{:one 1 :two 2}] [{:three 3 :four 4}]])
[{:one 1 :two 2} {:three 3 :four 4}]

tuturto avatar Oct 19 '17 10:10 tuturto

Currently, flatten uses coll? to decide whether to flatten something. This example constitutes a good argument that coll? isn't the right criterion. But it's not clear what we should use instead. Checking for the existence of __getitem__ is perhaps the most obvious choice, but this wouldn't flatten sets and frozensets, which we probably do want to flatten.

Kodiologist avatar Oct 19 '17 14:10 Kodiologist

By the way, if you just want to concatenate one level of iterables in x, without recursive flattening, you can use (reduce + x) or (sum x []).

Kodiologist avatar Oct 19 '17 16:10 Kodiologist

To clarify, @tuturto, did you expect [{:one 1 :two 2} {:three 3 :four 4}] because you forgot that flatten was recursive, or because you thought it would leave dictionaries in particular untouched?

Kodiologist avatar Oct 19 '17 16:10 Kodiologist

The latter one, I expected dictionaries be left as they were. And even if dictionaries were flattened, we shouldn't be discarding half of the information (keeping only keys and discarding values) while doing so.

tuturto avatar Oct 19 '17 17:10 tuturto

Don't use sum like that.

This function is intended specifically for use with numeric values and may reject non-numeric types.

It's a mere implementation detail that this works at all. A type checker has every right to flag that use, and other Python implementations (even later CPython implementations) might not support it.

gilch avatar Oct 19 '17 17:10 gilch

Yes, it's allegedly an implementation detail, but it's an obvious desideratum. I doubt it will ever be removed.

Kodiologist avatar Oct 19 '17 17:10 Kodiologist

In Python, dicts are iterables. Their iterator returns the keys. In a for loop, this is more useful, since you can still use the keys to get the values via lookup. If you want the pairs, you have to use .items.

Clojure arguably handles this more consistently.

user=> (seq {1 2, 3 4})
([1 2] [3 4])

The default iterator is like Python's .items.

So how does Clojure flatten maps?

user=> (flatten {1 2, 3 4})
()
user=> (doc flatten)
-------------------------
clojure.core/flatten
([x])
  Takes any nested combination of sequential things (lists, vectors,
  etc.) and returns their contents as a single, flat sequence.
  (flatten nil) returns an empty sequence.
nil
user=> (flatten (seq {1 2, 3 4}))
(1 2 3 4)

It doesn't.

Flatten should support third-party iterables too. How can you treat them consistently if you only make dicts a special case? This is how Python is. How do you want this to work?

gilch avatar Oct 19 '17 17:10 gilch

It looks like Clojure keeps maps in one piece when they're not the top level.

user=> (flatten [10 20 [{1 2, 5 6} {3 4}]])
(10 20 {1 2, 5 6} {3 4})

gilch avatar Oct 19 '17 17:10 gilch

How do you want this to work?

Right; that's the hard part about choosing the right criterion, I think.

Perhaps, using the types of collections.abc, we could flatten all Iterables that aren't Mappings, ByteStrings, or strings.

Kodiologist avatar Oct 19 '17 17:10 Kodiologist

Clojure also keeps sets in one piece.

user=> (flatten [10 #{1 2 3}])
(10 #{1 3 2})

I don't think "iterable" is the right criterion. The appropriate abc is probably Sequence, which is iterable with a specific ordering.

gilch avatar Oct 19 '17 17:10 gilch

Except strings or bytestrings.

Is an OrderedDict a sequence?

>>> isinstance(OrderedDict(), abc.Sequence)
False

Python says no.

gilch avatar Oct 19 '17 17:10 gilch