
Limit processing in `flatten`

Open amirziai opened this issue 8 years ago • 6 comments

  • Abandon flattening past a certain depth.
  • Abandon flattening past an element index number across all lists. This is useful for very large lists. For example, we can set `max_list_elements=100` so that only the first 100 elements of each list are processed.
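
The two limits above could be sketched roughly as follows. This is only an illustration, not the library's implementation: the function name `flatten_limited` and the `max_depth` parameter are hypothetical, `max_list_elements` is the name proposed in the bullet, and the sketch assumes that values past the depth limit are kept unflattened rather than dropped.

```python
def flatten_limited(obj, separator="_", max_depth=None, max_list_elements=None):
    """Flatten nested dicts/lists with optional (hypothetical) limits."""
    out = {}

    def _walk(value, key, depth):
        # Assumption: once max_depth is reached, keep the value as-is.
        at_limit = max_depth is not None and depth >= max_depth
        if isinstance(value, dict) and not at_limit:
            for k, v in value.items():
                _walk(v, f"{key}{separator}{k}" if key else str(k), depth + 1)
        elif isinstance(value, list) and not at_limit:
            # Only the first max_list_elements items of each list are visited.
            items = value if max_list_elements is None else value[:max_list_elements]
            for i, v in enumerate(items):
                _walk(v, f"{key}{separator}{i}" if key else str(i), depth + 1)
        else:
            out[key] = value

    _walk(obj, "", 0)
    return out


d = {"a": [1, 2, 3], "b": {"a": [1, 2, 3], "b": [4, 5, 6]}}
print(flatten_limited(d, max_list_elements=2))
print(flatten_limited(d, max_depth=1))
```

With `max_depth=1` the root keys are visited but nothing beneath them is expanded, so the cost of a call is bounded by the number of root keys.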

amirziai avatar Mar 13 '17 02:03 amirziai

As mentioned in https://github.com/amirziai/flatten/issues/11, including a subset of fields or excluding fields would also limit processing (i.e. 1 and 2 in https://github.com/amirziai/flatten/issues/9). As a user of the library, my use case is that I want to process certain fields and ignore others. To me it is more intuitive to specify that to the library than to post-process its output.

mcarans avatar Mar 13 '17 08:03 mcarans

@mcarans let's say we have this dictionary (feel free to come up with a better example but I want us to nail down the requirements):

d = {
    'a': [1, 2, 3],
    'b': {'a': [1, 2, 3], 'b': [4, 5, 6]},
    'c': 'c',
    'd': {'a': 5}
}

What would you want to pass to flatten and what do you want to see in return?

amirziai avatar Mar 13 '17 08:03 amirziai

I would want to pass 'a' in some way and get back 'b', 'c' and 'd' flattened. The 'a': [1,2,3] would not be output at all. The other 'a' keys under 'b' and 'd' would still be output, i.e. we only apply this at the root level.
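
A minimal sketch of that behaviour, using the example dictionary from this thread. The function name `flatten_ignoring_root_keys` and the `root_keys_to_ignore` parameter are assumptions made for illustration, and the underscore separator is also assumed.

```python
def flatten_ignoring_root_keys(d, root_keys_to_ignore=(), separator="_"):
    """Flatten d, skipping the given keys at the root level only."""
    out = {}

    def _walk(value, key):
        if isinstance(value, dict):
            for k, v in value.items():
                _walk(v, f"{key}{separator}{k}")
        elif isinstance(value, list):
            for i, v in enumerate(value):
                _walk(v, f"{key}{separator}{i}")
        else:
            out[key] = value

    for k, v in d.items():
        if k in root_keys_to_ignore:
            continue  # the filter is applied at the root only
        _walk(v, str(k))
    return out


d = {
    "a": [1, 2, 3],
    "b": {"a": [1, 2, 3], "b": [4, 5, 6]},
    "c": "c",
    "d": {"a": 5},
}
print(flatten_ignoring_root_keys(d, root_keys_to_ignore={"a"}))
```

The nested 'a' keys under 'b' and 'd' survive as `b_a_0`, `b_a_1`, `b_a_2` and `d_a`, because the filter is never consulted below the root.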

mcarans avatar Mar 13 '17 09:03 mcarans

If it's root-level and the argument you pass is the set of keys to be filtered out, I think I've addressed it in #11. Here's the documentation I wrote: https://github.com/amirziai/flatten/tree/flatten-ignore-keys%239#ignore-root-keys

Am I missing something?

amirziai avatar Mar 13 '17 09:03 amirziai

@amirziai I see that you have implemented the above, which is what I would have called 1/2 from my list, i.e. inclusion/exclusion of fields. For 3 and 4, I meant that all fields would be included, but that these options would enable or disable flattening of those fields. With your example above, if we disabled flattening for 'b', then our output would have a column entitled 'b' with contents {'a': [1,2,3], 'b': [4,5,6]} (i.e. it is not flattened but it is still output).
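
To make the request concrete, here is a rough sketch of what disabling flattening for selected root keys might look like. The function name `flatten_except` and the `keys_to_keep_nested` parameter are hypothetical and are not part of the library.

```python
def flatten_except(d, keys_to_keep_nested=(), separator="_"):
    """Flatten d, but pass the listed root keys through unflattened."""
    out = {}

    def _walk(value, key):
        if isinstance(value, dict):
            for k, v in value.items():
                _walk(v, f"{key}{separator}{k}")
        elif isinstance(value, list):
            for i, v in enumerate(value):
                _walk(v, f"{key}{separator}{i}")
        else:
            out[key] = value

    for k, v in d.items():
        if k in keys_to_keep_nested:
            out[str(k)] = v  # included in the output, but not flattened
        else:
            _walk(v, str(k))
    return out


d = {
    "a": [1, 2, 3],
    "b": {"a": [1, 2, 3], "b": [4, 5, 6]},
    "c": "c",
    "d": {"a": 5},
}
print(flatten_except(d, keys_to_keep_nested={"b"}))
```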

mcarans avatar Mar 13 '17 10:03 mcarans

@mcarans I think I understand now. To me, the point of flatten is that the resulting object has no structure (no iterables other than strings are present as values of the flat dictionary). This way you can easily pass the dictionary to Pandas. I do see the use case, but I think it should live outside of flatten. Feel free to open a new issue for it.
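
The "no structure" property described here can be expressed as a small check. This is a sketch only: the helper name `is_flat` is hypothetical, and treating `bytes` like strings is an added assumption.

```python
from collections.abc import Iterable


def is_flat(d):
    """Return True if no value in d is an iterable other than a string."""
    return not any(
        isinstance(v, Iterable) and not isinstance(v, (str, bytes))
        for v in d.values()
    )


print(is_flat({"a_0": 1, "c": "c", "d_a": 5}))
print(is_flat({"b": {"a": [1, 2, 3]}, "c": "c"}))
```

A dictionary that keeps 'b' nested would fail this check, which is why that option changes what callers can assume about the output.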

amirziai avatar Mar 13 '17 10:03 amirziai