flatten
Limit processing in `flatten`
- Abandon flattening past a certain depth.
- Abandon flattening past an element index number across all lists. This is useful for very large lists. For example, we can set `max_list_elements=100` and only the first 100 elements in all lists are processed.
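
To make the two proposed limits concrete, here is a rough sketch of how they might behave. It is a standalone illustration, not the library's code: the function name `flatten_limited` and the `max_depth` parameter are made up for this example (only `max_list_elements` is named above), and it assumes that a value sitting at the depth limit is kept as-is rather than dropped.

```python
def flatten_limited(nested, separator="_", max_depth=None, max_list_elements=None):
    """Flatten dicts and lists into a single-level dict, with optional limits.

    Hypothetical helper for discussion; not part of the flatten library.
    """
    out = {}

    def walk(value, key, depth):
        is_container = isinstance(value, (dict, list))
        # Assumption: once max_depth is reached, the value is stored unflattened.
        if not is_container or (max_depth is not None and depth >= max_depth):
            out[key] = value
            return
        if isinstance(value, dict):
            items = value.items()
        else:
            # Only the first max_list_elements entries of every list are visited.
            limit = len(value) if max_list_elements is None else max_list_elements
            items = enumerate(value[:limit])
        for child_key, child in items:
            name = str(child_key) if key is None else f"{key}{separator}{child_key}"
            walk(child, name, depth + 1)

    walk(nested, None, 0)
    return out


data = {"a": [1, 2, 3], "b": {"a": [1, 2, 3]}}
print(flatten_limited(data, max_list_elements=2))
print(flatten_limited(data, max_depth=2))
```

With `max_list_elements=2` the third element of every list is skipped; with `max_depth=2` the list under `b` is left as a list because it sits at the depth limit.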
As mentioned in https://github.com/amirziai/flatten/issues/11, including only a subset of fields or excluding fields would also limit processing (i.e. 1 and 2 in https://github.com/amirziai/flatten/issues/9). As a user of the library, my use case is that I want to process certain fields and ignore others. To me it is more intuitive to specify that to the library than to post-process its output.
@mcarans let's say we have this dictionary (feel free to come up with a better example but I want us to nail down the requirements):
    d = {
        'a': [1, 2, 3],
        'b': {'a': [1, 2, 3], 'b': [4, 5, 6]},
        'c': 'c',
        'd': {'a': 5}
    }
What would you want to pass to flatten and what do you want to see in return?
I would want to pass 'a' in some way and get back 'b', 'c' and 'd' flattened. The root-level 'a': [1,2,3] would not be output at all. The other 'a' keys under 'b' and 'd' would still be output, i.e. the exclusion applies only at the root level.
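
To pin the requirement down, this is roughly the behaviour being described, written as a standalone sketch. The names `flatten_simple` and `flatten_without_root_keys` are hypothetical and are not the library's API; the underscore separator is also an assumption.

```python
def flatten_simple(value, separator="_", prefix=None):
    """Recursively flatten dicts and lists into a single-level dict."""
    if not isinstance(value, (dict, list)):
        return {prefix: value}
    items = value.items() if isinstance(value, dict) else enumerate(value)
    flat = {}
    for key, child in items:
        name = str(key) if prefix is None else f"{prefix}{separator}{key}"
        flat.update(flatten_simple(child, separator, name))
    return flat


def flatten_without_root_keys(nested, ignored_root_keys, separator="_"):
    """Drop the given keys at the root only, then flatten what is left."""
    kept = {k: v for k, v in nested.items() if k not in ignored_root_keys}
    return flatten_simple(kept, separator)


d = {
    "a": [1, 2, 3],
    "b": {"a": [1, 2, 3], "b": [4, 5, 6]},
    "c": "c",
    "d": {"a": 5},
}
print(flatten_without_root_keys(d, {"a"}))
```

The root-level `a` disappears from the result, while `b_a_0`, `b_a_1`, `b_a_2` and `d_a` are still present because the filter is applied before recursion starts.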
If it's root-level and the argument you pass is the set of keys to be filtered out, I think I've addressed it in #11. Here's the documentation I wrote: https://github.com/amirziai/flatten/tree/flatten-ignore-keys%239#ignore-root-keys
Am I missing something?
@amirziai I see that you have implemented the above, which is what I would have called 1/2 from my list, i.e. inclusion/exclusion of fields. For 3 and 4, I meant that all fields would be included, but that these options would enable or disable flattening of those fields. With your example above, if we disabled flattening for 'b', our output would have a column entitled 'b' with contents {'a': [1,2,3], 'b': [4,5,6]} (i.e. it is not flattened but it is still output).
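
For illustration, the behaviour described here for 3 and 4 could look something like the sketch below. It is a self-contained example with hypothetical names (`flatten_simple`, `flatten_except`), not an existing option of the library, and it assumes an underscore separator.

```python
def flatten_simple(value, separator="_", prefix=None):
    """Recursively flatten dicts and lists into a single-level dict."""
    if not isinstance(value, (dict, list)):
        return {prefix: value}
    items = value.items() if isinstance(value, dict) else enumerate(value)
    flat = {}
    for key, child in items:
        name = str(key) if prefix is None else f"{prefix}{separator}{key}"
        flat.update(flatten_simple(child, separator, name))
    return flat


def flatten_except(nested, unflattened_root_keys, separator="_"):
    """Flatten every root key except the listed ones, which are copied as-is."""
    flat = {}
    for key, value in nested.items():
        if key in unflattened_root_keys:
            # Still output, but left with its original structure.
            flat[str(key)] = value
        else:
            flat.update(flatten_simple(value, separator, str(key)))
    return flat


d = {
    "a": [1, 2, 3],
    "b": {"a": [1, 2, 3], "b": [4, 5, 6]},
    "c": "c",
    "d": {"a": 5},
}
print(flatten_except(d, {"b"}))
```

Here `a` and `d` are flattened as usual, while `b` keeps its nested dictionary as a single value.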
@mcarans I think I understand now. The point of flatten to me is that the resulting object has no structure (no iterables other than strings are present as values of the flat dictionary). This way you can easily pass the dictionary to Pandas. I see the use case, but I think it should live outside of flatten. Feel free to open a new issue for it.