clumper
Helper method to nest per dictionary
Let's say that I have the monopoly dataset, with rows such as:
{'name': 'Boardwalk',
'rent': '50',
'house_1': '200',
'house_2': '600',
'house_3': '1400',
'house_4': '1700',
'hotel': '2000',
'deed_cost': '400',
'house_cost': '200',
'color': 'blue',
'tile': '39'}
Let's suppose that I want to change that to:
{'name': 'Boardwalk',
'color': 'blue',
'tile': '39',
'costs': {'deed': '400', 'house': '200'},
'income': {'rent': '50',
'hotel': '2000',
'house_1': '200',
'house_2': '600',
'house_3': '1400',
'house_4': '1700'}}
Then you currently need to run this:
from clumper import Clumper

(Clumper.read_csv("tests/data/monopoly.csv")
  .mutate(costs=lambda d: {"deed": d["deed_cost"], "house": d["house_cost"]},
          income=lambda d: {"rent": d["rent"], "hotel": d["hotel"],
                            **{f"house_{i}": d[f"house_{i}"] for i in [1, 2, 3, 4]}})
  .drop("house_1", "house_2", "house_3", "house_4", "rent", "hotel", "deed_cost", "house_cost")
  .collect())
It feels like there should be an easier way to do this, and this issue is a place to discuss it. Since it is a row-wise operation, we might come up with a helper function for mutate. But since we also want to drop the original keys afterwards, we might be able to come up with something more general.
Maybe something like:
# If you want to gather the data into a dictionary.
clump.mutate(costs=lambda d: gather(d, "deed_cost", "house_cost"))
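To make the idea concrete, here is a minimal sketch of what such a gather helper could look like when it works on a single row (a plain dictionary). Nothing here exists in clumper today: the name gather and the prefix/suffix parameters are assumptions, borrowed from the spread idea further down so that "deed_cost" can end up as "deed".

```python
def gather(d, *keys, prefix="", suffix=""):
    """Collect the given keys of row `d` into a new dictionary.

    Hypothetical helper: `prefix` and `suffix` are stripped from the key
    names (when present) so the nested keys can be shorter than the originals.
    """
    gathered = {}
    for key in keys:
        short = key
        if prefix and short.startswith(prefix):
            short = short[len(prefix):]
        if suffix and short.endswith(suffix):
            short = short[:-len(suffix)]
        gathered[short] = d[key]
    return gathered


row = {"name": "Boardwalk", "rent": "50", "deed_cost": "400", "house_cost": "200"}
costs = gather(row, "deed_cost", "house_cost", suffix="_cost")
```

Because it only reads from the row, a helper like this would fit inside a mutate lambda, but the original keys would still have to be dropped separately.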
I am wondering whether we can also come up with a nice inverse. Also, you still need a drop call after this...
Maybe this as an inverse?
clump.mutate(spread("costs", suffix="", prefix=""))
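As a rough sketch of that inverse, the function below flattens one nested dictionary back into top-level keys on a single row. The signature is an assumption: it takes the row explicitly, and prefix/suffix are added around each inner key, so this is not a drop-in for the mutate call above.

```python
def spread(d, key, prefix="", suffix=""):
    """Return a copy of row `d` with the nested dict at `key` flattened.

    Hypothetical helper: every inner key becomes a top-level key named
    prefix + inner_key + suffix, and the nested key itself is removed.
    """
    flat = {k: v for k, v in d.items() if k != key}
    for inner_key, value in d[key].items():
        flat[f"{prefix}{inner_key}{suffix}"] = value
    return flat


nested = {"name": "Boardwalk", "costs": {"deed": "400", "house": "200"}}
flat_row = spread(nested, "costs", suffix="_cost")
```

Since this returns a whole new row rather than a single value, it would probably be applied row by row instead of through mutate.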
Hmm ... I suppose we might be able to use spread/gather as verbs for this. If we want the original keys to be dropped automatically, dedicated verbs may even be a must. Another option is to offer a nice helper function that users can apply via pipe. That might keep the library a lot simpler by keeping the verbs at bay.
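A sketch of the pipe route, under a few assumptions: the helper name nest and its keyword layout are made up for illustration, and it relies only on the mutate and drop calls used earlier in this issue. It nests the requested keys and drops the originals in one go.

```python
def nest(clump, **groups):
    """Nest flat keys under new keys, then drop the original keys.

    Hypothetical helper. `groups` maps each new key to a
    {inner_name: original_key} dictionary. `clump` is assumed to offer
    mutate(**kwargs) and drop(*keys), as in the example in this issue.
    """
    mutations = {
        # Bind `mapping` as a default argument so each lambda keeps its own.
        name: (lambda d, mapping=mapping: {new: d[old] for new, old in mapping.items()})
        for name, mapping in groups.items()
    }
    original_keys = [old for mapping in groups.values() for old in mapping.values()]
    return clump.mutate(**mutations).drop(*original_keys)


# Intended usage, assuming pipe hands the collection to the function:
# (Clumper.read_csv("tests/data/monopoly.csv")
#   .pipe(nest, costs={"deed": "deed_cost", "house": "house_cost"})
#   .collect())
```

The upside of this design is that no new verbs are needed. The downside is that the caller has to spell out the full mapping for every nested key.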