convtools
convtools copied to clipboard
convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation
convtools
convtools is a specialized Python library designed for defining data transformations dynamically using a declarative approach. It automatically generates custom Python code for the user in the background.
Installation
pip install convtools
Documentation
Group by example
from convtools import conversion as c
input_data = [
{"a": 5, "b": "foo"},
{"a": 10, "b": "foo"},
{"a": 10, "b": "bar"},
{"a": 10, "b": "bar"},
{"a": 20, "b": "bar"},
]
conv = (
c.group_by(c.item("b"))
.aggregate(
{
"b": c.item("b"),
"a_first": c.ReduceFuncs.First(c.item("a")),
"a_max": c.ReduceFuncs.Max(c.item("a")),
}
)
.pipe(
c.aggregate({
"b_values": c.ReduceFuncs.Array(c.item("b")),
"mode_a_first": c.ReduceFuncs.Mode(c.item("a_first")),
"median_a_max": c.ReduceFuncs.Median(c.item("a_max")),
})
)
.gen_converter()
)
assert conv(input_data) == {
'b_values': ['foo', 'bar'],
'mode_a_first': 10,
'median_a_max': 15.0
}
Built-in reducers like c.ReduceFuncs.First
* Sum
* SumOrNone
* Max
* MaxRow
* Min
* MinRow
* Count
* CountDistinct
* First
* Last
* Average
* Median
* Percentile
* Mode
* TopK
* Array
* ArrayDistinct
* ArraySorted
DICT REDUCERS ARE IN FACT AGGREGATIONS THEMSELVES, BECAUSE VALUES GET REDUCED.
* Dict
* DictArray
* DictSum
* DictSumOrNone
* DictMax
* DictMin
* DictCount
* DictCountDistinct
* DictFirst
* DictLast
AND LASTLY YOU CAN DEFINE YOUR OWN REDUCER BY PASSING ANY REDUCE FUNCTION
OF TWO ARGUMENTS TO ``c.reduce``.
What's the point if there are tools like Pandas / Polars?
- convtools doesn't need to wrap data in a container to provide functionality, it simply runs the python code it generates on any input
- convtools is lightweight (though optional
black
is highly recommended for pretty-printing generated code out of curiosity) - convtools fosters building pipelines on top of iterators, allowing for stream processing
- convtools supports nested aggregations
- convtools is a set of primitives for code generation, so it's just different.
Support
- westandskif (Nikita Almakov) - Link to support
Reporting a Security Vulnerability
See the security policy.