framework
Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
# Overview
We have some functions that require collecting data in memory like:
- `checks.duplicate_row`
- `checks.deviated_cell/value`
- `resource.analyze`
- etc

We might provide an internal cache system (switching to...
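To make the memory concern concrete, here is a minimal sketch of duplicate-row detection that keeps one fixed-size digest per row instead of the rows themselves. It is not how `checks.duplicate_row` is implemented; the function name `find_duplicate_rows` and the sample rows are hypothetical, and a digest-based approach accepts a very small chance of hash collisions.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Sequence, Tuple


def find_duplicate_rows(
    rows: Iterable[Sequence[object]],
) -> Iterator[Tuple[int, int]]:
    """Yield (row_number, first_seen_row_number) for every repeated row.

    Hypothetical helper: only a 16-byte digest per distinct row is kept
    in memory, not the row contents.
    """
    seen: Dict[bytes, int] = {}
    for number, row in enumerate(rows, start=1):
        # Join cell representations with a separator unlikely to occur in data
        payload = "\x1f".join(map(repr, row)).encode("utf-8")
        digest = hashlib.blake2b(payload, digest_size=16).digest()
        if digest in seen:
            yield number, seen[digest]
        else:
            seen[digest] = number


# Hypothetical sample data: the third row repeats the first
rows = [["a", 1], ["b", 2], ["a", 1]]
duplicates = list(find_duplicate_rows(rows))
print(duplicates)
```

Memory still grows with the number of distinct rows, so this only reduces the footprint; an on-disk cache would be needed to bound it.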
# Overview Parallelization can be added to some steps/etc
# Overview
The "table-aggregate" step doesn't work when used with `len`.

```
source = Resource(path="784/transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_aggregate(
            group_name="name", aggregation={"min": ("population", len)}
        ),
    ],
)
print(target.schema)
print(target.to_view())...
```
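For reference, a plain-Python sketch of the result one would presumably expect from this aggregation, assuming `len` should count the rows in each group. The `aggregate` helper and the sample rows are hypothetical and do not reflect the contents of `784/transform.csv` or the library's implementation.

```python
from collections import defaultdict


def aggregate(rows, group_name, aggregation):
    """Group rows by `group_name` and apply each (field, function) pair.

    Hypothetical stand-in for the aggregation step, using plain dicts.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_name]].append(row)
    result = []
    for key, members in groups.items():
        out = {group_name: key}
        for target, (field, func) in aggregation.items():
            # The function receives all values of `field` within the group
            out[target] = func([member[field] for member in members])
        result.append(out)
    return result


# Hypothetical sample data
rows = [
    {"name": "germany", "population": 83},
    {"name": "france", "population": 66},
    {"name": "germany", "population": 77},
]

# Same aggregation mapping as in the report; "min" is only the output field name
counts = aggregate(rows, "name", {"min": ("population", len)})
print(counts)
```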
# Overview We need an ability to save metadata + data (package + all resources)
# Overview As a part of v6's transform work. Probably we need to make it immutable (proxy for cells) for performance
# Overview At the moment, it doesn't match. Shall we normalize line endings etc? It's complicated because `python.csv` requires opening files without a universal newline. On the other hand, the...
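A small sketch of the trade-off, assuming the mismatch concerns values computed on raw bytes versus normalized content (the preview does not say which values). Python's `csv` module expects files opened with `newline=""`, so the original line endings reach the parser untouched; normalizing them changes the byte content. The sample bytes are hypothetical.

```python
import csv
import io

# Hypothetical file content with mixed line endings
raw = b"id,name\r\n1,english\r\n2,german\n"

# newline="" disables universal-newline translation, as the csv docs recommend
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="") as stream:
    rows = list(csv.reader(stream))

# Normalizing line endings yields different bytes than the file on disk
normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

print(rows)
print(len(raw), len(normalized))
```

The parsed rows are the same either way, but anything derived from the bytes (size, hash) differs between the raw and normalized forms.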
# Overview
@pwalsh wrote:
> sleep:
>
> it is a killer if you can't force a sleep between runs. This was a crude way to work around API...
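A minimal sketch of the requested behavior, assuming "runs" are independent tasks executed one after another. The `run_with_pause` helper and its parameters are hypothetical, not an existing option of the library; the `sleep` argument is injectable so the example runs without actually waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_pause(
    items: Iterable[T],
    task: Callable[[T], R],
    pause_seconds: float,
    sleep: Callable[[float], object] = time.sleep,
) -> List[R]:
    """Run `task` for each item, pausing between consecutive runs."""
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause only between runs, not before the first one
            sleep(pause_seconds)
        results.append(task(item))
    return results


# Record the requested pauses instead of sleeping, to keep the example fast
pauses: List[float] = []
results = run_with_pause([1, 2, 3], lambda x: x * 2, 0.5, sleep=pauses.append)
print(results, pauses)
```

A fixed pause is the crudest form of rate limiting; it spaces out calls but does not react to what the remote API actually reports.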
# Overview
The migration from `tabulator/tableschema/datapackage/goodtables` gave a good speed improvement, but we can still make it faster, especially for working with numbers - https://github.com/frictionlessdata/frictionless-py/issues/461

# Tasks
- [ ] create...