framework
Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
# Overview Requested by the community as a follow-up to the GitHub integration
# Overview An important step for Frictionless Framework is to provide the ability to read and write packages from different data portals (CKAN/GitHub/Zenodo/etc) so the users can publish and access...
# Overview Currently, in v5 we do normalization from standards@1 in the `Schema.from_descriptor` step and lose some validation errors because of it, e.g. a missing `foreignKeys.reference.resource`. See this failing test: ``` def...
# Overview See failing tests in the corresponding test file
# Overview See this failing test:
```
def test_step_table_pivot():
    source = Resource("data/transform-pivot.csv")
    pipeline = Pipeline(
        steps=[
            steps.table_normalize(),
            steps.table_pivot(f1="region", f2="gender", f3="units", aggfun=sum),
        ],
    )
    target = source.transform(pipeline)
    assert target.schema.to_descriptor() == {...
```
# Overview Many PETL transforms accept a WHERE argument, for example, for updating field values only in matching rows. Eventually, we'd like to support this functionality too.
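To make the requested behaviour concrete, here is a minimal pure-Python sketch of a WHERE-style conditional field update. It uses neither PETL nor the Frictionless API; the helper `update_field_where`, its parameters, and the sample rows are hypothetical names invented for this illustration only.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def update_field_where(
    rows: Iterable[Row],
    field: str,
    value: Any,
    where: Callable[[Row], bool],
) -> Iterator[Row]:
    """Yield copies of rows, updating `field` only where `where(row)` is true.

    Hypothetical helper: `value` may be a constant or a callable taking the row.
    """
    for row in rows:
        updated = dict(row)  # copy so the input rows are never mutated
        if where(row):
            updated[field] = value(row) if callable(value) else value
        yield updated


# Sample data (made up for this sketch)
rows = [
    {"id": 1, "region": "north", "units": 10},
    {"id": 2, "region": "south", "units": 5},
]

# Double `units`, but only for rows in the "north" region
result = list(
    update_field_where(
        rows,
        "units",
        lambda row: row["units"] * 2,
        where=lambda row: row["region"] == "north",
    )
)
```

Rows that do not satisfy the predicate pass through unchanged, which is the essential difference from an unconditional field update.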
# Overview When using `format: ckan` the name of the resource/package is directly copied from the name the portal uses. This leads to violations in name validation. Example: ```yaml #...
# Overview See this failing test:
```
def test_multipart_loader_with_compressed_parts():
    with Resource(
        path="data/chunk1.csv.zip", extrapaths=["data/chunk2.csv.zip"]
    ) as resource:
        assert resource.innerpath == ""
        assert resource.compression == ""
        assert resource.header == ["id", "name"]
        assert...
```
I was recommended to post this as a possible feature request by @aivuk at https://frictionlessdata.slack.com/archives/C0362US1U3G/p1658213511287809?thread_ts=1658148616.050559&cid=C0362US1U3G My original question was > I have an Excel file and is there an easy...
# Overview From @ewheeler: One thing to note: the fastparquet and pyarrow libraries have some parquet handling differences with pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-parquet https://github.com/pandas-dev/pandas/issues/42968#issuecomment-965318185 In the words of a pandas contributor: "Summary: it's...