# Overview Currently, in v5 we do normalization from standards@1 in the `Schema.from_descriptor` step and, as a result, lose some validation errors, e.g. a missing `foreignKeys.reference.resource`. See this failing test: ``` def...
# Overview See failing tests in the corresponding test file
# Overview See this failing test:
```
def test_step_table_pivot():
    source = Resource("data/transform-pivot.csv")
    pipeline = Pipeline(
        steps=[
            steps.table_normalize(),
            steps.table_pivot(f1="region", f2="gender", f3="units", aggfun=sum),
        ],
    )
    target = source.transform(pipeline)
    assert target.schema.to_descriptor() == {...
```
# Overview Many PETL transforms accept a WHERE argument, for example, to update field values only in matching rows. Eventually, we'd like to support this functionality too.
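To make the request concrete, here is a minimal sketch of what a WHERE-style conditional update could look like, written in plain Python with no dependencies. It assumes the WHERE argument is a callable that receives a row and returns a boolean. The `update_field` helper is hypothetical and is not an existing PETL or frictionless API.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def update_field(
    rows: Iterable[Row],
    field: str,
    value: Any,
    where: Callable[[Row], bool] = lambda row: True,
) -> Iterator[Row]:
    """Hypothetical helper: set `field` to `value` only in rows matching `where`."""
    for row in rows:
        if where(row):
            # Matching row: emit a copy with the field replaced
            yield {**row, field: value}
        else:
            # Non-matching row: emit an unchanged copy
            yield dict(row)


rows = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]

# Update the name only where the population exceeds 50
updated = list(
    update_field(rows, "name", "big", where=lambda row: row["population"] > 50)
)
```

Rows that do not satisfy the predicate pass through unchanged, and the input rows are never mutated, which keeps the step safe to use in a streaming pipeline.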
# Overview See this failing test:
```
def test_multipart_loader_with_compressed_parts():
    with Resource(
        path="data/chunk1.csv.zip", extrapaths=["data/chunk2.csv.zip"]
    ) as resource:
        assert resource.innerpath == ""
        assert resource.compression == ""
        assert resource.header == ["id", "name"]
        assert...
```
# Overview From @ewheeler: One thing to note: the fastparquet and pyarrow libraries differ in how they handle parquet with pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-parquet https://github.com/pandas-dev/pandas/issues/42968#issuecomment-965318185 In the words of a pandas contributor: "Summary: it's...
# Overview We need to explore the possibilities, but many CLI tools do it. We can show the row count, etc.
# Overview https://search.dataone.org/data/mode=list
# Overview
```
# Read from datapackage file (we need to create/find a dataset with datapackage.json)
package = Package("https://data.world/14thlevelcleric/caseys-money")
# Create package from present files
package = Package("https://data.world/14thlevelcleric/caseys-money")
# Write...
```
# Overview
```
# Read from datapackage file (we need to create/find a dataset with datapackage.json)
package = Package("https://osf.io/tge9m/")
# Create package from present files
package = Package("https://osf.io/tge9m/")
# More...
```