roll issues

Results 345 issues of roll

Improve foreign keys validation

# Overview Currently, in v5 we do normalization from standards@1 in the `Schema.from_descriptor` step and lose some validation errors because of it, e.g. a missing `foreignKeys.reference.resource`. See this failing test: ``` def...
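
As an illustration only, the kind of check that can be lost during normalization might look like the sketch below. It inspects the raw descriptor before any normalization and assumes that a foreign key whose `reference` has no `resource` key should be reported. The function name `find_missing_fk_resources` and the sample descriptor are hypothetical and not part of the library.

```python
def find_missing_fk_resources(descriptor):
    """Return the indexes of foreign keys whose reference lacks `resource`.

    Hypothetical helper: it works on the raw descriptor dict, before any
    normalization could silently fill in the missing key.
    """
    missing = []
    for index, foreign_key in enumerate(descriptor.get("foreignKeys", [])):
        reference = foreign_key.get("reference") or {}
        if "resource" not in reference:
            missing.append(index)
    return missing


# Sample descriptor (made up): the first foreign key omits `resource`,
# the second provides it explicitly as an empty string.
descriptor = {
    "fields": [{"name": "id"}, {"name": "parent_id"}],
    "foreignKeys": [
        {"fields": ["parent_id"], "reference": {"fields": ["id"]}},
        {"fields": ["parent_id"], "reference": {"resource": "", "fields": ["id"]}},
    ],
}

missing_indexes = find_missing_fk_resources(descriptor)
print(missing_indexes)
```

Running the check on the raw input, rather than on the normalized schema, is the design point: once defaults are filled in, the two foreign keys above become indistinguishable.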

general

Recover steps.table_merge

# Overview See failing tests in the corresponding test file

bug

help wanted

Recover steps.table_pivot

# Overview See this failing test: ``` def test_step_table_pivot(): source = Resource("data/transform-pivot.csv") pipeline = Pipeline( steps=[ steps.table_normalize(), steps.table_pivot(f1="region", f2="gender", f3="units", aggfun=sum), ], ) target = source.transform(pipeline) assert target.schema.to_descriptor() == {...

bug

help wanted

Support WHERE/PREDICATE for suitable steps

# Overview There are many PETL transforms that accept a WHERE argument, for example, for updating field values. Eventually, we'd like to support this functionality too.
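
A minimal sketch of what a predicate-aware step could look like, written in plain Python over rows represented as dicts. The function `update_field` and its `where` parameter are hypothetical names used for illustration; they are not an existing step signature.

```python
def update_field(rows, name, value, where=None):
    """Yield rows, setting row[name] = value only where the predicate holds.

    `where` is an optional callable taking a row dict and returning a bool;
    when it is None, every row is updated. Input rows are never mutated.
    """
    for row in rows:
        if where is None or where(row):
            row = {**row, name: value}
        yield row


rows = [
    {"id": 1, "name": "germany"},
    {"id": 2, "name": "france"},
]

# Update `name` only for rows whose id is greater than 1
updated = list(update_field(rows, "name", "x", where=lambda row: row["id"] > 1))
print(updated)
```

Accepting a callable keeps the step generic; a string expression form could be layered on top later by compiling it into such a callable.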

feature

Compressed multipart resource doesn't work

# Overview See this failing test: ``` def test_multipart_loader_with_compressed_parts(): with Resource( path="data/chunk1.csv.zip", extrapaths=["data/chunk2.csv.zip"] ) as resource: assert resource.innerpath == "" assert resource.compression == "" assert resource.header == ["id", "name"] assert...

bug

help wanted

Possible improvements to the parquet format implementation

# Overview From @ewheeler One thing to note-- fastparquet and pyarrow libraries have some parquet handling differences with pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-parquet https://github.com/pandas-dev/pandas/issues/42968#issuecomment-965318185 In the words of a pandas contributor: "Summary: it's...

enhancement

Show progress during CLI validation

# Overview We need to explore the possibilities, but many CLI tools do this. We could show the row count, etc.
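
One possible shape for row-count progress, sketched with the standard library only. The generator `with_progress`, its `every` interval, and the message format are assumptions made for illustration; how this would hook into the actual CLI is left open.

```python
import io
import sys


def with_progress(rows, every=1000, stream=sys.stderr):
    """Yield rows unchanged while writing a running row count to `stream`.

    The count is rewritten in place (carriage return) every `every` rows,
    and a final line with the total is written when the rows are exhausted.
    """
    count = 0
    for row in rows:
        count += 1
        if count % every == 0:
            stream.write(f"\rrows checked: {count}")
            stream.flush()
        yield row
    stream.write(f"\rrows checked: {count}\n")
    stream.flush()


# Demo with an in-memory stream instead of the terminal
buffer = io.StringIO()
consumed = list(with_progress(range(2500), every=1000, stream=buffer))
```

Writing to stderr by default keeps the progress output separate from any report printed to stdout, so piping the validation result elsewhere still works.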

feature

Integration with DataONE as a data portal

# Overview https://search.dataone.org/data/mode=list

feature

Integration with data.world as a data portal

# Overview ``` # Read from datapackage file (we need to create/find a dataset with datapackage.json) package = Package("https://data.world/14thlevelcleric/caseys-money") # Create package from present files package = Package("https://data.world/14thlevelcleric/caseys-money") # Write...

feature

Integration with OSF as a data portal

# Overview ``` # Read from datapackage file (we need to create/find a dataset with datapackage.json) package = Package("https://osf.io/tge9m/") # Create package from present files package = Package("https://osf.io/tge9m/") # More...

feature