The data resource has an error: properties "path" and "data" is mutually exclusive when calling resource.to_view()
Overview
I want to be able to change both metadata and the data in a transform pipeline and export both data and metadata (I think as in https://github.com/frictionlessdata/frictionless-py/issues/1062). An example might be:
from frictionless import Resource, steps, Pipeline
pipeline = Pipeline(steps=[
    steps.field_update(name='id', descriptor={'name': 'pkey', 'title': 'Primary Key'}),
    steps.row_filter(formula='name != "france"'),
    steps.table_write(path='output.csv'),
    steps.resource_update(name='data', descriptor={'path': 'output.csv'}),
])
resource = Resource(path='https://raw.githubusercontent.com/frictionlessdata/frictionless-py/d6f2552b4fd950f459130eda9cf80ae0b8b4931e/data/transform.csv')
resource.transform(pipeline)
resource.to_yaml('resource.yaml')
print(f'{resource=}')
print(f'{resource.read_rows()=}')
which gives me what I want:
resource={'name': 'transform',
'type': 'table',
'path': 'output.csv',
'data': [],
'scheme': '',
'format': 'inline',
'mediatype': 'text/csv',
'extrapaths': [],
'schema': {'fields': [{'name': 'pkey',
'type': 'integer',
'title': 'Primary Key'},
{'name': 'name', 'type': 'string'},
{'name': 'population', 'type': 'integer'}]}}
resource.read_rows()=[{'pkey': 1, 'name': 'germany', 'population': 83}, {'pkey': 3, 'name': 'spain', 'population': 47}]
However, if I run resource.to_view(), I get:
File "/Users/fjunior/Projects/splor/datapackage-reprex/reprex/20230721T164121/venv/lib/python3.11/site-packages/frictionless/metadata.py", line 177, in from_descriptor
raise FrictionlessException(error, reasons=errors)
frictionless.exception.FrictionlessException: [resource-error] The data resource has an error: descriptor is not valid (The data resource has an error: properties "path" and "data" is mutually exclusive)
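The descriptor printed earlier carries both 'path': 'output.csv' and 'data': [], which is presumably what trips this check. As a rough illustration only (this is not frictionless's actual validation code; check_path_data_exclusive is a hypothetical helper working on a plain dict, and the assumption that the check looks at key presence rather than truthiness is mine), a presence-based check rejects the descriptor even though data is an empty list:

```python
def check_path_data_exclusive(descriptor: dict) -> list[str]:
    """Return error messages for a resource-like descriptor (hypothetical helper)."""
    errors = []
    # Assumption: the check tests whether both keys are present,
    # so an empty list under "data" still counts as a conflict.
    if "path" in descriptor and "data" in descriptor:
        errors.append('properties "path" and "data" is mutually exclusive')
    return errors


# A plain dict mirroring the relevant keys of the descriptor shown above.
descriptor = {"name": "transform", "path": "output.csv", "data": [], "format": "inline"}
print(check_path_data_exclusive(descriptor))

# Dropping the "data" key removes the conflict in this sketch.
cleaned = {key: value for key, value in descriptor.items() if key != "data"}
print(check_path_data_exclusive(cleaned))
```

If the real check works along these lines, the fix would have to remove the data key from the descriptor entirely rather than set it to an empty or null value.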
Trying to set data to None in the pipeline
pipeline = Pipeline(steps=[
    steps.field_update(name='id', descriptor={'name': 'pkey', 'title': 'Primary Key'}),
    steps.row_filter(formula='name != "france"'),
    steps.table_write(path='output.csv'),
    steps.resource_update(name='data', descriptor={'path': 'output.csv', 'data': None}),
])
also doesn't help, because I get:
File "/Users/fjunior/Projects/splor/datapackage-reprex/reprex/20230721T164121/venv/lib/python3.11/site-packages/frictionless/transformer/transformer.py", line 92, in __iter__
raise FrictionlessException(error) from exception
frictionless.exception.FrictionlessException: [step-error] Step is not valid: "resource_update" raises "'NoneType' object is not iterable"
In general, is mixing path and data a bad idea during a pipeline transformation? Is there another way to handle the use case of changing and exporting both data and metadata?
Another possibly related behaviour is that trying to infer stats gives correct values for fields and rows, but not for hash, which gets the value 'sha256:None' when in theory the data should be coming from path (which is set to 'output.csv') and not from the in-memory data.
from frictionless import Resource, steps, Pipeline
pipeline = Pipeline(steps=[
    steps.field_update(name='id', descriptor={'name': 'pkey', 'title': 'Primary Key'}),
    steps.row_filter(formula='name != "france"'),
    steps.table_write(path='output.csv'),
    steps.resource_update(name='data', descriptor={'path': 'output.csv'}),
])
resource = Resource(path='https://raw.githubusercontent.com/frictionlessdata/frictionless-py/d6f2552b4fd950f459130eda9cf80ae0b8b4931e/data/transform.csv')
resource.transform(pipeline)
resource.infer(stats=True)
print(f'{resource=}')
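To cross-check what hash would be expected if the data were read from path, the file can be hashed directly with the standard library. This is a sketch independent of frictionless: sha256_of_file is a hypothetical helper, and the 'sha256:' prefix simply mirrors the format of the value reported above. The result may differ from what frictionless would report if it hashes the bytes differently.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file's raw bytes and return the digest with a 'sha256:' prefix."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"
```

After the pipeline has run, calling sha256_of_file('output.csv') gives a value to compare against the hash that resource.infer(stats=True) reports.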
Thanks @fjuniorr for reporting. We will investigate it.