Unexpected KeyError raised when a schema 'primaryKey' field is missing from the tabular data columns
Overview
While migrating validata.fr from v4 to v5 of frictionless-py,
we hit an unexpected KeyError
when validating tabular data whose header does not contain a field
that the validation schema declares in its primaryKey.
For example:
data = [["b"], ["foo"]]
schema = {
    "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
    "fields": [
        {
            "name": "a",
        }
    ],
    "primaryKey": ["a"],
}
Validating this data in Python raises a KeyError:
import frictionless

if __name__ == "__main__":
    # `data` and `schema` are the objects defined in the example above.
    report = frictionless.validate(
        source=data,
        schema=frictionless.Schema.from_descriptor(schema),
        detector=frictionless.Detector(schema_sync=True),
    )
    print(report)
Output:
Traceback (most recent call last):
  File "code.py", line 4, in <module>
    report = frictionless.validate(
  ...
  File "frictionless/table/row.py", line 281, in __process
    raise KeyError(f"Row does not have a field {key}")
KeyError: 'Row does not have a field a'
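As a stop-gap until this is handled by the library, a pre-check along the following lines could detect the situation before calling validate. This is only a minimal sketch: `missing_primary_key_labels` is a hypothetical helper, not part of frictionless, and it assumes the first row of the inline data is the header.

```python
from typing import Any, Dict, List


def missing_primary_key_labels(
    labels: List[str], schema: Dict[str, Any]
) -> List[str]:
    """Return primary-key field names that do not appear in the header labels.

    Hypothetical helper for illustration; not part of frictionless.
    """
    primary_key = schema.get("primaryKey", [])
    # Table Schema allows primaryKey to be a single string or a list of strings.
    if isinstance(primary_key, str):
        primary_key = [primary_key]
    return [name for name in primary_key if name not in labels]


data = [["b"], ["foo"]]
schema = {
    "fields": [{"name": "a"}],
    "primaryKey": ["a"],
}

# Assumption: the first row of the inline data holds the header labels.
missing = missing_primary_key_labels(data[0], schema)
print(missing)  # ['a']
```

When the returned list is non-empty, the caller can report the problem itself instead of running into the KeyError.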
Expected behaviour
According to the documentation of the PrimaryKey specification of TableSchema, the fields related to the primary key in the data cannot be null.
I expected an invalid report containing, for example, a missing-primary-key error with the description "Based on the schema there should be a label 'a' corresponding to the schema's primary key that is missing in the data's header." (similar to the existing missing-label validation error).
The expected validation report would be:
{'valid': False,
 'stats': {'tasks': 1, 'errors': 1, 'warnings': 0, 'seconds': 0.003},
 'warnings': [],
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': False,
            'place': '<memory>',
            'labels': ['b'],
            'stats': {'errors': 1,
                      'warnings': 0,
                      'seconds': 0.003,
                      'fields': 2,
                      'rows': 1},
            'warnings': [],
            'errors': [{'type': 'missing-primary-key',
                        'title': 'Missing Primary Key',
                        'description': 'Based on the schema there should be a '
                                       "label 'a' corresponding to the schema's "
                                       "primary key that is missing in the data's "
                                       'header.',
                        'message': "There is a missing primary key in the header's "
                                   'field "a" at position "2"',
                        'tags': ['#table', '#header', '#label'],
                        'note': '',
                        'labels': ['b'],
                        'rowNumbers': [1],
                        'label': '',
                        'fieldName': 'a',
                        'fieldNumber': 2}]}]}
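To make the proposal more concrete, here is a rough sketch of how such error records might be assembled from the header labels and the schema's primary key. Everything in it is hypothetical: `build_missing_primary_key_errors` is not a frictionless function, the `missing-primary-key` type does not exist in the library today, and numbering the missing field after the labels found in the data is an assumption taken from the expected report above.

```python
from typing import Any, Dict, List


def build_missing_primary_key_errors(
    labels: List[str], primary_key: List[str]
) -> List[Dict[str, Any]]:
    """Build one hypothetical 'missing-primary-key' error per absent key field."""
    errors: List[Dict[str, Any]] = []
    missing = [name for name in primary_key if name not in labels]
    for offset, name in enumerate(missing, start=1):
        # Assumption: a missing field is numbered after the labels present
        # in the data, mirroring the expected report above.
        field_number = len(labels) + offset
        errors.append(
            {
                "type": "missing-primary-key",
                "title": "Missing Primary Key",
                "message": (
                    "There is a missing primary key in the header's "
                    f'field "{name}" at position "{field_number}"'
                ),
                "labels": list(labels),
                "fieldName": name,
                "fieldNumber": field_number,
            }
        )
    return errors


errors = build_missing_primary_key_errors(labels=["b"], primary_key=["a"])
print(errors[0]["message"])
```

Returning a list keeps the door open for composite primary keys, where several key fields may be missing at once.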
Other details and experiments
Frictionless version 5.16.0
Command-line validation gives the same result. I enabled "schema-sync" to reproduce our use case more closely, but it does not seem to be related to the actual issue.