
Unexpected KeyError raised when a primary-key field from the schema is missing from the tabular data columns

amelie-rondot opened this issue 1 year ago

Overview

While migrating validata.fr from v4 to v5 of frictionless-py, we encountered an unexpected KeyError when validating tabular data whose header does not contain the primary-key field declared in the schema used for validation.

For example:

data = [["b"], ["foo"]]
schema = {
    "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
    "fields": [
        {
            "name": "a",
        }
    ],
    "primaryKey": ["a"],
}

Using Python, with `data` and `schema` defined as above, validation raises a KeyError:

import frictionless

if __name__ == "__main__":
    report = frictionless.validate(
        source=data,
        schema=frictionless.Schema.from_descriptor(schema),
        detector=frictionless.Detector(schema_sync=True))
    print(report)

Output:

Traceback (most recent call last):
  File "code.py", line 4, in <module>
    report = frictionless.validate(
    ...
  File "frictionless/table/row.py", line 281, in __process
    raise KeyError(f"Row does not have a field {key}")
KeyError: 'Row does not have a field a'
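Until this is handled in frictionless itself, one possible workaround is to compare the header against the schema's `primaryKey` before calling `validate`. The sketch below works on the plain `data` list and schema descriptor from the example; the helper name `missing_primary_key_fields` is hypothetical and not part of frictionless-py.

```python
def missing_primary_key_fields(header, schema):
    """Return primary-key field names declared in `schema` but absent from `header`.

    Hypothetical helper for illustration; not part of frictionless-py.
    """
    primary_key = schema.get("primaryKey", [])
    # Table Schema allows primaryKey to be a single field name or a list of names
    if isinstance(primary_key, str):
        primary_key = [primary_key]
    labels = set(header)
    return [name for name in primary_key if name not in labels]


data = [["b"], ["foo"]]
schema = {
    "fields": [{"name": "a"}],
    "primaryKey": ["a"],
}

# The first row of `data` is the header
missing = missing_primary_key_fields(data[0], schema)
if missing:
    print(f"Primary-key field(s) missing from header: {missing}")
```

On the example it prints `Primary-key field(s) missing from header: ['a']`, so the caller can report the problem instead of hitting the KeyError.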

Expected behaviour

According to the primaryKey section of the Table Schema specification, primary-key fields in the data cannot be null. I expected an invalid report instead, for example one with a missing-primary-key error whose description reads "Based on the schema there should be a label 'a' corresponding to the schema's primary key that is missing in the data's header." (similar to the existing missing-label validation error). The expected validation report would be:

{'valid': False,
 'stats': {'tasks': 1, 'errors': 1, 'warnings': 0, 'seconds': 0.003},
 'warnings': [],
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': False,
            'place': '<memory>',
            'labels': ['b'],
            'stats': {'errors': 1,
                      'warnings': 0,
                      'seconds': 0.003,
                      'fields': 2,
                      'rows': 1},
            'warnings': [],
            'errors': [{'type': 'missing-primary-key',
                        'title': 'Missing Primary Key',
                        'description': 'Based on the schema there should be a '
                                       "label 'a' corresponding to the schema's primary key "
                                       "that is missing in the data's header.",
                        'message': "There is a missing primary key in the header's "
                                   'field "a" at position "2"',
                        'tags': ['#table', '#header', '#label'],
                        'note': '',
                        'labels': ['b'],
                        'rowNumbers': [1],
                        'label': '',
                        'fieldName': 'a',
                        'fieldNumber': 2}]}]}
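To make the proposal concrete, the sketch below builds error entries shaped like the `missing-primary-key` error in the expected report. It is a hypothetical illustration only: `missing-primary-key` is the error type proposed in this issue, not an existing frictionless error, the helper `missing_primary_key_errors` is invented for the example, and numbering missing fields after the existing labels is an assumption chosen to match the `fieldNumber` of 2 above.

```python
def missing_primary_key_errors(labels, schema):
    """Build proposed 'missing-primary-key' error dicts (hypothetical helper)."""
    primary_key = schema.get("primaryKey", [])
    if isinstance(primary_key, str):
        primary_key = [primary_key]
    missing = [name for name in primary_key if name not in labels]
    errors = []
    for offset, name in enumerate(missing, start=1):
        # Assumption: missing fields are numbered after the existing labels
        number = len(labels) + offset
        errors.append({
            "type": "missing-primary-key",
            "title": "Missing Primary Key",
            "message": (
                "There is a missing primary key in the header's "
                f'field "{name}" at position "{number}"'
            ),
            "labels": list(labels),
            "fieldName": name,
            "fieldNumber": number,
        })
    return errors


errors = missing_primary_key_errors(
    ["b"], {"fields": [{"name": "a"}], "primaryKey": ["a"]}
)
print(errors[0]["message"])
```

For the example header `['b']` this prints `There is a missing primary key in the header's field "a" at position "2"`, matching the message in the expected report.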

Other details and experimentations

Frictionless version 5.16.0

Command-line validation gives the same result. I enabled schema-sync to reproduce our use case more closely, but it does not seem to be related to the issue itself.
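For the command-line check, the same example can be written to disk first. A minimal sketch: the file names `data.csv` and `schema.json` and the helper `reproduction_files` are hypothetical, and the CLI invocation in the comment assumes the v5 `--schema` and `--schema-sync` options (confirm with `frictionless validate --help`).

```python
import csv
import io
import json
from pathlib import Path

# Same data and schema as in the Python reproduction
DATA = [["b"], ["foo"]]
SCHEMA = {
    "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
    "fields": [{"name": "a"}],
    "primaryKey": ["a"],
}


def reproduction_files(data=DATA, schema=SCHEMA):
    """Return {file name: text content} for a CLI reproduction (hypothetical helper)."""
    buffer = io.StringIO()
    # Override csv's default "\r\n" line terminator with "\n"
    csv.writer(buffer, lineterminator="\n").writerows(data)
    return {
        "data.csv": buffer.getvalue(),
        "schema.json": json.dumps(schema, indent=2),
    }


if __name__ == "__main__":
    # Writes data.csv and schema.json into the current directory
    for name, content in reproduction_files().items():
        Path(name).write_text(content, encoding="utf-8")
    # Assumed v5 CLI call (check `frictionless validate --help`):
    #   frictionless validate data.csv --schema schema.json --schema-sync
```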

amelie-rondot, Jan 31 '24 12:01