framework
framework copied to clipboard
Fix 1633 unexpected key error with primary key in schema and 1635 unexpected missing label error when ignoring header case
- fixes #1633
- fixes #1635
This PR fixes unexpected missing-label
error in validation report which occurs when validating tabular data in this specific case (#1635): The data contains a named column corresponding to a required field in the schema used for validation, written without respecting case-sensitivity.
For example:
data = [["aa", "BB"], ["a", "b"]]
schema = {
"$schema": "https://frictionlessdata.io/schemas/table-schema.json",
"fields": [
{"name": "AA", "constraints": {"required": True}},
{"name": "bb", "constraints": {"required": True}}
]
}
It adds some test cases:
- one test case to ensure that validating this tabular data with this schema and using
schema_sync=True
anddialect=Dialect(header_case=False)
options, returns a valid report. - one test case to ensure that if validation is case-sensitive, if a name column does not respect case a missing-label occurs in validation report.
- another test case to ensure that validating with missing required column "AA" with this schema and using
schema_sync=True
anddialect=Dialect(header_case=False)
options, returns an invalid report withmissing-label
error related to field "AA".
This PR also fixes unexpected KeyError raised when a primary key used in shema is missing in the tabular data to validate(#1633). For example:
data = [["B"], ["b"]]
schema = {
"$schema": "https://frictionlessdata.io/schemas/table-schema.json",
"fields": [
{"name": "A"},
{"name": "B"}
],
"primaryKey": ["A"]
}
It adds some test cases:
- two test cases to ensure that when a label related to the schema primary key is missing, the validation report is invalid with single missing-label error:
- a test case with the schema containing the primary key field not specified as 'required'
- a test case with the schema containing the primary key field specified as 'required'
- a test case to deal with insensitivity case in the data label corresponding to the primary key schema field: the validation report is valid
Finally, I suggest some refactoring in this PR:
- refactoring removing missing required label from field info refactoring removing missing required label from field info
- refactoring creating schema fields among labels part when using
schema_sync
option indetect_schema()
Detector
method - refactoring to introduce intermediary
TableResource
methods to createcells
related to primary key schema.
Hi @roll,
This PR is ready for your review. Some things somewhat related to this PR, detailed below bother me a little bit. However, the bugs that are fixed by this PR are blocking for our migration from v4 to v5 on Validata. Would you mind reviewing it as is, and may I transform these concerns into issues and propose associated pull requests?
Please don't hesitate if something is bothering you, if I am wrong in any of my points or missed the obvious. Thanks.
Details :
-
supporting "header_case" is currently a headache as you have to check each time you use a label or schema field or primary key that you covered the case with "header_case = False". I would imagine that if "header_case" is ignored (False), then you could store all data in lowercase once and for all and not bother anymore.
-
Due to the way schema sync is implemented, you have to first include the missing required fields (in order to have the related errors), but then remove them for the row_stream to be correct. This feels overly complex, but I have not yet a clear alternative in mind.
-
A primary key is implicitly required. However,
field.required
is not "True" for those fields if they are not explicitly set in the schema (if I judge by this line). I would suggest (without having looked much at the code) that "required" should be a@property
that returns "True" for a primary key.
Hi @pierrecamilleri,
Thanks a lot! Can you please resolve the conflicts as the dev/release process update just been merged https://github.com/frictionlessdata/frictionless-py/pull/1652
Hi @pierrecamilleri,
Thanks a lot! Can you please resolve the conflicts as the dev/release process update just been merged #1652
Done
EDIT: Oops tests no longer pass, I spoke too fast. I look into it
All set !
@pierrecamilleri we shoudn't be using pytest.lazy_fixture
in our codebase since it has been deprecated: https://github.com/TvoroG/pytest-lazy-fixture/issues/16
Could we instead call from pytest_lazyfixture import lazy_fixture
as suggested in https://github.com/TvoroG/pytest-lazy-fixture/issues/22#issuecomment-361976842.
It is not part of your PR but would be nice to fix it.
@pierrecamilleri wait, I will fix this in a new PR instead and then we can merge the fix into this two PRs.
@pierrecamilleri I just merged #1674 . Could you please rebase and then merge when test are passing?