--skip-errors doesn't work for packages
Overview
Edit: In my case, I only tried foreign key checks, but as @fjuniorr noted below, --skip-errors appears to be broken for all errors when checking a package.
When validating a package using the CLI, --skip-errors does not appear to disable foreign key checks. Validation passes if and only if the foreign keys are commented out in each table schema file.
I'm running the following command:
frictionless validate --trusted --limit-errors 50 --skip-errors [see below] $OUTPUT_DIR/package.json
For the error slug, I've tried:
- foreign-key (from docs)
- foreign-key-error (mentioned here)
- foreignKey (from source code)
- foreignKeyError (by analogy with foreign-key-error)
I've also tried all four at the same time, separated by commas with no intervening spaces.
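For anyone who wants to repeat these attempts, the combinations above can be generated with a short script. This is only a sketch: the helper names `build_command` and `all_attempts` are made up for illustration, and `package.json` stands in for the real `$OUTPUT_DIR/package.json` path. It builds the argument lists and prints them; it does not run frictionless itself.

```python
import shlex

# The four slug spellings tried in this report.
SLUGS = ["foreign-key", "foreign-key-error", "foreignKey", "foreignKeyError"]


def build_command(slugs, package_path, limit_errors=50):
    """Build the argv for one `frictionless validate` attempt.

    Slugs are joined with commas and no spaces, as described above.
    """
    return [
        "frictionless", "validate", "--trusted",
        "--limit-errors", str(limit_errors),
        "--skip-errors", ",".join(slugs),
        package_path,
    ]


def all_attempts(package_path):
    """One command per slug, followed by one command with all four slugs."""
    attempts = [build_command([slug], package_path) for slug in SLUGS]
    attempts.append(build_command(SLUGS, package_path))
    return attempts


if __name__ == "__main__":
    # "package.json" is a placeholder path, not the real package location.
    for argv in all_attempts("package.json"):
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` to compare exit codes across the variants.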
Sample output:
dataset
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ medical_patient │ table │ /var/data-pkg/output/Patient_20240112150044.csv │ VALID │
│ medical_encounter │ table │ /var/data-pkg/output/Encounter_20240112150042.csv │ VALID │
│ medical_medications │ table │ /var/data-pkg/output/Medications_20240112150809.csv │ VALID │
│ medical_problem │ table │ /var/data-pkg/output/Problem_20240112150453.csv │ VALID │
│ medical_toxicity │ table │ /var/data-pkg/output/Toxicity_20240112151505.csv │ INVALID │
│ medical_observations │ table │ /var/data-pkg/output/Observations_20240112155005.csv │ VALID │
│ medical_vitals │ table │ /var/data-pkg/output/Vitals_20240112150819.csv │ VALID │
└──────────────────────┴───────┴───────────────────────────────────────────────────────┴─────────┘
medical_toxicity
┏━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row ┃ Field ┃ Type ┃ Message ┃
┡━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2 │ None │ foreign-key │ Row at position "2" violates the foreign key: for "EMPI": values "......" not found in the lookup table "medical_patient" as "EMPI" │
│ 2 │ None │ foreign-key │ Row at position "2" violates the foreign key: for "MRN": values "......" not found in the lookup table "medical_patient" as "MRN" │
│ 2 │ None │ foreign-key │ Row at position "2" violates the foreign key: for "EncounterNumber": values "......" not found in the lookup table "medical_encounter" as "EncounterNumber" │
│ 3 │ None │ foreign-key │ Row at position "3" violates the foreign key: for "EMPI": values "......" not found in the lookup table "medical_patient" as "EMPI" │
...etc
Info
Environment:
App is running inside the official Python 3.12.1 Alpine Linux Docker image.
The requirements.txt file, in its entirety:
chardet==5.2.0 # Character encoding detection
click==8.1.7 # CLI
frictionless==5.16.1 # Validation
pandas==2.2.0 # CSV loading and cleaning
pyyaml==6.0.1 # Configuration file loading
Package
The package consists of a few unremarkable CSVs:
- All leading and trailing whitespace is stripped from each field, so we know that's not the issue.
- All column names are valid Python identifiers.
Package JSON, presented as YAML for readability:
resources:
  - encoding: utf-8
    format: csv
    mediatype: text/csv
    name: medical_patient
    path: /var/data-pkg/output/Patient_20240112150044.csv
    schema: /app/schemas/medical/patient.yaml
    type: table
  - encoding: utf-8
    format: csv
    mediatype: text/csv
    name: medical_encounter
    path: /var/data-pkg/output/Encounter_20240112150042.csv
    schema: /app/schemas/medical/encounter.yaml
    type: table
  - encoding: utf-8
    format: csv
    mediatype: text/csv
    name: medical_medications
    path: /var/data-pkg/output/Medications_20240112150809.csv
    schema: /app/schemas/medical/medications.yaml
    type: table
  - encoding: utf-8
    format: csv
    mediatype: text/csv
    name: medical_problem
    path: /var/data-pkg/output/Problem_20240112150453.csv
    schema: /app/schemas/medical/problem.yaml
    type: table
  - encoding: utf-8
    format: csv
    mediatype: text/csv
    name: medical_toxicity
    path: /var/data-pkg/output/Toxicity_20240112151505.csv
    schema: /app/schemas/medical/toxicity.yaml
    type: table
  - encoding: utf-8
    format: csv
    mediatype: text/csv
    name: medical_observations
    path: /var/data-pkg/output/Observations_20240112155005.csv
    schema: /app/schemas/medical/observations.yaml
    type: table
  - encoding: utf-8
    format: csv
    mediatype: text/csv
    name: medical_vitals
    path: /var/data-pkg/output/Vitals_20240112150819.csv
    schema: /app/schemas/medical/vitals.yaml
    type: table
It looks like this is a more general problem: no error can be skipped when validating a package through the CLI. With frictionless 5.17.0 and this reprex I get:
frictionless validate --skip-errors "blank-label" https://raw.githubusercontent.com/splor-mg/reprex/main/reprex/20231228T143527/datapackage.json
────────────────────────────────────────────────────────────── Dataset ───────────────────────────────────────────────────────────────
dataset
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ data │ table │ data.csv │ INVALID │
└──────┴───────┴──────────┴─────────┘
─────────────────────────────────────────────────────────────── Tables ───────────────────────────────────────────────────────────────
data
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row ┃ Field ┃ Type ┃ Message ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ None │ 2 │ blank-label │ Label in the header in field at position "2" is blank │
└──────┴───────┴─────────────┴───────────────────────────────────────────────────────┘
When I validate the data file (or a standalone resource), the check is properly skipped:
frictionless validate --skip-errors "blank-label" https://raw.githubusercontent.com/splor-mg/reprex/main/reprex/20231228T143527/data.csv
─────────────────────────────────────────────────────── Dataset ────────────────────────────────────────────────────────
dataset
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━┩
│ data │ table │ data.csv │ VALID │
└──────┴───────┴──────────┴────────┘
Good catch. I'll change the title of the ticket to reflect this.
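Until this is fixed, one possible stopgap is to let the package validation run unfiltered and drop the unwanted error types from the report afterwards. The sketch below is not part of frictionless. It assumes the report has been loaded as a dict (for example from the CLI's JSON output) with a top-level `errors` list and a `tasks` list, where each task has its own `errors` list and each error carries a `type` such as `foreign-key` or `blank-label`, matching the Type column in the tables above. The function name `filter_report` is hypothetical.

```python
def filter_report(report, skip_types):
    """Return a copy of a validation report without the given error types.

    Assumed layout (not verified against every frictionless version):
    report["errors"] and report["tasks"][i]["errors"] are lists of dicts,
    each with a "type" key. The input dict is left unmodified.
    """
    skip = set(skip_types)

    tasks = []
    for task in report.get("tasks", []):
        kept = [e for e in task.get("errors", []) if e.get("type") not in skip]
        # A task counts as valid once no unskipped errors remain.
        tasks.append({**task, "errors": kept, "valid": not kept})

    errors = [e for e in report.get("errors", []) if e.get("type") not in skip]
    valid = not errors and all(task["valid"] for task in tasks)
    return {**report, "errors": errors, "tasks": tasks, "valid": valid}
```

Note that this only rewrites `errors` and `valid`; any counters stored elsewhere in the report are left as they were. If validation was run with --limit-errors, errors beyond the limit were never recorded, so a filtered report that looks valid may still be hiding other problems.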