
--skip-errors doesn't work for packages

Open diego-oncoramedical opened this issue 1 year ago • 2 comments

Overview

Edit: In my case, I only tried foreign key checks, but as @fjuniorr noted below, --skip-errors appears to be broken for all errors when checking a package.

When validating a package using the CLI, --skip-errors does not appear to disable foreign key checks. Validation passes if and only if the foreign keys are commented out in each table schema file.

I'm running the following command:

frictionless validate --trusted --limit-errors 50 --skip-errors [see below] $OUTPUT_DIR/package.json

For the error slug, I've tried:

  • foreign-key (from docs)
  • foreign-key-error (mentioned here)
  • foreignKey (from source code)
  • foreignKeyError (by analogy with foreign-key-error)

I've also tried all four at the same time, separated by commas with no intervening spaces.
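For reference, the behaviour I expect can be modelled in a few lines. The option value is split on commas into error-type slugs, and every error whose type matches one of them is dropped. This is a simplified, hypothetical sketch and not frictionless's implementation. The names `parse_skip_errors` and `filter_errors`, and the dict shape of the error records, are my own assumptions. Since the Type column in the sample output reports `foreign-key`, that slug alone should have been enough.

```python
from typing import Iterable


def parse_skip_errors(value: str) -> set[str]:
    """Split a comma-separated option value into error-type slugs (hypothetical helper)."""
    # Strip stray whitespace so "a, b" and "a,b" behave the same; drop empty slugs.
    return {slug.strip() for slug in value.split(",") if slug.strip()}


def filter_errors(errors: Iterable[dict], skip: set[str]) -> list[dict]:
    """Keep only the errors whose "type" is not in the skip set (hypothetical helper)."""
    return [error for error in errors if error["type"] not in skip]


# Error records shaped after the Row/Type columns of the sample output.
errors = [
    {"row": 2, "type": "foreign-key"},
    {"row": 3, "type": "foreign-key"},
]
skip = parse_skip_errors("foreign-key,foreign-key-error,foreignKey,foreignKeyError")
print(filter_errors(errors, skip))  # []
```

Tolerating spaces after commas is a design choice in this sketch. My actual attempts used no spaces, so spacing is not the cause here.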

Sample output:

                                              dataset                                              
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ name                 ┃ type  ┃ path                                                  ┃ status  ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ medical_patient      │ table │ /var/data-pkg/output/Patient_20240112150044.csv       │ VALID   │
│ medical_encounter    │ table │ /var/data-pkg/output/Encounter_20240112150042.csv     │ VALID   │
│ medical_medications  │ table │ /var/data-pkg/output/Medications_20240112150809.csv   │ VALID   │
│ medical_problem      │ table │ /var/data-pkg/output/Problem_20240112150453.csv       │ VALID   │
│ medical_toxicity     │ table │ /var/data-pkg/output/Toxicity_20240112151505.csv      │ INVALID │
│ medical_observations │ table │ /var/data-pkg/output/Observations_20240112155005.csv  │ VALID   │
│ medical_vitals       │ table │ /var/data-pkg/output/Vitals_20240112150819.csv        │ VALID   │
└──────────────────────┴───────┴───────────────────────────────────────────────────────┴─────────┘

                                                                                       medical_toxicity
┏━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row ┃ Field ┃ Type        ┃ Message                                                                                                                                                     ┃
┡━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2   │ None  │ foreign-key │ Row at position "2" violates the foreign key: for "EMPI": values "......" not found in the lookup table "medical_patient" as "EMPI"                         │
│ 2   │ None  │ foreign-key │ Row at position "2" violates the foreign key: for "MRN": values "......" not found in the lookup table "medical_patient" as "MRN"                           │
│ 2   │ None  │ foreign-key │ Row at position "2" violates the foreign key: for "EncounterNumber": values "......" not found in the lookup table "medical_encounter" as "EncounterNumber" │
│ 3   │ None  │ foreign-key │ Row at position "3" violates the foreign key: for "EMPI": values "......" not found in the lookup table "medical_patient" as "EMPI"                         │

...etc
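For context, each `foreign-key` error above means that a row's key value has no matching row in the referenced (lookup) table. Below is a minimal stdlib sketch of that kind of check. It assumes the rows are held in memory as dicts and ignores details such as how null values are handled. The function name and the toy rows are made up for illustration; only the column name EMPI comes from the output. It numbers the first data row 2, treating the header as row 1, which is an assumption about the numbering.

```python
def foreign_key_violations(rows, fields, ref_rows, ref_fields):
    """Yield (row_position, key) for rows whose key is missing from the lookup table.

    Hypothetical helper: positions start at 2 because the header counts as row 1.
    """
    # Build the set of key tuples that exist in the referenced (lookup) table.
    lookup = {tuple(ref[f] for f in ref_fields) for ref in ref_rows}
    for position, row in enumerate(rows, start=2):
        key = tuple(row[f] for f in fields)
        if key not in lookup:
            yield position, key


# Made-up toy rows; only the EMPI column name comes from the report.
patients = [{"EMPI": "P1"}, {"EMPI": "P2"}]
toxicity = [{"EMPI": "P9"}, {"EMPI": "P1"}, {"EMPI": "P7"}]
print(list(foreign_key_violations(toxicity, ["EMPI"], patients, ["EMPI"])))
# [(2, ('P9',)), (4, ('P7',))]
```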

Info

Environment:

App is running inside the official Python 3.12.1 Alpine Linux Docker image.

The requirements.txt file, in its entirety:

chardet==5.2.0          # Character encoding detection
click==8.1.7            # CLI
frictionless==5.16.1    # Validation
pandas==2.2.0           # CSV loading and cleaning
pyyaml==6.0.1           # Configuration file loading

Package

The package consists of a few unremarkable CSVs:

  • All leading and trailing whitespace is stripped from each field, so we know that's not the issue.
  • All column names are valid Python identifiers.
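Both properties are easy to confirm with the standard library. Here is a minimal sketch, assuming the CSV text is already loaded into a string. `check_csv` is a hypothetical helper, not part of frictionless. It also rejects Python keywords such as `class`, because `str.isidentifier()` alone accepts them.

```python
import csv
import io
import keyword


def check_csv(text: str) -> list[str]:
    """Return problems: non-identifier column names or values with surrounding whitespace."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for name in header:
        # isidentifier() accepts keywords, so exclude them explicitly.
        if not name.isidentifier() or keyword.iskeyword(name):
            problems.append(f"column name {name!r} is not a valid Python identifier")
    # Data rows start at 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        for value in row:
            if value != value.strip():
                problems.append(f"row {row_number}: value {value!r} has surrounding whitespace")
    return problems


print(check_csv("EMPI,MRN\nP1,M1\n"))  # []
print(check_csv("EMPI,MRN no\n P1,M1\n"))
```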

Package JSON, presented as YAML for readability:

resources:
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_patient
  path: /var/data-pkg/output/Patient_20240112150044.csv
  schema: /app/schemas/medical/patient.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_encounter
  path: /var/data-pkg/output/Encounter_20240112150042.csv
  schema: /app/schemas/medical/encounter.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_medications
  path: /var/data-pkg/output/Medications_20240112150809.csv
  schema: /app/schemas/medical/medications.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_problem
  path: /var/data-pkg/output/Problem_20240112150453.csv
  schema: /app/schemas/medical/problem.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_toxicity
  path: /var/data-pkg/output/Toxicity_20240112151505.csv
  schema: /app/schemas/medical/toxicity.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_observations
  path: /var/data-pkg/output/Observations_20240112155005.csv
  schema: /app/schemas/medical/observations.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_vitals
  path: /var/data-pkg/output/Vitals_20240112150819.csv
  schema: /app/schemas/medical/vitals.yaml
  type: table

diego-oncoramedical, Feb 05 '24 18:02

It looks like this is a more general bug: the CLI can't skip any error type when validating a package. With frictionless 5.17.0 and this reprex, I get:

frictionless validate --skip-errors "blank-label" https://raw.githubusercontent.com/splor-mg/reprex/main/reprex/20231228T143527/datapackage.json
────────────────────────────────────────────────────────────── Dataset ───────────────────────────────────────────────────────────────
               dataset               
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ type  ┃ path     ┃ status  ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ data │ table │ data.csv │ INVALID │
└──────┴───────┴──────────┴─────────┘
─────────────────────────────────────────────────────────────── Tables ───────────────────────────────────────────────────────────────
                                         data                                         
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row  ┃ Field ┃ Type        ┃ Message                                               ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ None │ 2     │ blank-label │ Label in the header in field at position "2" is blank │
└──────┴───────┴─────────────┴───────────────────────────────────────────────────────┘

When I validate the data file (or a standalone resource), the check is skipped as expected:

frictionless validate --skip-errors "blank-label" https://raw.githubusercontent.com/splor-mg/reprex/main/reprex/20231228T143527/data.csv
─────────────────────────────────────────────────────── Dataset ────────────────────────────────────────────────────────
              dataset               
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type  ┃ path     ┃ status ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━┩
│ data │ table │ data.csv │ VALID  │
└──────┴───────┴──────────┴────────┘
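The two runs above should agree. As a toy model (not frictionless's code; all names here are hypothetical), the expected invariant is that validating a package applies the same skip list to each of its resources. A one-resource package should therefore get the same status as the standalone resource. The CLI output above breaks this invariant, reporting INVALID for the package.

```python
def validate_resource(error_types: list[str], skip: set[str]) -> str:
    """Toy model: a resource is VALID when no non-skipped error types remain."""
    remaining = [t for t in error_types if t not in skip]
    return "INVALID" if remaining else "VALID"


def validate_package(resources: dict[str, list[str]], skip: set[str]) -> dict[str, str]:
    """Toy model: validate every resource in the package with the same skip set."""
    return {name: validate_resource(types, skip) for name, types in resources.items()}


# The reprex: a one-resource package whose only error is a blank header label.
reprex = {"data": ["blank-label"]}
skip = {"blank-label"}
print(validate_resource(reprex["data"], skip))  # VALID
print(validate_package(reprex, skip))  # {'data': 'VALID'}
```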

fjuniorr, Apr 30 '24 12:04

Good catch. I'll change the title of the ticket to reflect this.

diego-oncoramedical, Apr 30 '24 16:04