Frictionless validation/query fails for multipart gzipped resources

Open peterdesmet opened this issue 2 years ago • 0 comments

Overview

A resource whose path lists multiple gzip-compressed CSV files fails with an error when validated or queried with Frictionless Framework. The same operations work when path points to a single compressed file.
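As a point of comparison, here is a minimal sketch of reading such a multipart resource by hand: each part is decompressed on its own and the text is joined afterwards. The function name `read_multipart_gzip` and its parameters are hypothetical and not part of Frictionless. The sketch also assumes every part may repeat the same header row, which this report does not state.

```python
import gzip


def read_multipart_gzip(parts, encoding="utf-8", drop_repeated_header=True):
    """Decompress each gzipped part separately, then join the lines.

    `parts` is an iterable of bytes objects, one per gzipped file.
    Assumption: a part's first line is a repeated header only if it is
    identical to the first line of the first non-empty part.
    """
    lines = []
    header = None
    for raw in parts:
        # Each part is a complete gzip file, so decompress it on its own.
        part_lines = gzip.decompress(raw).decode(encoding).splitlines()
        if not part_lines:
            continue
        if header is None:
            # First non-empty part: keep its header line.
            header = part_lines[0]
            lines.extend(part_lines)
        elif drop_repeated_header and part_lines[0] == header:
            # Later part that repeats the header: skip that line.
            lines.extend(part_lines[1:])
        else:
            lines.extend(part_lines)
    return ("\n".join(lines) + "\n") if lines else ""
```

Passing the raw bytes of the two downloaded `.csv.gz` files would yield a single CSV text that can be inspected independently of Frictionless.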

Dataset

datapackage.json.zip:

{
  "profile": "tabular-data-package",
  "resources": [
    {
      "name": "gps",
      "path": [
        "https://zenodo.org/record/5653311/files/O_ASSEN-gps-2018.csv.gz",
        "https://zenodo.org/record/5653311/files/O_ASSEN-gps-2019.csv.gz"
      ],
      "profile": "tabular-data-resource",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "UTF-8",
      "schema": {
        "fields": [ ... ]
      }
    }
  ]
}

Validate

frictionless validate datapackage.json
                                               dataset                                                 
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ type  ┃ path                                                                        ┃ status  ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ gps  │ table │ https://zenodo.org/record/5653311/files/O_ASSEN-gps-2018.csv.gz (multipart) │ INVALID │
└──────┴───────┴─────────────────────────────────────────────────────────────────────────────┴─────────┘
─────────────────────────────────────────────────────────────────────────────────────────── Tables ────────────────────────────────────────────────────────────────────────────────────────────
                                                                                              gps                                                                                              
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row  ┃ Field ┃ Type              ┃ Message                                                                                                                                                  ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ None │ None  │ compression-error │ The data source could not be successfully decompressed: Not a gzipped file                                                                               │
│      │       │                   │ (b'event-id,visible,timestamp,location-long,location-lat,bar:barometric-pressure,external-temperature,gps:dop,gps:satellite-count,gps-time-to-fix,groun… │
└──────┴───────┴───────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
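The byte preview in the error message starts with the CSV header rather than with gzip's magic bytes, which suggests the decompressor received text that was already plain. That is one reading of the message, not a confirmed diagnosis. A small check like the one below (the helper name `looks_gzipped` is hypothetical) can tell the two cases apart for any chunk of bytes:

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Return True if `data` begins with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# A freshly compressed payload passes the check ...
compressed = gzip.compress(b"event-id,visible,timestamp\n")
print(looks_gzipped(compressed))

# ... while plain CSV text, like the preview in the error, does not.
print(looks_gzipped(b"event-id,visible,timestamp"))
```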

Query

The error from issue #1477 has returned:

frictionless query https://zenodo.org/record/5653311
──────────────────────────────────────────────────────────────────────────────────────────── Index ────────────────────────────────────────────────────────────────────────────────────────────
[reference-data] Indexed 1954 bytes in 2.381 seconds
[gps] errored
[acceleration] errored

peterdesmet, Aug 11 '23 10:08