framework icon indicating copy to clipboard operation
framework copied to clipboard

Storing numbers in a string field in excel throws validation error

Open fjuniorr opened this issue 2 years ago • 0 comments

Overview

For the schema

    schema:
      fields:
        - name: a
          type: string

if the data is stored in excel and field a has only numbers, a validation error is raised. For example:

frictionless validate https://raw.githubusercontent.com/splor-mg/datapackage-reprex/main/reprex/20230731T181821/datapackage.yaml
────────────────────────────────────────────────────── Dataset ───────────────────────────────────────────────────────
                dataset                
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┓
┃ name  ┃ type  ┃ path      ┃ status  ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━┩
│ csv   │ table │ data.csv  │ VALID   │
│ excel │ table │ data.xlsx │ INVALID │
└───────┴───────┴───────────┴─────────┘
─────────────────────────────────────────────────────── Tables ───────────────────────────────────────────────────────
                                                        excel                                                         
┏━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row ┃ Field ┃ Type       ┃ Message                                                                                 ┃
┡━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2   │ 1     │ type-error │ Type error in the cell "1" in row "2" and field "a" at position "1": type is            │
│     │       │            │ "string/default"                                                                        │
│ 3   │ 1     │ type-error │ Type error in the cell "2" in row "3" and field "a" at position "1": type is            │
│     │       │            │ "string/default"                                                                        │
└─────┴───────┴────────────┴─────────────────────────────────────────────────────────────────────────────────────────┘

Also, the values get converted to None in read_rows:

from frictionless import Package

dp = Package('[datapackage.yaml](https://raw.githubusercontent.com/splor-mg/datapackage-reprex/main/reprex/20230731T181821/datapackage.yaml)')
print(dp.get_resource('excel').read_rows())
[{'a': None, 'b': 1}, {'a': None, 'b': 2}]

Changing the field type to any stops the validation error, but I no longer get the proper typing which is problematic downstream.

Here is the reprex.

fjuniorr avatar Jul 31 '23 21:07 fjuniorr