framework
framework copied to clipboard
Storing numbers in a string field in excel throws validation error
Overview
For the schema
schema:
fields:
- name: a
type: string
if the data is stored in excel and field a has only numbers, a validation error is raised. For example:
frictionless validate https://raw.githubusercontent.com/splor-mg/datapackage-reprex/main/reprex/20230731T181821/datapackage.yaml
────────────────────────────────────────────────────── Dataset ───────────────────────────────────────────────────────
dataset
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━┩
│ csv │ table │ data.csv │ VALID │
│ excel │ table │ data.xlsx │ INVALID │
└───────┴───────┴───────────┴─────────┘
─────────────────────────────────────────────────────── Tables ───────────────────────────────────────────────────────
excel
┏━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row ┃ Field ┃ Type ┃ Message ┃
┡━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2 │ 1 │ type-error │ Type error in the cell "1" in row "2" and field "a" at position "1": type is │
│ │ │ │ "string/default" │
│ 3 │ 1 │ type-error │ Type error in the cell "2" in row "3" and field "a" at position "1": type is │
│ │ │ │ "string/default" │
└─────┴───────┴────────────┴─────────────────────────────────────────────────────────────────────────────────────────┘
Also, the values get converted to None in read_rows:
from frictionless import Package
dp = Package('[datapackage.yaml](https://raw.githubusercontent.com/splor-mg/datapackage-reprex/main/reprex/20230731T181821/datapackage.yaml)')
print(dp.get_resource('excel').read_rows())
[{'a': None, 'b': 1}, {'a': None, 'b': 2}]
Changing the field type to any stops the validation error, but I no longer get the proper typing which is problematic downstream.
Here is the reprex.