DataStore column name mapping
When importing a CSV into DataStore, the column names are set (by DataPusher/Express Loader) to be the same in the database as they are in the CSV.
This is problematic:
- In queries, it's confusing to have to double-quote most column names
- Postgres doesn't allow unicode characters. So DataPusher/Express Loader can't upload them.
- Postgres limits column names to 63 bytes by default
- Repeated column names in the CSV clash, because column names must be unique within a table
How about we get DataPusher/Express Loader to make some sensible changes to the column names before they go into the database, and then save the mapping so that:
- users can see it in the web interface
- users can get it via an API
- users can edit it in the web interface
- subsequent DataPusher/Express Loader runs reuse the same mapping if possible
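To make the proposal concrete, here is a minimal sketch of what such a clean-up could look like. It is not DataPusher or Express Loader code: `clean_name` and `build_mapping` are hypothetical names, the rules (lower-case ASCII, underscores, numeric suffixes for duplicates) are only one possible choice, and the standard library's `unicodedata` stands in for whatever transliteration the loaders actually use. Reserved words are not handled.

```python
import re
import unicodedata

# PostgreSQL's default identifier limit (NAMEDATALEN - 1). Names here are
# ASCII-only after cleaning, so 63 characters is also 63 bytes.
PG_MAX_BYTES = 63


def clean_name(name: str) -> str:
    """Hypothetical cleaner: lower-case ASCII letters, digits and underscores."""
    # Decompose accented characters, then drop anything that is not ASCII
    # (so "é" becomes "e" and symbols such as "£" disappear).
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse each run of other characters into a single underscore.
    cleaned = re.sub(r"[^0-9a-z]+", "_", ascii_name.lower()).strip("_")
    if not cleaned:
        cleaned = "column"  # assumption: placeholder for empty headers
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned  # unquoted identifiers cannot start with a digit
    return cleaned[:PG_MAX_BYTES]


def build_mapping(headers):
    """Return a list of (original header, database column name) pairs.

    A list of pairs is used rather than a dict because the original
    headers may themselves be repeated.
    """
    mapping = []
    used = set()
    for header in headers:
        base = clean_name(header)
        candidate = base
        n = 2
        # Append _2, _3, ... until the name is unique, staying within the limit.
        while candidate in used:
            suffix = f"_{n}"
            candidate = base[: PG_MAX_BYTES - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        mapping.append((header, candidate))
    return mapping
```

With these rules, the headers `Total Cost (£)` and `total cost` would map to `total_cost` and `total_cost_2`, and none of the resulting names needs double quotes in a query. The list of pairs is what would be saved and reused on later runs.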
+1
We could also have DataPusher and XLoader populate the data dictionary 'label' fields with the original column names.
As for taking a bunch of column names and transforming them into cleaner versions, how about adding an action that "previews" the transformation? That would let sites override it if they have different naming preferences, or even revert to the old behaviour if they depend on it.
+1 on overriding unidecode(). It is removing French accents from the column headers of our French datasets, which is not what we want. Any plans to allow this? (I left a comment here https://github.com/ckan/ckanext-xloader/issues/145).
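A rough sketch of how a "preview" step with an overridable transformation might fit together is below. None of this is an existing CKAN, DataPusher or XLoader API: `preview_column_names`, `ascii_transform` and `keep_accents` are hypothetical names, and the ASCII default is written with the standard library rather than `unidecode` so the example is self-contained.

```python
import re
import unicodedata


def ascii_transform(name: str) -> str:
    """Hypothetical default: strip accents and keep lower-case ASCII only."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^0-9a-z]+", "_", ascii_name.lower()).strip("_")


def keep_accents(name: str) -> str:
    """Hypothetical site override: keep accented letters, tidy everything else."""
    # NFC composes "e" + combining accent into a single letter so \w matches it.
    composed = unicodedata.normalize("NFC", name).lower()
    # In Python 3, \w matches Unicode letters and digits as well as "_".
    return re.sub(r"\W+", "_", composed).strip("_")


def preview_column_names(headers, transform=ascii_transform):
    """Show what each header would become, without touching the database."""
    return [{"original": h, "proposed": transform(h)} for h in headers]


headers = ["Prénom de l'élève", "Date of Birth"]
default_preview = preview_column_names(headers)
french_preview = preview_column_names(headers, transform=keep_accents)
```

Here `default_preview` proposes `prenom_de_l_eleve`, while `french_preview` proposes `prénom_de_l_élève`. A site that depends on the old behaviour could pass a transform that returns the header unchanged.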