-c should be able to merge multiple columns into a single table
Consider this for example: https://san-francisco.datasettes.com/registered-business-locations-3d50679/Registered_Business_Locations_-_San_Francisco
"NAICS Code" and " NAICS Code Description" are two columns that always reflect each other.
The existing syntax may be enough to support this already:
-c "NAICS Code:NAICS:Code" \
-c "NAICS Code Description:NAICS:Description"
This could create a table called NAICS with three columns: id, Code and Description.
Extraction would happen based on unique (Code, Description) pairs - so in the case of any of those pairs differing a new row would be added to the lookup table.
Open question: which of the original columns should stay and become the reference to the lookup table and which ones should be removed?
Simplest answer is the first one in command option order stays and becomes the foreign key column.
Would this issue also include handling duplicate columns created by csvkit? I have a fair amount of data which handles multiple values by duplicating the column name, which results in a lot of columns like “Location”, “Location_2”, etc. It'd be awesome if there was a way to say something like “Location_\d+:Location”.
This is now possible using a separate call to the sqlite-utils extract command: https://simonwillison.net/2020/Sep/23/sqlite-utils-extract/