csvs-to-sqlite icon indicating copy to clipboard operation
csvs-to-sqlite copied to clipboard

-c should be able to merge multiple columns into a single table

Open simonw opened this issue 8 years ago • 4 comments

Consider this for example: https://san-francisco.datasettes.com/registered-business-locations-3d50679/Registered_Business_Locations_-_San_Francisco

"NAICS Code" and " NAICS Code Description" are two columns that always reflect each other.

simonw avatar Nov 26 '17 06:11 simonw

The existing syntax may be enough to support this already:

-c "NAICS Code:NAICS:Code" \
-c "NAICS Code Description:NAICS:Description"

This could create a table called NAICS with three columns: id, Code and Description.

Extraction would happen based on unique (Code, Description) pairs - so in the case of any of those pairs differing a new row would be added to the lookup table.

simonw avatar Nov 26 '17 06:11 simonw

Open question: which of the original columns should stay and become the reference to the lookup table and which ones should be removed?

Simplest answer is the first one in command option order stays and becomes the foreign key column.

simonw avatar Nov 26 '17 06:11 simonw

Would this issue also include handling duplicate columns created by csvkit? I have a fair amount of data which handles multiple values by duplicating the column name, which results in a lot of columns like “Location”, “Location_2”, etc. It'd be awesome if there was a way to say something like “Location_\d+:Location”.

acdha avatar Jul 10 '18 15:07 acdha

This is now possible using a separate call to the sqlite-utils extract command: https://simonwillison.net/2020/Sep/23/sqlite-utils-extract/

simonw avatar Nov 09 '20 07:11 simonw