csvs-to-sqlite
Optionally maintain checksums of CSV files for faster updates
I wanted to see whether there is interest in a patch that speeds up our workflow significantly, and whether there are ideas for improving such a feature. If this is out of scope for the project, I'm happy to keep maintaining my fork.
Use Case
We currently maintain a folder of more than 200 CSV files totaling a few hundred megabytes, and have a CI step that builds these CSVs into a SQLite database. The files are updated 2-3 times a day, but each update makes only small changes. Currently, running csvs-to-sqlite with the --replace-tables flag takes roughly 6-7 minutes, which is too long for our use case.
Solution
Add a --update-tables flag that maintains a checksum of each CSV file in a table called .csvs-meta (happy to change this name or make it configurable), and only reads a CSV and loads its dataframe if the file's checksum has changed.
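
To make the idea concrete, here is a minimal sketch of the checksum check. It is not the actual patch: it uses only the standard library (csv and sqlite3) instead of pandas, and the function names, the SHA-256 choice, and the .csvs-meta column names (csv_path, checksum) are all assumptions made for illustration.

```python
import csv
import hashlib
import sqlite3
from pathlib import Path

META_TABLE = ".csvs-meta"  # name from the proposal; the columns below are assumed


def quote_ident(name):
    """Quote an SQLite identifier (needed for names such as .csvs-meta)."""
    return '"{}"'.format(name.replace('"', '""'))


def file_checksum(path, chunk_size=1 << 20):
    """Hash the file in chunks so large CSVs are never fully held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_csv(conn, path, table):
    """Stand-in for the real loader: replace the table with the CSV's rows."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows:
        return  # empty file: nothing to load
    header, data = rows[0], rows[1:]
    columns = ", ".join(quote_ident(c) for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute("DROP TABLE IF EXISTS {}".format(quote_ident(table)))
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
        data,
    )


def update_tables(conn, csv_paths):
    """Reload only the CSVs whose checksum differs from the stored one.

    Returns the names of the tables that were reloaded.
    """
    meta = quote_ident(META_TABLE)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS {} "
        "(csv_path TEXT PRIMARY KEY, checksum TEXT NOT NULL)".format(meta)
    )
    reloaded = []
    for path in map(Path, csv_paths):
        digest = file_checksum(path)
        row = conn.execute(
            "SELECT checksum FROM {} WHERE csv_path = ?".format(meta),
            (str(path),),
        ).fetchone()
        if row is not None and row[0] == digest:
            continue  # unchanged since the last run: skip parsing and loading
        load_csv(conn, path, path.stem)
        conn.execute(
            "INSERT OR REPLACE INTO {} (csv_path, checksum) "
            "VALUES (?, ?)".format(meta),
            (str(path), digest),
        )
        reloaded.append(path.stem)
    conn.commit()
    return reloaded
```

Hashing still reads every file once, but it skips parsing and inserting, which is presumably where most of the 6-7 minutes goes. A real implementation would also need to decide what to do with tables whose CSV has been deleted.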