sqlite-utils
sqlite-utils copied to clipboard
extracts= should support multiple-column extracts
Lookup tables can be constructed on compound columns, but the extracts= option doesn't currently support that.
Right now extracts can be defined in two ways:
# Extract these columns into tables with the same name:
dogs = db.table("dogs", extracts=["breed", "most_recent_trophy"])
# Same as above but with custom table names:
dogs = db.table("dogs", extracts={"breed": "Breeds", "most_recent_trophy": "Trophies"})
Need some kind of syntax for much more complicated extractions, like when two columns (say "source" and "source_version") are extracted into a single table.
This syntax should be shared with #42 as much as possible.
Maybe something based on a namedtuple would work, since those are already used in the library.
workouts = db.table("workouts", extracts=[Extract(
columns=["source", "source_version"],
table="Sources"
)])
Since namedtuples cannot have default values this should probably be a class instead. Actually it looks like there is a trick for defaults here: https://stackoverflow.com/a/18348004
The new .extract() method can handle multiple columns:
https://github.com/simonw/sqlite-utils/blob/2c541fac352632e23e40b0d21e3f233f7a744a57/tests/test_extract.py#L70-L87
Using a tuple would work:
fresh_db.table("tree", extracts=[("common_name", "latin_name")])
Or to define a custom name:
fresh_db.table("tree", extracts={("common_name", "latin_name"): "names"})
I wonder if there's value in extending the extracts= option at all given the existence of table.extract().
I think there is. It's a nice existing feature, and I don't think adding tuple support to it would be a huge lift.
Here's the most complex example of .extracts():
db["Trees"].extract(
["CommonName", "LatinName"],
table="Species",
fk_column="species_id",
rename={"CommonName": "name", "LatinName": "latin"}
)
Resulting in:
CREATE TABLE [Species] (
[id] INTEGER PRIMARY KEY,
[name] TEXT,
[latin] TEXT
)
From https://sqlite-utils.readthedocs.io/en/stable/python-api.html#extracting-columns-into-a-separate-table
If I want to support that most complicated example, I think the option to pass a Extracts() object to extracts= is the best way to do it:
fresh_db.table("tree", extracts=[Extract(
columns=("CommonName", "LatinName"),
table="Species",
fk_column="species_id",
rename={"CommonName": "name", "LatinName": "latin"}
)])
But this begins to feel too complicated, given that table.extract() can already be used to achieve the same thing.
I'm not going to implement this. I'll leave extract=... as it is right now, suitable for quick simple single-column operations on input, but if users want to do something more complicated involving multiple columns they should use the table.extract() method after the initial insert instead.
Reconsidering: #89 was a feature request that relates to this, so maybe this is worth implementing after all.