sqlite-utils icon indicating copy to clipboard operation
sqlite-utils copied to clipboard

extracts= should support multiple-column extracts

Open simonw opened this issue 6 years ago • 10 comments

Lookup tables can be constructed on compound columns, but the extracts= option doesn't currently support that.

Right now extracts can be defined in two ways:

# Extract these columns into tables with the same name:
dogs = db.table("dogs", extracts=["breed", "most_recent_trophy"])

# Same as above but with custom table names:
dogs = db.table("dogs", extracts={"breed": "Breeds", "most_recent_trophy": "Trophies"})

Need some kind of syntax for much more complicated extractions, like when two columns (say "source" and "source_version") are extracted into a single table.

simonw avatar Jul 24 '19 07:07 simonw

This syntax should be shared with #42 as much as possible.

Maybe something based on a namedtuple would work, since those are already used in the library.

workouts = db.table("workouts", extracts=[Extract(
    columns=["source", "source_version"],
    table="Sources"
)])

Since namedtuples cannot have default values this should probably be a class instead. Actually it looks like there is a trick for defaults here: https://stackoverflow.com/a/18348004

simonw avatar Jul 24 '19 07:07 simonw

The new .extract() method can handle multiple columns:

https://github.com/simonw/sqlite-utils/blob/2c541fac352632e23e40b0d21e3f233f7a744a57/tests/test_extract.py#L70-L87

simonw avatar Oct 16 '20 18:10 simonw

Using a tuple would work:

fresh_db.table("tree", extracts=[("common_name", "latin_name")])

Or to define a custom name:

fresh_db.table("tree", extracts={("common_name", "latin_name"): "names"})

simonw avatar Oct 16 '20 18:10 simonw

I wonder if there's value in extending the extracts= option at all given the existence of table.extract().

simonw avatar Oct 16 '20 18:10 simonw

I think there is. It's a nice existing feature, and I don't think adding tuple support to it would be a huge lift.

simonw avatar Oct 16 '20 18:10 simonw

Here's the most complex example of .extracts():

db["Trees"].extract(
    ["CommonName", "LatinName"],
    table="Species",
    fk_column="species_id",
    rename={"CommonName": "name", "LatinName": "latin"}
)

Resulting in:

CREATE TABLE [Species] (
    [id] INTEGER PRIMARY KEY,
    [name] TEXT,
    [latin] TEXT
)

From https://sqlite-utils.readthedocs.io/en/stable/python-api.html#extracting-columns-into-a-separate-table

simonw avatar Oct 16 '20 18:10 simonw

If I want to support that most complicated example, I think the option to pass a Extracts() object to extracts= is the best way to do it:

fresh_db.table("tree", extracts=[Extract(
    columns=("CommonName", "LatinName"),
    table="Species",
    fk_column="species_id",
    rename={"CommonName": "name", "LatinName": "latin"}
)])

simonw avatar Oct 16 '20 18:10 simonw

But this begins to feel too complicated, given that table.extract() can already be used to achieve the same thing.

simonw avatar Oct 16 '20 18:10 simonw

I'm not going to implement this. I'll leave extract=... as it is right now, suitable for quick simple single-column operations on input, but if users want to do something more complicated involving multiple columns they should use the table.extract() method after the initial insert instead.

simonw avatar Oct 16 '20 18:10 simonw

Reconsidering: #89 was a feature request that relates to this, so maybe this is worth implementing after all.

simonw avatar Oct 16 '20 19:10 simonw