sqlite-utils
sqlite-utils copied to clipboard
.extract() shouldn't extract null values
This almost works, but it creates a rogue type record with a value of None.
In [1]: import sqlite_utils
In [2]: db = sqlite_utils.Database(memory=True)
In [5]: db["creatures"].insert_all([
{"id": 1, "name": "Simon", "type": None},
{"id": 2, "name": "Natalie", "type": None},
{"id": 3, "name": "Cleo", "type": "dog"}], pk="id")
Out[5]: <Table creatures (id, name, type)>
In [7]: db["creatures"].extract("type")
Out[7]: <Table creatures (id, name, type_id)>
In [8]: list(db["creatures"].rows)
Out[8]:
[{'id': 1, 'name': 'Simon', 'type_id': None},
{'id': 2, 'name': 'Natalie', 'type_id': None},
{'id': 3, 'name': 'Cleo', 'type_id': 2}]
In [9]: db["type"]
Out[9]: <Table type (id, type)>
In [10]: list(db["type"].rows)
Out[10]: [{'id': 1, 'type': None}, {'id': 2, 'type': 'dog'}]
How should this work with extractions covering multiple columns?
If there's a single column then it makes sense that a null value would not be extracted into the lookup table, but would instead become stay as null.
For a multiple column extraction, provided at least one of those columns is not null It should map to a record in the lookup table. Only if ALL of the extracted columns are null should the lookup value stay null.
The alternative solution here would be that a single null value DOES get extracted. To implement this I would need to add some logic that uses is null instead of =.
Either way I think I'm going to need to add some SQL which uses where a = b or (a is null and b is null).
Actually I think this should be an option to .extract() which controls if nulls are extracted or left alone. Maybe called extract_nulls=True/False.
Failing test:
def test_extract_null_values(fresh_db):
fresh_db["species"].insert({"id": 1, "species": "Wolf"}, pk="id")
fresh_db["individuals"].insert_all(
[
{"id": 10, "name": "Terriana", "species": "Fox"},
{"id": 11, "name": "Spenidorm", "species": None},
{"id": 12, "name": "Grantheim", "species": "Wolf"},
{"id": 13, "name": "Turnutopia", "species": None},
{"id": 14, "name": "Wargal", "species": "Wolf"},
],
pk="id",
)
fresh_db["individuals"].extract("species")
assert fresh_db["species"].schema == (
"CREATE TABLE [species] (\n"
" [id] INTEGER PRIMARY KEY,\n"
" [species] TEXT\n"
")"
)
assert fresh_db["individuals"].schema == (
'CREATE TABLE "individuals" (\n'
" [id] INTEGER PRIMARY KEY,\n"
" [name] TEXT,\n"
" [species_id] INTEGER,\n"
" FOREIGN KEY(species_id) REFERENCES species(id)\n"
")"
)
assert list(fresh_db["species"].rows) == [
{"id": 1, "species": "Wolf"},
{"id": 2, "species": "Fox"},
]
assert list(fresh_db["individuals"].rows) == [
{"id": 10, "name": "Terriana", "species_id": 2},
{"id": 11, "name": "Spenidorm", "species_id": None},
{"id": 12, "name": "Grantheim", "species_id": 1},
{"id": 13, "name": "Turnutopia", "species_id": None},
{"id": 14, "name": "Wargal", "species_id": 1},
]
I think I ran into an issue that's perhaps related with extract()
I have a case where I want to create a lookup table for all the related title data where there are possibly multiple null values in the related columns ....
sql = """\
INSERT INTO "circulation_info" ("item_id", "bib_title", "bib_creator", "bib_format", "bib_pub_year", "checkout_date")
VALUES
(1, "title one", "creator one", "Book", 2018, "2021-08-12 00:01"),
(2, "title two", "creator one", "Book", 2019, "2021-08-12 00:02"),
(3, "title three", NULL, "DVD", 2020, "2021-08-12 00:03"),
(4, "title four", NULL, "DVD", NULL, "2021-08-12 00:04"),
(5, "title five", NULL, "DVD", NULL, "2021-08-12 00:05")
"""
with sqlite3.connect('test_bib.db') as con:
con.execute(sql)
when I run the extract() method ...
db["circulation_info"].extract(
[
"bib_title",
"bib_creator",
"bib_format"
],
table="bib_info",
fk_column="bib_info_id"
)
db = sqlite_utils.Database("test_bib.db")
for row in db["circulation_info"].rows:
print(row)
print("\n---\n")
for row in db["bib_info"].rows:
print(row)
results in this ..
{'id': 1, 'item_id': 1, 'bib_info_id': 1, 'bib_pub_year': 2018, 'checkout_date': '2021-08-12 00:01'}
{'id': 2, 'item_id': 2, 'bib_info_id': 2, 'bib_pub_year': 2019, 'checkout_date': '2021-08-12 00:02'}
{'id': 3, 'item_id': 3, 'bib_info_id': None, 'bib_pub_year': 2020, 'checkout_date': '2021-08-12 00:03'}
{'id': 4, 'item_id': 4, 'bib_info_id': None, 'bib_pub_year': None, 'checkout_date': '2021-08-12 00:04'}
{'id': 5, 'item_id': 5, 'bib_info_id': None, 'bib_pub_year': None, 'checkout_date': '2021-08-12 00:05'}
---
{'id': 1, 'bib_title': 'title one', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 2, 'bib_title': 'title two', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 3, 'bib_title': 'title three', 'bib_creator': None, 'bib_format': 'DVD'}
{'id': 4, 'bib_title': 'title four', 'bib_creator': None, 'bib_format': 'DVD'}
{'id': 5, 'bib_title': 'title five', 'bib_creator': None, 'bib_format': 'DVD'}
Seems like it's correctly generating the row data for those lookups, but it's not correctly updating the foreign key back to the primary table? Looks like it just results in a NULL value in that original table.
Any ideas on why? Thanks again!
Actually, I forgot to include the bib_pub_year in the extract ...
But also, I tried again with empty string values instead of NULL values and it seems to place the foreign key properly / correctly...
sql = """\
INSERT INTO "circulation_info" ("item_id", "bib_title", "bib_creator", "bib_format", "bib_pub_year", "checkout_date")
VALUES
(1, "title one", "creator one", "Book", 2018, "2021-08-12 00:01"),
(2, "title two", "creator one", "Book", 2019, "2021-08-12 00:02"),
(3, "title three", "", "DVD", 2020, "2021-08-12 00:03"),
(4, "title four", "", "DVD", "", "2021-08-12 00:04"),
(5, "title five", "", "DVD", "", "2021-08-12 00:05")
"""
with sqlite3.connect('test_bib_2.db') as con:
con.execute(sql)
db["circulation_info"].extract(
[
"bib_title",
"bib_creator",
"bib_format",
"bib_pub_year"
],
table="bib_info",
fk_column="bib_info_id"
)
{'id': 1, 'item_id': 1, 'bib_info_id': 1, 'bib_pub_year': 2018, 'checkout_date': '2021-08-12 00:01'}
{'id': 2, 'item_id': 2, 'bib_info_id': 2, 'bib_pub_year': 2019, 'checkout_date': '2021-08-12 00:02'}
{'id': 3, 'item_id': 3, 'bib_info_id': 3, 'bib_pub_year': 2020, 'checkout_date': '2021-08-12 00:03'}
{'id': 4, 'item_id': 4, 'bib_info_id': 4, 'bib_pub_year': '', 'checkout_date': '2021-08-12 00:04'}
{'id': 5, 'item_id': 5, 'bib_info_id': 5, 'bib_pub_year': '', 'checkout_date': '2021-08-12 00:05'}
---
{'id': 1, 'bib_title': 'title one', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 2, 'bib_title': 'title two', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 3, 'bib_title': 'title three', 'bib_creator': '', 'bib_format': 'DVD'}
{'id': 4, 'bib_title': 'title four', 'bib_creator': '', 'bib_format': 'DVD'}
{'id': 5, 'bib_title': 'title five', 'bib_creator': '', 'bib_format': 'DVD'}