sqlite-utils
Document how to use `PRAGMA temp_store` to avoid errors when running VACUUM against huge databases
I'm trying to get the table.extract() method to complete successfully. On Ubuntu Server 22.04 it fails, and I'm not sure whether the cause (and a possible solution) is to adjust some of the PRAGMA values within SQLite itself. On another Linux system (PopOS), the same method against this same database appears to work just fine.
Here's the bit that's causing the error, and the resulting error output:
# combine these columns into 1 table "bib_properties" :
# best_title
# bib_level_code
# mat_type
# material_code
# best_author
db["circ_trans"].extract(
    ["best_title", "bib_level_code", "mat_type", "material_code", "best_author"],
    table="bib_properties",
    fk_column="bib_properties_id"
)
db["circ_trans"].extract(
    ["call_number"],
    table="call_number",
    fk_column="call_number_id",
    rename={"call_number": "value"}
)
---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
Input In [17], in <cell line: 7>()
1 # combine these columns into 1 table "bib_properties" :
2 # best_title
3 # bib_level_code
4 # mat_type
5 # material_code
6 # best_author
----> 7 db["circ_trans"].extract(
8 ["best_title", "bib_level_code", "mat_type", "material_code", "best_author"],
9 table="bib_properties",
10 fk_column="bib_properties_id"
11 )
13 db["circ_trans"].extract(
14 ["call_number"],
15 table="call_number",
16 fk_column="call_number_id",
17 rename={"call_number": "value"}
18 )
File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1764, in Table.extract(self, columns, table, fk_column, rename)
1761 column_order.append(c.name)
1763 # Drop the unnecessary columns and rename lookup column
-> 1764 self.transform(
1765 drop=set(columns),
1766 rename={magic_lookup_column: fk_column},
1767 column_order=column_order,
1768 )
1770 # And add the foreign key constraint
1771 self.add_foreign_key(fk_column, table, "id")
File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1526, in Table.transform(self, types, rename, drop, pk, not_null, defaults, drop_foreign_keys, column_order)
1524 with self.db.conn:
1525 for sql in sqls:
-> 1526 self.db.execute(sql)
1527 # Run the foreign_key_check before we commit
1528 if pragma_foreign_keys_was_on:
File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:465, in Database.execute(self, sql, parameters)
463 return self.conn.execute(sql, parameters)
464 else:
--> 465 return self.conn.execute(sql)
OperationalError: database or disk is full
This database is about 17G in total, so I'm assuming the error comes from the VACUUM, which may be trying to put its temporary storage in a location that doesn't have sufficient room. Disk space on the host in question is more than ample (1.8T is free in the directory where the SQLite database resides). The /tmp directory, however, sits on a smaller disk associated with the OS.
I'm wondering whether there's a way to set PRAGMA temp_store, or whether temp_store_directory is what I'm after, so that temporary files are written to the same local directory as the database file (maybe this depends on the version of SQLite on the system?).
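One quick way to test the "not enough room for temporary storage" theory is to compare free space next to the database with free space in the temp directory. This is only a rough sketch: the database file name is a placeholder, and `tempfile.gettempdir()` reports Python's temp directory, which is usually but not necessarily the one SQLite picks.

```python
import shutil
import tempfile
from pathlib import Path


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


# Hypothetical database path; substitute the real location of the database file.
db_path = Path("circ_trans.db")

db_dir = db_path.resolve().parent
# Python's idea of the temp directory (often /tmp); SQLite may choose differently.
tmp_dir = Path(tempfile.gettempdir())

print(f"free next to the database ({db_dir}): {free_gib(db_dir):.1f} GiB")
print(f"free in the temp directory ({tmp_dir}): {free_gib(tmp_dir):.1f} GiB")
```

If the second number is smaller than the database itself, an operation that rewrites the whole database through a temporary file could plausibly run out of room there.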
# SET the temp file store to be a file ...
print(db.execute('PRAGMA temp_store').fetchall())
print(db.execute('PRAGMA temp_store=FILE').fetchall())
print(db.execute('PRAGMA temp_store').fetchall())
# the users home directory ...
print(db.execute("PRAGMA temp_store_directory='/home/plchuser/'").fetchall())
print(db.execute("PRAGMA sqlite3_temp_directory='/home/plchuser/'").fetchall())
print(db.execute("PRAGMA temp_store_directory").fetchall())
print(db.execute("PRAGMA sqlite3_temp_directory").fetchall())
[(1,)]
[]
[(1,)]
[]
[]
[('/home/plchuser/',)]
[]
Here are the docs on temporary file storage locations: https://www.sqlite.org/tempfiles.html
So, the good news is that setting one of those PRAGMA statements fixed the problem: the table.extract() call described above now completes on this large database. The bad news is that I'm not sure which one!
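That page also says that on Unix-like systems SQLite consults the SQLITE_TMPDIR environment variable (and then TMPDIR) before falling back to /var/tmp, /usr/tmp and /tmp. Setting that variable is one alternative to a PRAGMA for redirecting temporary files. The sketch below uses the standard library `sqlite3` module; the helper name `connect_with_local_temp` and the `example.db` file name are made up for illustration, and I haven't verified this against the 17G database from this issue.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def connect_with_local_temp(db_path):
    """Open db_path after pointing SQLITE_TMPDIR at the database's own directory.

    Hypothetical helper: assumes a Unix-like system where SQLite honours
    the SQLITE_TMPDIR environment variable.
    """
    db_dir = Path(db_path).resolve().parent
    # Set the variable before SQLite needs to create any temporary file.
    os.environ["SQLITE_TMPDIR"] = str(db_dir)
    return sqlite3.connect(db_path)


# Demonstration against a throwaway directory instead of a real database.
with tempfile.TemporaryDirectory() as workdir:
    conn = connect_with_local_temp(Path(workdir) / "example.db")
    temp_dir_used = os.environ["SQLITE_TMPDIR"]
    conn.execute("CREATE TABLE t (x)")
    conn.close()

print(temp_dir_used)
```

The variable can also be exported in the shell before starting Python or Jupyter, which avoids touching the code at all.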
I wonder if it's something system / environment specific about SQLite, or maybe something else going on.
It looks like PRAGMA temp_store was the right option to use here: https://www.sqlite.org/pragma.html#pragma_temp_store
temp_store_directory is listed as deprecated here: https://www.sqlite.org/pragma.html#pragma_temp_store_directory
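For the documentation, here is a minimal sketch of reading and setting `PRAGMA temp_store` from Python. It uses the standard library `sqlite3` module on an in-memory database; the helper names `get_temp_store` and `set_temp_store` are invented for this example. With sqlite-utils the same statements can be passed to `db.execute(...)`, as in the snippet earlier in this thread.

```python
import sqlite3

# Integer codes returned by "PRAGMA temp_store", per the SQLite documentation.
TEMP_STORE_NAMES = {0: "DEFAULT", 1: "FILE", 2: "MEMORY"}


def get_temp_store(conn):
    """Return the connection's current temp_store setting as a readable name."""
    (value,) = conn.execute("PRAGMA temp_store").fetchone()
    return TEMP_STORE_NAMES[value]


def set_temp_store(conn, mode):
    """Set temp_store for this connection; mode is DEFAULT, FILE or MEMORY."""
    if mode not in TEMP_STORE_NAMES.values():
        raise ValueError(f"unknown temp_store mode: {mode!r}")
    # The mode is validated above, so interpolating it into the SQL is safe.
    conn.execute(f"PRAGMA temp_store = {mode}")


conn = sqlite3.connect(":memory:")
before = get_temp_store(conn)
set_temp_store(conn, "MEMORY")
after = get_temp_store(conn)
print(before, "->", after)
conn.close()
```

The setting applies per connection, so it has to be applied again each time the database is opened. Note that MEMORY keeps temporary tables and indices in RAM, which may not be practical when the temporary data is as large as the database itself.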
I'm going to turn this into a help-wanted documentation issue.