Document how to use `PRAGMA temp_store` to avoid errors when running VACUUM against huge databases

Open rayvoelker opened this issue 3 years ago • 2 comments

I'm trying to get the table.extract() method to complete successfully on Ubuntu Server 22.04. I suspect the cause (and a possible fix) involves adjusting some of the PRAGMA values within SQLite itself. On another Linux system (PopOS), the same method on the same database appears to work just fine.

Here's the bit that's causing the error, and the resulting error output:

# combine these columns into 1 table "bib_properties" :
# best_title
# bib_level_code
# mat_type
# material_code
# best_author
db["circ_trans"].extract(
    ["best_title", "bib_level_code", "mat_type", "material_code", "best_author"], 
    table="bib_properties", 
    fk_column="bib_properties_id"
)

db["circ_trans"].extract(
    ["call_number"], 
    table="call_number", 
    fk_column="call_number_id",
    rename={"call_number": "value"}
)
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Input In [17], in <cell line: 7>()
      1 # combine these columns into 1 table "bib_properties" :
      2 # best_title
      3 # bib_level_code
      4 # mat_type
      5 # material_code
      6 # best_author
----> 7 db["circ_trans"].extract(
      8     ["best_title", "bib_level_code", "mat_type", "material_code", "best_author"], 
      9     table="bib_properties", 
     10     fk_column="bib_properties_id"
     11 )
     13 db["circ_trans"].extract(
     14     ["call_number"], 
     15     table="call_number", 
     16     fk_column="call_number_id",
     17     rename={"call_number": "value"}
     18 )

File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1764, in Table.extract(self, columns, table, fk_column, rename)
   1761         column_order.append(c.name)
   1763 # Drop the unnecessary columns and rename lookup column
-> 1764 self.transform(
   1765     drop=set(columns),
   1766     rename={magic_lookup_column: fk_column},
   1767     column_order=column_order,
   1768 )
   1770 # And add the foreign key constraint
   1771 self.add_foreign_key(fk_column, table, "id")

File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1526, in Table.transform(self, types, rename, drop, pk, not_null, defaults, drop_foreign_keys, column_order)
   1524 with self.db.conn:
   1525     for sql in sqls:
-> 1526         self.db.execute(sql)
   1527     # Run the foreign_key_check before we commit
   1528     if pragma_foreign_keys_was_on:

File ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:465, in Database.execute(self, sql, parameters)
    463     return self.conn.execute(sql, parameters)
    464 else:
--> 465     return self.conn.execute(sql)

OperationalError: database or disk is full

This database is about 17G in total, so I'm assuming the error comes from the vacuum: it may be writing its temporary storage to a location that doesn't have enough room. Disk space on the host is more than ample (1.8T is free in the directory where the SQLite db resides). The /tmp directory, however, sits on a smaller disk associated with the OS.
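One way to check that suspicion is to compare free space in the database's directory with free space in the temp directory. This is a minimal sketch using only the Python standard library; the function name `free_space_report` and the filename `circ_trans.db` are hypothetical, and `tempfile.gettempdir()` is only an approximation of where SQLite puts its temporary files, since SQLite follows its own search order.

```python
import shutil
import tempfile
from pathlib import Path


def free_space_report(db_path: str) -> dict:
    """Return free bytes for the database's directory and Python's temp directory."""
    db_dir = Path(db_path).resolve().parent
    # Python's notion of the temp directory; SQLite may pick a different one,
    # so treat this figure as a hint rather than a definitive answer.
    tmp_dir = Path(tempfile.gettempdir())
    return {
        str(db_dir): shutil.disk_usage(db_dir).free,
        str(tmp_dir): shutil.disk_usage(tmp_dir).free,
    }


if __name__ == "__main__":
    # "circ_trans.db" is a hypothetical filename used for illustration.
    for directory, free in free_space_report("circ_trans.db").items():
        print(f"{directory}: {free / 1024**3:.1f} GiB free")
```

If the temp directory has less free space than the size of the database, a temp-file-heavy operation on it is a plausible source of a "database or disk is full" error.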

I'm trying to work out whether there's a way to set PRAGMA temp_store, or maybe it's temp_store_directory that I'm after, so that temporary files go in the same local directory as the database file. (Maybe this depends on the version of SQLite on the system?)

# SET the temp file store to be a file ...
print(db.execute('PRAGMA temp_store').fetchall())
print(db.execute('PRAGMA temp_store=FILE').fetchall())

print(db.execute('PRAGMA temp_store').fetchall())

# the users home directory ...
print(db.execute("PRAGMA temp_store_directory='/home/plchuser/'").fetchall())
print(db.execute("PRAGMA sqlite3_temp_directory='/home/plchuser/'").fetchall())

print(db.execute("PRAGMA temp_store_directory").fetchall())
print(db.execute("PRAGMA sqlite3_temp_directory").fetchall())
[(1,)]
[]
[(1,)]
[]
[]
[('/home/plchuser/',)]
[]

Here's the docs on the Temporary File Storage Locations https://www.sqlite.org/tempfiles.html

rayvoelker, May 03 '22 13:05

So, the good news is that setting one of those PRAGMA statements appears to have fixed the issue: the table.extract() call described above now completes on this large database. The bad news is that I'm not sure which one!

I wonder if it's something system / environment specific about SQLite, or maybe something else going on.

rayvoelker, May 03 '22 17:05

It looks like PRAGMA temp_store was the right option to use here: https://www.sqlite.org/pragma.html#pragma_temp_store

temp_store_directory is listed as deprecated here: https://www.sqlite.org/pragma.html#pragma_temp_store_directory
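For documentation purposes, here is a minimal sketch of setting `PRAGMA temp_store` on a connection and reading the value back. It uses the standard library `sqlite3` module and an in-memory database so it runs anywhere; the helper name `set_temp_store` is hypothetical. Choosing `MEMORY` keeps temporary tables and indices in RAM instead of in temp files, which only helps if the machine has enough memory for the operation, so that choice is an assumption and not a general recommendation.

```python
import sqlite3

# Numeric values reported back by "PRAGMA temp_store"
TEMP_STORE_MODES = {"DEFAULT": 0, "FILE": 1, "MEMORY": 2}


def set_temp_store(conn: sqlite3.Connection, mode: str) -> int:
    """Set temp_store on this connection and return the value SQLite reports."""
    mode = mode.upper()
    if mode not in TEMP_STORE_MODES:
        raise ValueError(f"unknown temp_store mode: {mode!r}")
    # The mode is checked against a fixed list above before being
    # interpolated into the PRAGMA statement.
    conn.execute(f"PRAGMA temp_store = {mode}")
    return conn.execute("PRAGMA temp_store").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(set_temp_store(conn, "MEMORY"))
```

The setting applies to the connection it is run on, so it needs to be issued before the large operation. With sqlite-utils the same statement can be sent through `db.execute(...)`, as in the snippet earlier in this thread.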

I'm going to turn this into a help-wanted documentation issue.

simonw, Jun 14 '22 23:06