Creating tables with custom datatypes
Via https://stackoverflow.com/a/18622264/454773 I note the ability to register custom handlers for novel datatypes that can map into and out of things like sqlite BLOBs.
From a quick look and a quick play, I didn't spot a way to do this in sqlite_utils?
For example:
```python
# Via https://stackoverflow.com/a/18622264/454773
import io
import sqlite3

import numpy as np


def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())


def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)


# Converts np.ndarray to a BLOB when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts a BLOB back to np.ndarray when selecting
sqlite3.register_converter("array", convert_array)

from sqlite_utils import Database

db = Database('test.db')

# Reset the database connection to use the registered datatypes;
# sqlite_utils doesn't seem to support eg:
#     Database('test.db', detect_types=sqlite3.PARSE_DECLTYPES)
db.conn = sqlite3.connect('test.db', detect_types=sqlite3.PARSE_DECLTYPES)

# Create a table the old fashioned way,
# but using the new custom datatype
vector_table_create = """
CREATE TABLE dummy
    (title TEXT, vector array);
"""

cur = db.conn.cursor()
cur.execute(vector_table_create)

# sqlite_utils doesn't appear to support custom types (yet?!)
# The following errors on the "array" datatype:
"""
db["dummy"].create({
    "title": str,
    "vector": "array",
})
"""
```
We can then add / retrieve records from the database where the datatype of the vector field is a custom registered array type (which is to say, a numpy array):
```python
import numpy as np

db["dummy"].insert({'title': "test1", 'vector': np.array([1, 2, 3])})

for row in db.query("SELECT * FROM dummy"):
    print(row['title'], row['vector'], type(row['vector']))

# test1 [1 2 3] <class 'numpy.ndarray'>
```
It would be handy to be able to do this idiomatically in sqlite_utils.
I had no idea this was possible! It turns out SQLite will allow almost any string as a declared column type; if it doesn't recognize the type name it falls back to a default type affinity (NUMERIC) for the underlying representation.
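A quick check of that behaviour, using only the stdlib `sqlite3` module (a sketch, nothing sqlite_utils-specific): SQLite records the declared type verbatim, and `PRAGMA table_info` reports it back exactly as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "array" is not a recognized SQLite type name, but the CREATE still works
conn.execute("CREATE TABLE demo (title TEXT, vector array)")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
# rows; the declared type string is preserved as written
cols = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(demo)")}
print(cols)  # {'title': 'TEXT', 'vector': 'array'}
```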
Allowing custom strings in the create() method, as you suggest in your example, feels like a reasonable way to support this.
```python
db["dummy"].create({
    "title": str,
    "vector": "array",
})
```
I'm slightly nervous about that, just because people might accidentally use this without realizing what they are doing: passing `"column-name": "string"`, for example, when they should have used `"column-name": str` in order to get a TEXT column.
Alternatively, this could work:
```python
db["dummy"].create({
    "title": str,
    "vector": CustomColumnType("array")
})
```
This would play better with mypy too I think.
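For illustration, a minimal sketch of what such a wrapper might look like. Neither the `CustomColumnType` class nor the `column_sql()` helper exists in sqlite_utils today; both are hypothetical, showing how `create()` could distinguish a deliberate custom type from an accidental string.

```python
class CustomColumnType:
    """Hypothetical wrapper marking a deliberately custom declared type."""

    def __init__(self, name: str):
        self.name = name

    def __repr__(self):
        return f"CustomColumnType({self.name!r})"


def column_sql(name, dtype):
    # Hypothetical create() internals: pass custom type names through
    # verbatim, map known Python types as today, reject bare strings
    if isinstance(dtype, CustomColumnType):
        sql_type = dtype.name
    elif dtype is str:
        sql_type = "TEXT"
    else:
        raise TypeError(f"Unsupported column type for {name}: {dtype!r}")
    return f"[{name}] {sql_type}"


print(column_sql("vector", CustomColumnType("array")))  # [vector] array
```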
The CustomColumnType() approach looks good. It pushes you into the mindspace that you are defining and working with a custom column type.
When creating the table, you could then error, or at least warn, if a column wasn't set to either a native type or a custom column type, which I guess is where mypy comes in?
Wondering if this actually relates to https://github.com/simonw/sqlite-utils/issues/402?
I also wonder if this would be a sensible approach for eg registering pint based quantity conversions into and out of the db, perhaps storing the quantity as a serialised magnitude-and-units string in a single column?
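As a sketch of that idea, without assuming pint is installed: the same adapter/converter mechanism can round-trip a (magnitude, units) value through a single TEXT column. The `Quantity` namedtuple here is a stand-in for `pint.Quantity`; a real pint integration would also need to resolve the units against a `UnitRegistry`.

```python
import sqlite3
from collections import namedtuple

# Stand-in for pint.Quantity: a magnitude plus a units string
Quantity = namedtuple("Quantity", ["magnitude", "units"])


def adapt_quantity(q):
    # Serialise to a single "magnitude units" string
    return f"{q.magnitude} {q.units}"


def convert_quantity(text):
    # Converters receive bytes; split back into magnitude and units
    magnitude, units = text.decode().split(" ", 1)
    return Quantity(float(magnitude), units)


sqlite3.register_adapter(Quantity, adapt_quantity)
sqlite3.register_converter("quantity", convert_quantity)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE readings (name TEXT, value quantity)")
conn.execute(
    "INSERT INTO readings VALUES (?, ?)", ("span", Quantity(1.5, "metre"))
)

row = conn.execute("SELECT value FROM readings").fetchone()
print(row[0])  # Quantity(magnitude=1.5, units='metre')
```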
I was wondering if you have any more thoughts on this? I have a tangible use case now: adding a "vector" column to a database to support semantic search using doc2vec embeddings (example; note that the vtfunc package may no longer be reliable...).
See also: https://github.com/simonw/sqlite-utils/issues/612