djorm-ext-pgfulltext
to_tsvector support for saving VectorField
Right now the only way to update a VectorField is to write out a literal tsvector, e.g. 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12' or 'a:1A fat:2B,4C cat:5D'. While this is obviously very flexible, most of the time I just want to call to_tsvector.
Here's a quick overview of the problem:
# search/models.py
from django.db import models
from djorm_pgfulltext.fields import VectorField
from djorm_pgfulltext.models import SearchManager
class SearchTest(models.Model):
    search_index = VectorField()
    objects = SearchManager()
In [1]: from search.models import SearchTest
In [2]: search_test = SearchTest()
In [3]: search_test.search_index = 'swim swimming swam'
In [4]: search_test.save()
In [5]: search_test = SearchTest.objects.get(id=search_test.id) # Reload model instance
In [6]: search_test.search_index
Out[6]: "'swam' 'swim' 'swimming'"
# The string was inserted as a literal tsvector: no stemming, no positions.
# I would rather it be converted and stemmed: "'swam':3 'swim':1,2"
I think that inserting a string as a literal tsvector is a fine default, but I still need a way to call to_tsvector. I worked around this by creating a small Python class and registering an adapter for it with psycopg2.
# search/tsvector.py
from psycopg2.extensions import adapt, AsIs
class TsVector(object):
    """ Represents a call to to_tsvector at the database level.

    Use:
        TsVector('swim swimming swam')
        TsVector('simple', 'swim swimming swam')
        TsVector('english', 'swim swimming swam')
    """
    def __init__(self, *args):
        assert len(args) in (1, 2), "Arguments should be TsVector([ config regconfig, ] document text)"
        if len(args) == 1:
            # Only the document was given; the database default config applies.
            self.config = None
            self.document = args[0]
        else:
            self.config = args[0]
            self.document = args[1]


def adapt_tsvector(tsvector):
    """ Adapts TsVector object for use in DB.
    """
    if tsvector.config is None:
        return AsIs("to_tsvector(%s)" % adapt(tsvector.document))
    else:
        return AsIs("to_tsvector(%s, %s)" % (adapt(tsvector.config), adapt(tsvector.document)))
Register the adapter somewhere that runs exactly once. With Django 1.7, an AppConfig's ready() method is a pretty natural place for it.
# search/apps.py
from django.apps import AppConfig
from psycopg2.extensions import register_adapter
from search.tsvector import TsVector, adapt_tsvector
class SearchConfig(AppConfig):
    name = 'search'
    verbose_name = "Search"

    def ready(self):
        # Register the TsVector class
        register_adapter(TsVector, adapt_tsvector)
Example use:
In [1]: from search.models import SearchTest
In [2]: from search.tsvector import TsVector
In [3]: search_test = SearchTest()
In [4]: search_test.search_index = TsVector('english', 'swim swimming swam')
In [5]: search_test.save()
In [6]: search_test = SearchTest.objects.get(id=search_test.id)
In [7]: search_test.search_index
Out[7]: "'swam':3 'swim':1,2"
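For comparison, here is a rough sketch of a variant that does not interpolate adapted values into the SQL string itself. It returns the to_tsvector fragment and its parameters separately, so the driver does the quoting when the query is executed. This is not part of djorm-ext-pgfulltext; the class name TsVectorExpression and its as_sql method are hypothetical names I made up for illustration.

```python
class TsVectorExpression(object):
    """Hypothetical helper: describes a to_tsvector call as SQL plus parameters.

    Not part of djorm-ext-pgfulltext or psycopg2; a sketch only.
    """

    def __init__(self, document, config=None):
        # A keyword argument for the config replaces the positional
        # (config, document) ordering, so no argument-count check is needed.
        self.document = document
        self.config = config

    def as_sql(self):
        """Return (sql, params) in the shape cursor.execute(sql, params) expects."""
        if self.config is None:
            # One placeholder: the database's default text search config applies.
            return "to_tsvector(%s)", [self.document]
        # Two placeholders: config name first, then the document.
        return "to_tsvector(%s, %s)", [self.config, self.document]


sql, params = TsVectorExpression('swim swimming swam', config='english').as_sql()
# sql and params could then be spliced into a hand-written UPDATE statement
# and passed to cursor.execute(); the table and column names would depend
# on the app.
```

The trade-off is that this only helps with hand-written queries. The adapter approach above is what lets a plain attribute assignment followed by save() work.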
Thoughts?