djorm-ext-pgfulltext

to_tsvector support for saving VectorField

Opened by john-parton

Right now the only way to update a VectorField is to write out a literal tsvector, e.g. 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12' or 'a:1A fat:2B,4C cat:5D'. That is very flexible, but most of the time I just want the database to run to_tsvector on a plain document.
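For reference, each entry in a tsvector literal is a lexeme, optionally followed by a colon and comma-separated positions, and each position may carry a weight from A to D (D is the default and is not printed). PostgreSQL merges repeated lexemes, so the first literal above stores 'a' at positions 1, 6 and 10. The sketch below parses that format in plain Python; `parse_tsvector_literal` is a hypothetical helper, not part of djorm-ext-pgfulltext, and it deliberately ignores quoted lexemes containing spaces, colons or escaped quotes.

```python
import re

# A lexeme, optionally wrapped in single quotes, optionally followed by
# ":" and a comma-separated list of positions.
ENTRY = re.compile(r"'?(?P<lexeme>[^'\s:]+)'?(?::(?P<positions>[0-9A-D,]+))?")
# A single position (1-based integer) with an optional weight letter.
POSITION = re.compile(r"(?P<pos>\d+)(?P<weight>[A-D]?)")


def parse_tsvector_literal(text):
    """Parse a simple tsvector literal into {lexeme: [(position, weight), ...]}.

    Simplified sketch: does not handle quoted lexemes that contain spaces,
    colons or escaped quotes.
    """
    result = {}
    for token in text.split():
        entry = ENTRY.fullmatch(token)
        if entry is None:
            raise ValueError("unsupported tsvector entry: %r" % token)
        # Repeated lexemes share one position list, like PostgreSQL merges them.
        positions = result.setdefault(entry.group("lexeme"), [])
        if entry.group("positions"):
            for part in entry.group("positions").split(","):
                pos = POSITION.fullmatch(part)
                if pos is None:
                    raise ValueError("bad position %r in %r" % (part, token))
                # Weight D is the default and is omitted from the literal.
                positions.append((int(pos.group("pos")), pos.group("weight") or "D"))
        positions.sort()
    return result


print(parse_tsvector_literal("a:1A fat:2B,4C cat:5D"))
# {'a': [(1, 'A')], 'fat': [(2, 'B'), (4, 'C')], 'cat': [(5, 'D')]}
```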

Here's a quick overview of the problem:

# search/models.py
from django.db import models

from djorm_pgfulltext.fields import VectorField
from djorm_pgfulltext.models import SearchManager

class SearchTest(models.Model):
    search_index = VectorField()
    objects = SearchManager()

In [1]: from search.models import SearchTest

In [2]: search_test = SearchTest()

In [3]: search_test.search_index = 'swim swimming swam'

In [4]: search_test.save()

In [5]: search_test = SearchTest.objects.get(id=search_test.id) # Reload model instance

In [6]: search_test.search_index
Out[6]: "'swam' 'swim' 'swimming'"
# The string was inserted literally as a tsvector
# I would rather it be converted and stemmed: "'swam':3 'swim':1,2"
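The difference between the two outputs can be illustrated in plain Python. This is only a toy model: `literal_tsvector` and `toy_to_tsvector` are hypothetical helpers, and the `stems` dictionary stands in for PostgreSQL's real Snowball stemmer for the 'english' configuration, which also drops stop words.

```python
from collections import defaultdict


def literal_tsvector(text):
    """Approximate a plain '...'::tsvector cast for simple words:
    split on whitespace, drop duplicates, sort, and keep no positions."""
    return " ".join("'%s'" % lexeme for lexeme in sorted(set(text.split())))


def toy_to_tsvector(text, stems):
    """Toy stand-in for to_tsvector: lowercase each word, map it through
    the hypothetical `stems` table, and record its 1-based position."""
    positions = defaultdict(list)
    for pos, word in enumerate(text.lower().split(), start=1):
        positions[stems.get(word, word)].append(pos)
    return " ".join(
        "'%s':%s" % (lexeme, ",".join(str(p) for p in positions[lexeme]))
        for lexeme in sorted(positions)
    )


print(literal_tsvector("swim swimming swam"))
# 'swam' 'swim' 'swimming'
print(toy_to_tsvector("swim swimming swam", {"swimming": "swim"}))
# 'swam':3 'swim':1,2
```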

I think inserting a string as a literal tsvector is a fine default, but I still needed a way to call to_tsvector. I got around that by creating a small Python wrapper class and registering a psycopg2 adapter for it, so the value reaches PostgreSQL as a to_tsvector(...) call instead of a literal.

# search/tsvector.py
from psycopg2.extensions import adapt, AsIs

class TsVector(object):
    """ Represents a call to to_tsvector at the database level.

        Usage:
            TsVector('swim swimming swam')
            TsVector('simple', 'swim swimming swam')
            TsVector('english', 'swim swimming swam')
    """
    def __init__(self, *args):
        # Mirrors to_tsvector([ config regconfig, ] document text).
        # Raise instead of assert so the check survives python -O.
        if len(args) not in (1, 2):
            raise TypeError("Arguments should be TsVector([ config regconfig, ] document text)")

        if len(args) == 1:
            self.config = None
            self.document = args[0]
        else:
            self.config = args[0]
            self.document = args[1]

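def adapt_tsvector(tsvector):
    """ Adapts a TsVector object into a to_tsvector(...) SQL expression.
    """
    # adapt() quotes config and document as SQL string literals;
    # AsIs then passes the assembled expression into the query unquoted.
    if tsvector.config is None:
        return AsIs("to_tsvector(%s)" % adapt(tsvector.document))
    else:
        return AsIs("to_tsvector(%s, %s)" % (adapt(tsvector.config), adapt(tsvector.document)))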
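To see what the adapter hands to the database, the same string building can be traced without a connection. `quote_literal` and `render_to_tsvector` are hypothetical helpers written for illustration; `quote_literal` only approximates psycopg2's quoting for plain strings (no backslashes) with standard_conforming_strings on, which is why the real adapter should keep calling adapt().

```python
def quote_literal(value):
    """Simplified stand-in for psycopg2's string quoting, assuming
    standard_conforming_strings = on: wrap in single quotes and double
    any embedded single quote. Real code should keep using adapt()."""
    return "'%s'" % value.replace("'", "''")


def render_to_tsvector(document, config=None):
    """Build the SQL expression adapt_tsvector produces for a TsVector."""
    if config is None:
        # One-argument form: PostgreSQL falls back to default_text_search_config.
        return "to_tsvector(%s)" % quote_literal(document)
    return "to_tsvector(%s, %s)" % (quote_literal(config), quote_literal(document))


print(render_to_tsvector("swim swimming swam", "english"))
# to_tsvector('english', 'swim swimming swam')
print(render_to_tsvector("it's"))
# to_tsvector('it''s')
```

Quoting every value before wrapping it in AsIs is what keeps user-supplied documents from being spliced into the SQL as raw text.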

Register the adapter somewhere that runs only once, at startup. With Django 1.7, AppConfig.ready() is a natural place for it.

# search/apps.py
from django.apps import AppConfig

from psycopg2.extensions import register_adapter

from search.tsvector import TsVector, adapt_tsvector

class SearchConfig(AppConfig):
    name = 'search'
    verbose_name = "Search"

    def ready(self):
        # Register the TsVector class
        register_adapter(TsVector, adapt_tsvector)
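Note that Django 1.7 only calls SearchConfig.ready() if it actually uses that AppConfig: either list its dotted path in INSTALLED_APPS, as sketched below, or set `default_app_config = 'search.apps.SearchConfig'` in search/__init__.py (Django 3.2+ picks up a single AppConfig subclass in apps.py automatically). The other entries in this list are illustrative only.

```python
# settings.py (fragment): list the AppConfig path rather than the bare
# "search" package so Django uses SearchConfig and runs its ready() hook.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "search.apps.SearchConfig",
]
```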

Example use:

In [1]: from search.models import SearchTest

In [2]: from search.tsvector import TsVector

In [3]: search_test = SearchTest()

In [4]: search_test.search_index = TsVector('english', 'swim swimming swam')

In [5]: search_test.save()

In [6]: search_test = SearchTest.objects.get(id=search_test.id)

In [7]: search_test.search_index
Out[7]: "'swam':3 'swim':1,2"

Thoughts?

john-parton, Oct 21 '14 21:10