drizzle-orm icon indicating copy to clipboard operation
drizzle-orm copied to clipboard

Add Full-Text Search capability

Open cr101 opened this issue 1 year ago • 7 comments

This is a feature request for Full-Text Search functionality to perform a fulltext search on specific fields. Ideally with support for using GIN/GIST indexes to speed up full text search.

cr101 avatar Mar 11 '23 01:03 cr101

Could you show how it looks like in SQL?

dankochetov avatar Mar 12 '23 19:03 dankochetov

CREATE TABLE companies (
  id serial PRIMARY KEY,
  name VARCHAR ( 50 ) NOT NULL,
  description VARCHAR ( 255 ),
);

Using the "tsvector" column type for Postgres will speed up full text search.

Example using a precomputed tsvector:

ALTER TABLE "companies" ADD COLUMN  "name_vec" tsvector;
UPDATE companies SET name_vec = to_tsvector("english", name);
CREATE INDEX companies_name on companies USING GIN (name_vec);

select * from companies where name @@ to_tsquery('english', 'amazo:*')

The full range of Postgresql Text Search Functions and Operators

cr101 avatar Mar 17 '23 01:03 cr101

Are there useful workarounds using the querybuilder today?

paynecodes avatar Apr 04 '23 03:04 paynecodes

Started look at this a bit - at least for generating a tsvector column in a schema definition, doing something very similar to the citext example of an extended column type worked fine: https://gist.github.com/tmcw/9f0327c74e914524eed997131b96e66e

tmcw avatar May 15 '23 16:05 tmcw

@tmcw is there any way to also support generated specifier?

for example, I want to use drizzle to define this column:

my_column_name tsvector generated always as (to_tsvector('english', other_column_name)) stored

edit:

based on @tmcw i'm able to use custom config inside tsvector function, here is the implementation:

import { ColumnBuilderConfig, ColumnConfig } from "drizzle-orm";
import {
    AnyPgTable,
    PgColumn,
    PgColumnBuilder,
    PgColumnBuilderHKT,
    PgColumnHKT,
} from "drizzle-orm/pg-core";

export interface PgTSVectorConfig {
    sources?: string[];
}

export class PgTSVectorBuilder<
    TData extends string = string
> extends PgColumnBuilder<
    PgColumnBuilderHKT,
    ColumnBuilderConfig<{ data: TData; driverParam: string }>,
    { sources: string[] | undefined }
> {
    protected $pgColumnBuilderBrand: string = "PgTSVectorBuilder";

    constructor(name: string, config: PgTSVectorConfig) {
        super(name);
        this.config.sources = config.sources;
    }

    build<TTableName extends string>(
        table: AnyPgTable<{ name: TTableName }>
    ): PgTSVector<TTableName, TData> {
        return new PgTSVector(table, this.config);
    }
}

export class PgTSVector<
    TTableName extends string,
    TData extends string
> extends PgColumn<
    PgColumnHKT,
    ColumnConfig<{ tableName: TTableName; data: TData; driverParam: string }>,
    { sources: string[] | undefined }
> {
    constructor(
        table: AnyPgTable<{ name: TTableName }>,
        builder: PgTSVectorBuilder<TData>["config"]
    ) {
        super(table, builder);
    }

    getSQLType(): string {
        return this.config.sources === undefined
            ? `tsvector`
            : `tsvector generated always as (to_tsvector('english', ${this.config.sources.join(
                  " || ' ' || "
              )})) stored`;
    }
}

export function tsvector<TName extends string>(
    name: string,
    config: PgTSVectorConfig = {}
): PgTSVectorBuilder<TName> {
    return new PgTSVectorBuilder(name, config);
}

pyk avatar May 16 '23 12:05 pyk

Wow, that is the first I'm learning of the generated specifier 😆. So thanks for the tip, but yeah - not sure how to support that so far!

tmcw avatar May 16 '23 13:05 tmcw

Building upon above answer I added support for weight

import { ColumnBuilderConfig, ColumnConfig } from "drizzle-orm";
import {
  AnyPgTable,
  PgColumn,
  PgColumnBuilder,
  PgColumnBuilderHKT,
  PgColumnHKT,
} from "drizzle-orm/pg-core";

type PgTSVectorBuilderConfig = {
  sources: string[] | undefined;
  weighted?: boolean;
};

export interface PgTSVectorConfig {
  sources?: PgTSVectorBuilderConfig["sources"];
  weighted?: PgTSVectorBuilderConfig["weighted"];
}

function generateTsvectorColumn(input: string[]) {
  const columnExpressions = input.map((column, index) => {
    const weight = String.fromCharCode(index + 65);
    return `setweight(to_tsvector('english', coalesce(${column}, '')), '${weight}')`;
  });

  const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join(
    " || "
  )}) STORED`;

  return tsvectorColumn;
}

export class PgTSVectorBuilder<
  TData extends string = string
> extends PgColumnBuilder<
  PgColumnBuilderHKT,
  ColumnBuilderConfig<{ data: TData; driverParam: string }>,
  PgTSVectorBuilderConfig
> {
  protected $pgColumnBuilderBrand: string = "PgTSVectorBuilder";

  constructor(name: string, config: PgTSVectorConfig) {
    super(name);
    this.config.sources = config.sources;
    this.config.weighted = config.weighted;
  }

  build<TTableName extends string>(
    table: AnyPgTable<{ name: TTableName }>
  ): PgTSVector<TTableName, TData> {
    const xyz = new PgTSVector(table, this.config);
    console.log(xyz);
    return xyz;
  }
}

export class PgTSVector<
  TTableName extends string,
  TData extends string
> extends PgColumn<
  PgColumnHKT,
  ColumnConfig<{ tableName: TTableName; data: TData; driverParam: string }>,
  { sources: string[] | undefined; weighted?: boolean }
> {
  constructor(
    table: AnyPgTable<{ name: TTableName }>,
    builder: PgTSVectorBuilder<TData>["config"]
  ) {
    super(table, builder);
  }

  getSQLType(): string {
    return this.config.sources === undefined
      ? `tsvector`
      : this.config.weighted
      ? generateTsvectorColumn(this.config.sources)
      : `tsvector generated always as (to_tsvector('english', ${this.config.sources.join(
          " || ' ' || "
        )})) stored`;
  }
}

export function tsvector<TName extends string>(
  name: string,
  config: PgTSVectorConfig = {}
): PgTSVectorBuilder<TName> {
  return new PgTSVectorBuilder(name, config);
}

In your schema

...
vec: tsvector("info", {
      sources: ["title", "description"],
      weighted: true,
    }),
...

The only issue I'm getting is in the migration file. Double inverted commas before tsvector and at the end of the sql statement as well

Current output:

ALTER TABLE "table_name" ADD COLUMN "vec" "tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B')) STORED";

Expected output:

ALTER TABLE "table_name" ADD COLUMN "vec" tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B')) STORED;

xinha-sh avatar Jun 02 '23 10:06 xinha-sh

Any chance anyone's actively looking at adding indexed (GIN or GIST)generated ts_vector columns? This is one of the last things that will let us move away from Supabase and over to Drizzle entirely. Hoping eventually it can also be added to Drizzle-Kit's introspect command so we can grab our existing search columns.

juanvilladev avatar Jul 21 '23 19:07 juanvilladev

+1 GIN support. This would be absolutely killer.

Rykuno avatar Aug 01 '23 19:08 Rykuno

+1

arivera-xealth avatar Aug 05 '23 15:08 arivera-xealth

This should have a PostgreSQL label. tsvector is only available in PostgreSQL. MySql is different and there is already a PR open for it.

Angelelz avatar Aug 14 '23 18:08 Angelelz

+1

thebeepx avatar Aug 16 '23 00:08 thebeepx

+1

andersoncardoso avatar Sep 19 '23 14:09 andersoncardoso

For drizzle-orm 0.28.5 or later:

// tsvector.ts

import { customType } from "drizzle-orm/pg-core";

export const tsvector = customType<{
  data: string;
  config: { sources: string[] };
}>({
  dataType(config) {
    if (config) {
      const sources = config.sources.join(" || ' ' || ");
      return `tsvector generated always as (to_tsvector('english', ${sources})) stored`;
    } else {
      return `tsvector`;
    }
  },
});

example usage:

  // Search field
  fts: tsvector("fts", {
    sources: ["id", "twitter_username", "twitter_name"], // list of column names
  }),

pyk avatar Sep 20 '23 01:09 pyk

import { customType } from "drizzle-orm/pg-core";

export const tsvector = customType<{
  data: string;
  config: { sources: string[] };
}>({
  dataType(config) {
    if (config) {
      const sources = config.sources.join(" || ' ' || ");
      return `tsvector generated always as (to_tsvector('english', ${sources})) stored`;
    } else {
      return `tsvector`;
    }
  },
});

Thanks for the updated code. In case you need to adjust weights as well

import { customType } from "drizzle-orm/pg-core";

function genExpWithWeights(input: string[]) {
  const columnExpressions = input.map((column, index) => {
    const weight = String.fromCharCode(index + 65);
    return `setweight(to_tsvector('english', coalesce(${column}, '')), '${weight}')`;
  });

  const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join(
    " || "
  )}) STORED`;

  return tsvectorColumn;
}

export const tsvector = customType<{
  data: string;
  config: { sources: string[]; weighted: boolean };
}>({
  dataType(config) {
    if (config) {
      const sources = config.sources.join(" || ' ' || ");
      return config.weighted
        ? genExpWithWeights(config.sources)
        : `tsvector generated always as (to_tsvector('english', ${sources})) stored`;
    } else {
      return `tsvector`;
    }
  },
});

xinha-sh avatar Sep 20 '23 07:09 xinha-sh

import { customType } from "drizzle-orm/pg-core";

export const tsvector = customType<{
  data: string;
  config: { sources: string[] };
}>({
  dataType(config) {
    if (config) {
      const sources = config.sources.join(" || ' ' || ");
      return `tsvector generated always as (to_tsvector('english', ${sources})) stored`;
    } else {
      return `tsvector`;
    }
  },
});

Thanks for the updated code. In case you need to adjust weights as well

import { customType } from "drizzle-orm/pg-core";

function genExpWithWeights(input: string[]) {
  const columnExpressions = input.map((column, index) => {
    const weight = String.fromCharCode(index + 65);
    return `setweight(to_tsvector('english', coalesce(${column}, '')), '${weight}')`;
  });

  const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join(
    " || "
  )}) STORED`;

  return tsvectorColumn;
}

export const tsvector = customType<{
  data: string;
  config: { sources: string[]; weighted: boolean };
}>({
  dataType(config) {
    if (config) {
      const sources = config.sources.join(" || ' ' || ");
      return config.weighted
        ? genExpWithWeights(config.sources)
        : `tsvector generated always as (to_tsvector('english', ${sources})) stored`;
    } else {
      return `tsvector`;
    }
  },
});

I just tried this definition it doesn't work.

(error: type "tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', " does not exist)

Here is the sample column definition generated by the type:

--> statement-breakpoint
CREATE TABLE IF NOT EXISTS "post" (
 -- ...
	"fts_doc_mk" "tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B') || setweight(to_tsvector('english', coalesce(overview, '')), 'C')) STORED",
  -- ...
  )

callmeberzerker avatar Oct 01 '23 15:10 callmeberzerker

The problem is that the column definition is wrapped with double quotes. 🤔

callmeberzerker avatar Oct 01 '23 15:10 callmeberzerker

You're supposed to use the sql operator to generate sql and not escape it:

export const tsvector = customType<{
  data: string;
  config: { sources: string[]; weighted: boolean };
}>({
  dataType(config) {
    if (config) {
      const sources = config.sources.join(" || ' ' || ");
      return config.weighted
        ? genExpWithWeights(config.sources)
        : sql`tsvector generated always as (to_tsvector('english', ${sources})) stored`;
    } else {
      return sql`tsvector`;
    }
  },
});

You should do the same inside the function genExpWithWeights

But this should poiint you in the right direction.

Angelelz avatar Oct 01 '23 15:10 Angelelz

const sources = config.sources.join(" || ' ' || ");

Sadly that doesn't work on top of ts complaining with:

Type '({ columns, locale }: Config) => SQL<unknown>' is not assignable to type '(config: Config) => string'.
  Type 'SQL<unknown>' is not assignable to type 'string'.ts(2322)

I've tried playing around with setting sql<string>"tsvector GENERATED ALWAYS... " (or changing the data: SQLbut drizzle-kit simply ignores generating the schema even with the// @ts-ignore`.

EDIT: I tried the following thing it still doesn't work (the column definition is all quoted):

export const tsVector = customType<{
	data: string;
	config: Config;
	configRequired: true;
}>({
	dataType({ columns, locale }) {
		const language = resolveLocaleToLanguage(locale);
		const columnExpressions = columns.map((column, index) => {
			const weight = String.fromCharCode(index + 65);
			return `setweight(to_tsvector('${language}', coalesce(${column}, '')), '${weight}')`;
		});
		const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join(
			' || '
		)}) STORED`;
		return tsvectorColumn;
	},
	toDriver(value: string) {
		return sql`${value}`;
	},
	fromDriver(value: unknown) {
		return value as string;
	}
});


callmeberzerker avatar Oct 01 '23 19:10 callmeberzerker

Any update from this? Could anyone make it work?

raikusy avatar Nov 29 '23 07:11 raikusy

Nothing new from my side. I resorted to using a custom migration script just for this field - and abandoning db push for prototyping which is pity.

callmeberzerker avatar Nov 29 '23 12:11 callmeberzerker

The actual problem is that the dataType function only accepts a string as the return type, no magic sql operator supported. Causing everything to be wrapped in double quotes. Currently was able to workaround by just changing the sql output manually, interestingly it works normally for subsequent generations.

mrherickz avatar Dec 27 '23 15:12 mrherickz

@Angelelz any update on the above? The sql magic operator doesn't work when returned from dataType as your example shows.

Error:

Type '({ columns, locale }: Config) => SQL<unknown>' is not assignable to type '(config: Config) => string'.
  Type 'SQL<unknown>' is not assignable to type 'string'.ts(2322)

ghunkins avatar Jan 16 '24 20:01 ghunkins

Yes, my solution was wrong. Unfortunately, the workaround is letting drizzle-kit generate the migration and then manually removing the double quotes around the type:

--> statement-breakpoint
CREATE TABLE IF NOT EXISTS "post" (
 -- ...
	"fts_doc_mk" tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B') || setweight(to_tsvector('english', coalesce(overview, '')), 'C')) STORED,
  -- ...
  )

Angelelz avatar Jan 17 '24 02:01 Angelelz

Don't let drizzle be like prisma https://github.com/prisma/prisma/issues/8950

Nedi11 avatar Feb 11 '24 00:02 Nedi11

Would this also include full text search in sqlite?

kelbyfaessler avatar Apr 06 '24 18:04 kelbyfaessler

yes, finishing generated columns and them adding full-text search

AndriiSherman avatar Apr 07 '24 18:04 AndriiSherman

actually sqlite - will need to think. I guess we would need to support virtual tables first, but need to investigate a bit

AndriiSherman avatar Apr 07 '24 18:04 AndriiSherman

Latest release (v0.31.0) has GIN indexes working. Thanks to Mykhailo for pointing it out to me

https://github.com/drizzle-team/drizzle-orm/releases/tag/0.31.0

ginIndex1: index('gin_idx_1').using('gin', t.metadata),
ginIndex2: index('gin_idx_2').using('gin', sql`metadata->'name'`),

Gives me

CREATE INDEX IF NOT EXISTS "gin_idx_1" ON "users" USING gin (metadata);--> statement-breakpoint
CREATE INDEX IF NOT EXISTS "gin_idx_2" ON "users" USING gin (metadata->'name');

Where metadata is a jsonb column

paulwongx avatar Jun 03 '24 22:06 paulwongx

Should work well in 0.31.0 release, also here is a guide with examples: https://orm.drizzle.team/learn/guides/postgresql-full-text-search

We are working on adding proper helper functions and release 1 patch update today in [email protected] with a proper index generation for tsvector

After we will have generated columns support(in the next big release), we will add another guide for using tsvector with a generated columns in Postgres

AndriiSherman avatar Jun 04 '24 07:06 AndriiSherman