drizzle-orm
drizzle-orm copied to clipboard
Add Full-Text Search capability
This is a feature request for Full-Text Search functionality to perform a fulltext search on specific fields. Ideally with support for using GIN/GIST indexes to speed up full text search.
Could you show how it looks like in SQL?
CREATE TABLE companies (
id serial PRIMARY KEY,
name VARCHAR ( 50 ) NOT NULL,
description VARCHAR ( 255 ),
);
Using the "tsvector" column type for Postgres will speed up full text search.
Example using a precomputed tsvector:
ALTER TABLE "companies" ADD COLUMN "name_vec" tsvector;
UPDATE companies SET name_vec = to_tsvector("english", name);
CREATE INDEX companies_name on companies USING GIN (name_vec);
select * from companies where name @@ to_tsquery('english', 'amazo:*')
The full range of Postgresql Text Search Functions and Operators
Are there useful workarounds using the querybuilder today?
Started look at this a bit - at least for generating a tsvector column in a schema definition, doing something very similar to the citext example of an extended column type worked fine: https://gist.github.com/tmcw/9f0327c74e914524eed997131b96e66e
@tmcw is there any way to also support generated
specifier?
for example, I want to use drizzle to define this column:
my_column_name tsvector generated always as (to_tsvector('english', other_column_name)) stored
edit:
based on @tmcw i'm able to use custom config inside tsvector
function, here is the implementation:
import { ColumnBuilderConfig, ColumnConfig } from "drizzle-orm";
import {
AnyPgTable,
PgColumn,
PgColumnBuilder,
PgColumnBuilderHKT,
PgColumnHKT,
} from "drizzle-orm/pg-core";
export interface PgTSVectorConfig {
sources?: string[];
}
export class PgTSVectorBuilder<
TData extends string = string
> extends PgColumnBuilder<
PgColumnBuilderHKT,
ColumnBuilderConfig<{ data: TData; driverParam: string }>,
{ sources: string[] | undefined }
> {
protected $pgColumnBuilderBrand: string = "PgTSVectorBuilder";
constructor(name: string, config: PgTSVectorConfig) {
super(name);
this.config.sources = config.sources;
}
build<TTableName extends string>(
table: AnyPgTable<{ name: TTableName }>
): PgTSVector<TTableName, TData> {
return new PgTSVector(table, this.config);
}
}
export class PgTSVector<
TTableName extends string,
TData extends string
> extends PgColumn<
PgColumnHKT,
ColumnConfig<{ tableName: TTableName; data: TData; driverParam: string }>,
{ sources: string[] | undefined }
> {
constructor(
table: AnyPgTable<{ name: TTableName }>,
builder: PgTSVectorBuilder<TData>["config"]
) {
super(table, builder);
}
getSQLType(): string {
return this.config.sources === undefined
? `tsvector`
: `tsvector generated always as (to_tsvector('english', ${this.config.sources.join(
" || ' ' || "
)})) stored`;
}
}
export function tsvector<TName extends string>(
name: string,
config: PgTSVectorConfig = {}
): PgTSVectorBuilder<TName> {
return new PgTSVectorBuilder(name, config);
}
Wow, that is the first I'm learning of the generated specifier 😆. So thanks for the tip, but yeah - not sure how to support that so far!
Building upon above answer I added support for weight
import { ColumnBuilderConfig, ColumnConfig } from "drizzle-orm";
import {
AnyPgTable,
PgColumn,
PgColumnBuilder,
PgColumnBuilderHKT,
PgColumnHKT,
} from "drizzle-orm/pg-core";
type PgTSVectorBuilderConfig = {
sources: string[] | undefined;
weighted?: boolean;
};
export interface PgTSVectorConfig {
sources?: PgTSVectorBuilderConfig["sources"];
weighted?: PgTSVectorBuilderConfig["weighted"];
}
function generateTsvectorColumn(input: string[]) {
const columnExpressions = input.map((column, index) => {
const weight = String.fromCharCode(index + 65);
return `setweight(to_tsvector('english', coalesce(${column}, '')), '${weight}')`;
});
const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join(
" || "
)}) STORED`;
return tsvectorColumn;
}
export class PgTSVectorBuilder<
TData extends string = string
> extends PgColumnBuilder<
PgColumnBuilderHKT,
ColumnBuilderConfig<{ data: TData; driverParam: string }>,
PgTSVectorBuilderConfig
> {
protected $pgColumnBuilderBrand: string = "PgTSVectorBuilder";
constructor(name: string, config: PgTSVectorConfig) {
super(name);
this.config.sources = config.sources;
this.config.weighted = config.weighted;
}
build<TTableName extends string>(
table: AnyPgTable<{ name: TTableName }>
): PgTSVector<TTableName, TData> {
const xyz = new PgTSVector(table, this.config);
console.log(xyz);
return xyz;
}
}
export class PgTSVector<
TTableName extends string,
TData extends string
> extends PgColumn<
PgColumnHKT,
ColumnConfig<{ tableName: TTableName; data: TData; driverParam: string }>,
{ sources: string[] | undefined; weighted?: boolean }
> {
constructor(
table: AnyPgTable<{ name: TTableName }>,
builder: PgTSVectorBuilder<TData>["config"]
) {
super(table, builder);
}
getSQLType(): string {
return this.config.sources === undefined
? `tsvector`
: this.config.weighted
? generateTsvectorColumn(this.config.sources)
: `tsvector generated always as (to_tsvector('english', ${this.config.sources.join(
" || ' ' || "
)})) stored`;
}
}
export function tsvector<TName extends string>(
name: string,
config: PgTSVectorConfig = {}
): PgTSVectorBuilder<TName> {
return new PgTSVectorBuilder(name, config);
}
In your schema
...
vec: tsvector("info", {
sources: ["title", "description"],
weighted: true,
}),
...
The only issue I'm getting is in the migration file. Double inverted commas before tsvector and at the end of the sql statement as well
Current output:
ALTER TABLE "table_name" ADD COLUMN "vec" "tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B')) STORED";
Expected output:
ALTER TABLE "table_name" ADD COLUMN "vec" tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B')) STORED;
Any chance anyone's actively looking at adding indexed (GIN or GIST)generated ts_vector columns? This is one of the last things that will let us move away from Supabase and over to Drizzle entirely. Hoping eventually it can also be added to Drizzle-Kit's introspect command so we can grab our existing search columns.
+1 GIN support. This would be absolutely killer.
+1
This should have a PostgreSQL label. tsvector is only available in PostgreSQL. MySql is different and there is already a PR open for it.
+1
+1
For drizzle-orm 0.28.5 or later:
// tsvector.ts
import { customType } from "drizzle-orm/pg-core";
export const tsvector = customType<{
data: string;
config: { sources: string[] };
}>({
dataType(config) {
if (config) {
const sources = config.sources.join(" || ' ' || ");
return `tsvector generated always as (to_tsvector('english', ${sources})) stored`;
} else {
return `tsvector`;
}
},
});
example usage:
// Search field
fts: tsvector("fts", {
sources: ["id", "twitter_username", "twitter_name"], // list of column names
}),
import { customType } from "drizzle-orm/pg-core"; export const tsvector = customType<{ data: string; config: { sources: string[] }; }>({ dataType(config) { if (config) { const sources = config.sources.join(" || ' ' || "); return `tsvector generated always as (to_tsvector('english', ${sources})) stored`; } else { return `tsvector`; } }, });
Thanks for the updated code. In case you need to adjust weights as well
import { customType } from "drizzle-orm/pg-core";
function genExpWithWeights(input: string[]) {
const columnExpressions = input.map((column, index) => {
const weight = String.fromCharCode(index + 65);
return `setweight(to_tsvector('english', coalesce(${column}, '')), '${weight}')`;
});
const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join(
" || "
)}) STORED`;
return tsvectorColumn;
}
export const tsvector = customType<{
data: string;
config: { sources: string[]; weighted: boolean };
}>({
dataType(config) {
if (config) {
const sources = config.sources.join(" || ' ' || ");
return config.weighted
? genExpWithWeights(config.sources)
: `tsvector generated always as (to_tsvector('english', ${sources})) stored`;
} else {
return `tsvector`;
}
},
});
import { customType } from "drizzle-orm/pg-core"; export const tsvector = customType<{ data: string; config: { sources: string[] }; }>({ dataType(config) { if (config) { const sources = config.sources.join(" || ' ' || "); return `tsvector generated always as (to_tsvector('english', ${sources})) stored`; } else { return `tsvector`; } }, });
Thanks for the updated code. In case you need to adjust weights as well
import { customType } from "drizzle-orm/pg-core"; function genExpWithWeights(input: string[]) { const columnExpressions = input.map((column, index) => { const weight = String.fromCharCode(index + 65); return `setweight(to_tsvector('english', coalesce(${column}, '')), '${weight}')`; }); const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join( " || " )}) STORED`; return tsvectorColumn; } export const tsvector = customType<{ data: string; config: { sources: string[]; weighted: boolean }; }>({ dataType(config) { if (config) { const sources = config.sources.join(" || ' ' || "); return config.weighted ? genExpWithWeights(config.sources) : `tsvector generated always as (to_tsvector('english', ${sources})) stored`; } else { return `tsvector`; } }, });
I just tried this definition it doesn't work.
(error: type "tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', " does not exist)
Here is the sample column definition generated by the type:
--> statement-breakpoint
CREATE TABLE IF NOT EXISTS "post" (
-- ...
"fts_doc_mk" "tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B') || setweight(to_tsvector('english', coalesce(overview, '')), 'C')) STORED",
-- ...
)
The problem is that the column definition is wrapped with double quotes. 🤔
You're supposed to use the sql
operator to generate sql and not escape it:
export const tsvector = customType<{
data: string;
config: { sources: string[]; weighted: boolean };
}>({
dataType(config) {
if (config) {
const sources = config.sources.join(" || ' ' || ");
return config.weighted
? genExpWithWeights(config.sources)
: sql`tsvector generated always as (to_tsvector('english', ${sources})) stored`;
} else {
return sql`tsvector`;
}
},
});
You should do the same inside the function genExpWithWeights
But this should poiint you in the right direction.
const sources = config.sources.join(" || ' ' || ");
Sadly that doesn't work on top of ts complaining with:
Type '({ columns, locale }: Config) => SQL<unknown>' is not assignable to type '(config: Config) => string'.
Type 'SQL<unknown>' is not assignable to type 'string'.ts(2322)
I've tried playing around with setting sql<string>"tsvector GENERATED ALWAYS... " (or changing the
data: SQLbut drizzle-kit simply ignores generating the schema even with the
// @ts-ignore`.
EDIT: I tried the following thing it still doesn't work (the column definition is all quoted):
export const tsVector = customType<{
data: string;
config: Config;
configRequired: true;
}>({
dataType({ columns, locale }) {
const language = resolveLocaleToLanguage(locale);
const columnExpressions = columns.map((column, index) => {
const weight = String.fromCharCode(index + 65);
return `setweight(to_tsvector('${language}', coalesce(${column}, '')), '${weight}')`;
});
const tsvectorColumn = `tsvector GENERATED ALWAYS AS (${columnExpressions.join(
' || '
)}) STORED`;
return tsvectorColumn;
},
toDriver(value: string) {
return sql`${value}`;
},
fromDriver(value: unknown) {
return value as string;
}
});
Any update from this? Could anyone make it work?
Nothing new from my side. I resorted to using a custom migration script just for this field - and abandoning db push
for prototyping which is pity.
The actual problem is that the dataType
function only accepts a string as the return type, no magic sql operator supported. Causing everything to be wrapped in double quotes. Currently was able to workaround by just changing the sql output manually, interestingly it works normally for subsequent generations.
@Angelelz any update on the above? The sql
magic operator doesn't work when returned from dataType
as your example shows.
Error:
Type '({ columns, locale }: Config) => SQL<unknown>' is not assignable to type '(config: Config) => string'.
Type 'SQL<unknown>' is not assignable to type 'string'.ts(2322)
Yes, my solution was wrong. Unfortunately, the workaround is letting drizzle-kit generate the migration and then manually removing the double quotes around the type:
--> statement-breakpoint
CREATE TABLE IF NOT EXISTS "post" (
-- ...
"fts_doc_mk" tsvector GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B') || setweight(to_tsvector('english', coalesce(overview, '')), 'C')) STORED,
-- ...
)
Don't let drizzle be like prisma https://github.com/prisma/prisma/issues/8950
Would this also include full text search in sqlite?
yes, finishing generated columns and them adding full-text search
actually sqlite - will need to think. I guess we would need to support virtual tables first, but need to investigate a bit
Latest release (v0.31.0) has GIN indexes working. Thanks to Mykhailo for pointing it out to me
https://github.com/drizzle-team/drizzle-orm/releases/tag/0.31.0
ginIndex1: index('gin_idx_1').using('gin', t.metadata),
ginIndex2: index('gin_idx_2').using('gin', sql`metadata->'name'`),
Gives me
CREATE INDEX IF NOT EXISTS "gin_idx_1" ON "users" USING gin (metadata);--> statement-breakpoint
CREATE INDEX IF NOT EXISTS "gin_idx_2" ON "users" USING gin (metadata->'name');
Where metadata is a jsonb
column
Should work well in 0.31.0
release, also here is a guide with examples: https://orm.drizzle.team/learn/guides/postgresql-full-text-search
We are working on adding proper helper functions and release 1 patch update today in [email protected]
with a proper index generation for tsvector
After we will have generated columns support(in the next big release), we will add another guide for using tsvector with a generated columns in Postgres