spring-ai
spring-ai copied to clipboard
Refactor PgVercorStore filter template to use JSONB field access
Changed the filter template in PgVercorStore to replace JSONPath expressions with JSONB field access, using standard SQL operators such as '=' instead of '==', 'AND' instead of '&&', and 'OR' instead of '||'. This adjustment addresses compatibility issues with the 'IN' operator, which previously returned parse errors. Also updated PgVectorFilterExpressionConverter and corresponding tests to align with these changes.
- Replaced
metadata::jsonb @@ '$.key == "value"'::jsonpathwithmetadata::jsonb->>'key' = 'value' - Fixed parsing issues with
metadata::jsonb @@ '$.key in ["value"]'::jsonpathby using JSONB field access. - Updated logical operators IN filter expressions to standard SQL syntax.
These changes improve the clarity, execution reliability, and compatibility of filter expressions in the database.
Thank you @dperezcabrera, looks like a nice improvement!
Could you please pull rebase to the upstream main (e.g. git pull -r upstream main), resolve any eventual conflicts and (force) merge to you PR?
Thank you. I've rebased and merged the changes as requested!
@dperezcabrera have you run the integration tests for the pgvector and for the autoconfiguraiton?
the ./mvnw clean install -pl vector-stores/spring-ai-pgvector-store -Pintegration-tests is failing form me
I have executed them; however, since I don't have an API key for OpenAI, this has been the result: run: 4, Failures: 0, Errors: 0, Skipped: 4.
Can you please send me the error logs by email?
@dperezcabrera for example
- searchWithFilters: COSINE_DISTANCE
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding [=] ? AS distance FROM vector_store WHERE embedding [=] ? [ ? AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020) ORDER BY distance LIMIT ? ]
- searchWithFilters: EUCLIDEAN_DISTANCE
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding [-] ? AS distance FROM vector_store WHERE embedding [-] ? [ ? AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020) ORDER BY distance LIMIT ? ]
- searchWithFilters: NEGATIVE_INNER_PRODUCT
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, (1 + (embedding [#] ?)) AS distance FROM vector_store WHERE (1 + (embedding [#] ?)) [ ? AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020) ORDER BY distance LIMIT ? ]
I see another problem, metadata::jsonb->>'year' must be (metadata->>'year')::INTEGER
In protected void doKey(Key key, StringBuilder context) { I can't do this change.
Hi, this is a great contribution to improve the speed of pgvector, you mention In protected void doKey(Key key, StringBuilder context) { I can't do this change. what do you mean?
I can't easily share access keys, I would recommend getting an openai subscription, the cost is minimal.
Here: (https://github.com/dperezcabrera/spring-ai/blob/bebbfceab6e22a6bdeda3c3e168aceaf52994abd/spring-ai-core/src/main/java/org/springframework/ai/vectorstore/filter/converter/PgVectorFilterExpressionConverter.java#L64C1-L76C1)
I need to know the type of the key. If fieldname is an integer, the method must use this template (metadata->>'fieldname')::INTEGER. If fieldname is a string, then: metadata::jsonb->>'fieldname'. If fieldname is a date, (metadata->>'fieldname')::DATE. This ensures the key is correctly interpreted in the database query. This will require a significant change in the common classes and may have side effects on other implementations, which need to be carefully considered before proposing it.
Now I have and access key, before I only had API keys for Microsoft's Azure, and they didn't work with the integrated tests.
bumping to RC1, apologies
I think it is best to close this pull request, and address this functionality in the future. If you agree, close it.
Thank you for the effort @dperezcabrera. We can re-open it in the future if you have time to work on it.
Causing due to some unsolved issues (check the log above).