manifold icon indicating copy to clipboard operation
manifold copied to clipboard

Unrecognized type should be Object not String

Open oldshensheep opened this issue 9 months ago • 1 comments

Describe the bug

To Reproduce install pgvector for postgres https://github.com/pgvector/pgvector create table

create table if not exists document_embeddings
(
    id         serial8 primary key,
    embedding  vector(2) not null,
    content    text         not null,
    metadata   jsonb        not null,
    created_at timestamp    not null default now()
);

run code

My.DocumentEmbeddingsTest.builder("[0.2,0.3]", "abc", "{}").build();

or

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?,?,?::jsonb)".execute(
          ctx, "[0.2,0.3]", "abc", "{}");
    });
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: column "embedding" is of type vector but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.

Expected behavior There is a workaround like jsonb, We pass String and cast it to vector

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?::vector,?,?::jsonb)".execute(
          ctx, "[0.2,0.3]", "abc", "{}");
    });

or better without casting (not possible for now)

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?,?,?::jsonb)".execute(
          ctx, new float[]{0.1F, 0.2F}, "abc", "{}");
    });

use raw jdbc

      pstmt.setObject(1, new float[]{0.1F, 0.2F});
// https://github.com/pgvector/pgvector-java
      PGvector pGvector = new PGvector(new float[]{0.1F, 0.2F});
      pstmt.setObject(1, pGvector);

The Problem Converting a String to a vector works for small vectors, but often we use vectors of size 1024 or larger. Thus, we need to convert a float [1024] to a String, pass it to the database, and then the database must convert the String back to a vector, which is time-consuming.

Manually maintaining mappings from non-JDBC types to Java types can be an endless task. It's better to allow users to implement these mappings. What I propose is that an unrecognized type should default to Object, not String, as converting a String to another type can be a performance issue. Additionally, there should be a way for users to map these types themselves.

Desktop (please complete the following information):

  • OS Type & Version:
  • Java/JDK version: 21
  • IDE version (IntelliJ IDEA or Android Studio): IntelliJ IDEA
  • Manifold version: 2024.1.16
  • Manifold IntelliJ plugin version: 2024.1.4

Additional context I understand that ValueAccessor is for mapping, and I want to implement it myself.

https://github.com/manifold-systems/manifold/blob/e94f5ae8cdb87c9d901da2352a97a13a81c9f9d0/manifold-deps-parent/manifold-sql-rt/src/main/java/manifold/sql/rt/impl/accessors/OtherValueAccessor.java#L37-L40 it invoke getColumnClassName to get java type, postgresql jdbc implement this method, is this the problem of postgresql jdbc?

https://github.com/pgjdbc/pgjdbc/blob/450488c142fdc368cab54e8257407603acc18c4f/pgjdbc/src/main/java/org/postgresql/jdbc/PgResultSetMetaData.java#L440

for some reason I can't step into getColumnClassName while debugging.

Stack trace Please include a stack trace if applicable

oldshensheep avatar May 20 '24 15:05 oldshensheep