lance

Panic when creating empty table

Open McPatate opened this issue 1 year ago • 3 comments

Hello!

I've been running into some issues while creating a table from an empty record batch.

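            use std::sync::Arc;

            use arrow_array::builder::{FixedSizeListBuilder, Float32Builder};
            use arrow_array::{RecordBatch, RecordBatchIterator, StringArray, UInt32Array};
            use arrow_schema::{DataType, Field, Schema};

            // `db` is an already-open LanceDB connection (setup not shown).
            // Schema: a 768-dim Float32 embedding plus metadata for one code slice.
            let schema = Arc::new(Schema::new(vec![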
                Field::new(
                    "vector",
                    DataType::FixedSizeList(
                        Arc::new(Field::new("item", DataType::Float32, true)),
                        768,
                    ),
                    false,
                ),
                Field::new("content", DataType::Utf8, false),
                Field::new("file_url", DataType::Utf8, false),
                Field::new("start_line_no", DataType::UInt32, false),
                Field::new("end_line_no", DataType::UInt32, false),
            ]));
            // Zero-row batch: every column is empty but matches the schema.
            let batch = RecordBatch::try_new(
                schema.clone(),
                vec![
                    Arc::new(FixedSizeListBuilder::new(Float32Builder::new(), 768).finish()),
                    Arc::new(StringArray::from(Vec::<&str>::new())),
                    Arc::new(StringArray::from(Vec::<&str>::new())),
                    Arc::new(UInt32Array::from(Vec::<u32>::new())),
                    Arc::new(UInt32Array::from(Vec::<u32>::new())),
                ],
            )
            .expect("failure while defining schema");
            let tbl = db
                .create_table(
                    "code-slices",
                    Box::new(RecordBatchIterator::new(vec![batch].into_iter().map(Ok), schema)),
                    None,
                )
                .await
                .expect("failed to create table");
            // IVF_PQ trains on the rows already in the table; with zero rows there is nothing to train on.
            tbl.create_index(&["vector"])
                .ivf_pq()
                .num_partitions(256)
                .build()
                .await
                .expect("failed to create index");

The statistics collector's finish() method panics when it tries to build a StructArray, with "Found unmasked nulls for non-nullable StructArray field min_value". If I instead drop batch and pass an empty vec![] to RecordBatchIterator::new, index creation fails with "can not train 256 centroids with 0 vectors".
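A note on the first error: a minimum over zero values is undefined. The std-only sketch below (a hypothetical `column_min`, not Lance's statistics collector) shows Rust's `Iterator::min` returning `None` for an empty column; a non-nullable `min_value` field has no way to store that result. That this is what triggers the panic is an inference from the error message, not a confirmed diagnosis.

```rust
/// Hypothetical column statistic, loosely modelled on a `min_value` field.
/// The `Option` return is exactly the nullability a non-nullable field lacks.
fn column_min(values: &[u32]) -> Option<u32> {
    // `Iterator::min` returns `None` when there are no elements.
    values.iter().copied().min()
}

// Usage: an empty `start_line_no` column, as in the zero-row batch above.
// column_min(&[])        -> None    (no value to store in `min_value`)
// column_min(&[7, 3, 9]) -> Some(3)
```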

Is there a way to initialise an empty table with an index using the Rust client?

Let me know if this isn't the right place to report this issue and I'll move it to the appropriate place.

McPatate, Feb 05 '24 08:02

There is no way to create an index without data. The index is IVF_PQ, which needs data to train its k-means clustering.
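To illustrate why training needs data: IVF assigns each vector to the partition of its nearest k-means centroid, and k-means cannot place k centroids from fewer than k vectors. The following is a minimal Lloyd's k-means sketch in plain Rust, not Lance's implementation; the name `train_centroids`, the naive first-k initialization, and the error text (modelled on the message reported above) are assumptions for illustration.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical minimal Lloyd's k-means. Seeds centroids with the first `k`
/// vectors (a simplification; real implementations use random or k-means++
/// seeding) and runs a fixed number of iterations.
fn train_centroids(data: &[Vec<f32>], k: usize, iters: usize) -> Result<Vec<Vec<f32>>, String> {
    if k == 0 || data.len() < k {
        return Err(format!("can not train {k} centroids with {} vectors", data.len()));
    }
    let dim = data[0].len();
    let mut centroids: Vec<Vec<f32>> = data[..k].to_vec();
    for _ in 0..iters {
        // Assignment step: accumulate each vector into its nearest centroid.
        let mut sums = vec![vec![0.0f32; dim]; k];
        let mut counts = vec![0usize; k];
        for v in data {
            let nearest = (0..k)
                .min_by(|&i, &j| dist2(v, &centroids[i]).total_cmp(&dist2(v, &centroids[j])))
                .unwrap();
            for (s, x) in sums[nearest].iter_mut().zip(v) {
                *s += x;
            }
            counts[nearest] += 1;
        }
        // Update step: move each centroid to the mean of its assigned vectors.
        for c in 0..k {
            if counts[c] > 0 {
                for d in 0..dim {
                    centroids[c][d] = sums[c][d] / counts[c] as f32;
                }
            }
        }
    }
    Ok(centroids)
}
```

With an empty input the function returns the error rather than producing centroids, which mirrors why the index can only be built once the table holds at least `num_partitions` rows (and, for useful centroids, likely a good deal more).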

chebbyChefNEQ, Feb 08 '24 20:02

What is your use case? How many vectors do you have, what recall are you aiming for, and how frequently do you make updates? We can try to give you some suggestions.
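Recall for an approximate index is commonly measured as recall@k: the fraction of the true k nearest neighbours (from an exact, brute-force search) that the index also returns. A minimal sketch of that calculation, assuming results are lists of row ids; the helper name `recall_at_k` and the empty-ground-truth convention are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical helper: recall@k of approximate results against exact ground truth.
/// Both slices hold row ids; only the first `k` entries of each are considered.
fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0; // nothing to find; treated as perfect recall by convention
    }
    // Use a set so duplicate ids in the approximate results are not double-counted.
    let found: HashSet<u64> = approx.iter().take(k).copied().collect();
    found.intersection(&truth).count() as f64 / truth.len() as f64
}
```

For example, if the index returns ids `[1, 2, 3, 9]` while the exact top 4 is `[1, 2, 3, 4]`, recall@4 is 0.75.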

chebbyChefNEQ, Feb 08 '24 20:02

> There is no way to create index without data. The index is IVF_PQ, which requires data to train a k-means clustering.

That makes sense. I was expecting it to work like an index in a traditional database, where you declare it on a field and it is kept up to date as you insert data.
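By contrast, an exhaustive (flat) nearest-neighbour search has no training step, so it works on an empty table and sees new rows immediately, at the cost of scanning every row per query. LanceDB is generally understood to fall back to this kind of scan when a vector column has no index, but that is an assumption to check against the docs for your version. A minimal in-memory sketch with the hypothetical names `FlatIndex`, `insert`, and `search`:

```rust
/// Hypothetical flat (brute-force) index: no training, so it is usable from
/// the first insert onward, like an incrementally maintained database index.
struct FlatIndex {
    dim: usize,
    rows: Vec<(u64, Vec<f32>)>, // (row id, embedding)
}

impl FlatIndex {
    fn new(dim: usize) -> Self {
        Self { dim, rows: Vec::new() }
    }

    fn insert(&mut self, id: u64, vector: Vec<f32>) {
        assert_eq!(vector.len(), self.dim, "dimension mismatch");
        self.rows.push((id, vector));
    }

    /// Returns up to `k` row ids ordered by ascending squared L2 distance.
    fn search(&self, query: &[f32], k: usize) -> Vec<u64> {
        let mut scored: Vec<(f32, u64)> = self
            .rows
            .iter()
            .map(|(id, v)| {
                let d: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
                (d, *id)
            })
            .collect();
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored.into_iter().take(k).map(|(_, id)| id).collect()
    }
}
```

The trade-off is query cost: each search is O(rows × dim), which is often acceptable for a single repository's worth of code slices, and an IVF_PQ index can be built later once enough rows exist to train it.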

I'm currently adding repository context to my language server: I want to enhance the prompt I send to a model with bits of relevant context from the user's codebase. This means an initial pass to populate the database the first time a user opens a project, and then an update of the embeddings on each file change (every time the user types). Here is a link to the PR if you're curious. I've also been exploring my own, very simplified vector store :)
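One way to structure the per-file updates described above is to treat each file as a unit: on every change, replace all slices for that `file_url`, but reuse embeddings for slices whose text is unchanged so only edited slices hit the model. The sketch below is an in-memory illustration with hypothetical names (`SliceStore`, `update_file`, and an `embed` closure standing in for the model call); in a table, the equivalent would be deleting rows that match the file's `file_url` and appending the new ones.

```rust
use std::collections::HashMap;

/// One embedded slice of a file; fields mirror the schema in the issue.
#[derive(Clone, Debug)]
struct CodeSlice {
    content: String,
    start_line_no: u32,
    end_line_no: u32,
    vector: Vec<f32>,
}

/// Hypothetical store keyed by `file_url`.
#[derive(Default)]
struct SliceStore {
    files: HashMap<String, Vec<CodeSlice>>,
}

impl SliceStore {
    /// Replaces all slices of `file_url`. `embed` stands in for the embedding
    /// model call (assumption). Returns how many slices had to be embedded.
    fn update_file<F>(&mut self, file_url: &str, slices: Vec<(String, u32, u32)>, mut embed: F) -> usize
    where
        F: FnMut(&str) -> Vec<f32>,
    {
        // Previous embeddings for this file, keyed by slice text.
        let cached: HashMap<String, Vec<f32>> = self
            .files
            .remove(file_url)
            .unwrap_or_default()
            .into_iter()
            .map(|s| (s.content, s.vector))
            .collect();
        let mut embedded = 0;
        let new_slices: Vec<CodeSlice> = slices
            .into_iter()
            .map(|(content, start_line_no, end_line_no)| {
                // Reuse the old vector if this exact text was embedded before.
                let vector = match cached.get(&content) {
                    Some(v) => v.clone(),
                    None => {
                        embedded += 1;
                        embed(&content)
                    }
                };
                CodeSlice { content, start_line_no, end_line_no, vector }
            })
            .collect();
        self.files.insert(file_url.to_string(), new_slices);
        embedded
    }
}
```

The initial pass calls `update_file` once per file; afterwards, each edit (possibly debounced) calls it for the changed file only.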

McPatate, Feb 09 '24 08:02