lance Panic when creating empty table

Hello!

I've been running into some issues while creating a table from an empty record batch.

            let schema = Arc::new(Schema::new(vec![
                Field::new(
                    "vector",
                    DataType::FixedSizeList(
                        Arc::new(Field::new("item", DataType::Float32, true)),
                        768,
                    ),
                    false,
                ),
                Field::new("content", DataType::Utf8, false),
                Field::new("file_url", DataType::Utf8, false),
                Field::new("start_line_no", DataType::UInt32, false),
                Field::new("end_line_no", DataType::UInt32, false),
            ]));
            let batch = RecordBatch::try_new(
                schema.clone(),
                vec![
                    Arc::new(FixedSizeListBuilder::new(Float32Builder::new(), 768).finish()),
                    Arc::new(StringArray::from(Vec::<&str>::new())),
                    Arc::new(StringArray::from(Vec::<&str>::new())),
                    Arc::new(UInt32Array::from(Vec::<u32>::new())),
                    Arc::new(UInt32Array::from(Vec::<u32>::new())),
                ],
            )
            .expect("failure while defining schema");
            let tbl = db
                .create_table(
                    "code-slices",
                    Box::new(RecordBatchIterator::new(vec![batch].into_iter().map(Ok), schema)),
                    None,
                )
                .await
                .expect("failed to create table");
            tbl.create_index(&["vector"])
                .ivf_pq()
                .num_partitions(256)
                .build()
                .await
                .expect("failed to create index");

The statistics collector's finish() method panics here while building a StructArray, with the message "Found unmasked nulls for non-nullable StructArray field min_value". If I instead remove batch and pass an empty vec![] to RecordBatchIterator::new, index creation fails with an error saying it "can not train 256 centroids with 0 vectors".

Is there a way to initialise an empty table w/ an index with the Rust client?

Let me know if this isn't the right place to report this issue, I'll move it to the appropriate place.

Feb 05 '24 08:02 McPatate

There is no way to create an index without data. The index is IVF_PQ, which requires data to train its k-means clustering.
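
One way to live with this constraint is to create the table first and build the index only once enough rows exist. A minimal sketch, assuming the only requirement is at least one training vector per centroid (real implementations usually want many more). `can_train_ivf` is a hypothetical helper, not part of the Lance or LanceDB API:

```rust
/// Hypothetical helper (not a LanceDB function): reports whether a table
/// holds enough vectors to train an IVF index with `num_partitions` centroids.
/// Assumption: k-means needs at least one training vector per centroid.
fn can_train_ivf(num_rows: usize, num_partitions: usize) -> bool {
    num_partitions > 0 && num_rows >= num_partitions
}

/// Hypothetical helper: caps the requested partition count at what the
/// current row count could support, or returns None when the table is empty.
fn trainable_partitions(num_rows: usize, requested: usize) -> Option<usize> {
    if num_rows == 0 || requested == 0 {
        None // nothing to cluster, so skip index creation for now
    } else {
        Some(requested.min(num_rows))
    }
}
```

With a check like this, the create_index call from the snippet above would run only after the first batch of embeddings has been inserted.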

Feb 08 '24 20:02 chebbyChefNEQ

What is your use case, i.e. how many vectors do you have, what recall do you need, and how frequently do you make updates? We can try to give you some suggestions.

Feb 08 '24 20:02 chebbyChefNEQ

> There is no way to create index without data. The index is IVF_PQ, which requires data to train a k-means clustering.

That makes sense. I was expecting it to work like an index in a traditional database, where you declare it on a field and it is updated as you insert data.

I'm currently adding repository context to my language server. I want to enhance the prompt I send to a model with bits of relevant context from the user's codebase. This means that there will be a first pass to initialize the database the first time a user opens a project & then on each file update (every time the user types) I'll also need to update the embeddings. Here is a link to the PR if you're curious, I've been exploring using my own very simplified vector store as well :)
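
For context, a very simplified vector store of this kind can be an exhaustive scan, which needs no trained index and therefore works on an empty or constantly changing collection. This is a rough sketch under that assumption, not the code from the PR. The names `cosine_similarity` and `top_k` are illustrative:

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exhaustive search: scores every stored row against `query` and returns
/// the indices of the `k` most similar rows, best match first.
fn top_k(rows: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| (i, cosine_similarity(row, query)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

The cost grows linearly with the number of stored vectors, so this only suits collections small enough to scan on every query.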

Feb 09 '24 08:02 McPatate
