friendlier error messages in nearest API

Open changhiskhan opened this issue 2 years ago • 1 comments

Currently in the nearest parameter dict:

  1. If the column exists but is not a list column, tokio panics and the error looks very scary.
  2. If q is a list/array but its dimensionality doesn't match the vectors, it's also a tokio panic with a scary-looking error.

These are actually schema errors, so they should just be caught in Python.

changhiskhan avatar Feb 02 '23 06:02 changhiskhan

Case 2 is tricky, since pyarrow defaults to using ListType instead of FixedSizeListType for vector columns.

Is there another way besides, say, sampling the first 10 values in the embeddings column and checking that they have the same dimension as q?

ananis25 avatar Feb 19 '23 18:02 ananis25

    import lance
    import numpy as np
    import pandas as pd
    import pyarrow as pa
    import pyarrow.dataset

    df = pd.DataFrame({"a": [5], "b": [10]})
    tbl = pa.Table.from_pandas(df)
    ds = lance.write_dataset(tbl, "/tmp/test.lance")
    ds.to_table(nearest={'column': 'a', 'q': np.random.randn(128), 'k': 10})

ValueError: LanceError(IO): KNNFlatExec node: query column a is not a vector

changhiskhan avatar Jul 22 '23 10:07 changhiskhan

Resolved with #1336.

rok avatar Oct 11 '23 10:10 rok