Segfault when using f32

Open · cemoody opened this issue 2 years ago • 3 comments

When I use a 32-bit (float32) query vector, I get a segfault in the following snippet:

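    import lance
    import numpy as np

    # Hypothetical path, vector dimension and parameter values, for illustration only
    dataset = lance.dataset("path/to/dataset.lance")
    query = np.random.rand(128).astype(np.float32)  # float32 query: segfaults
    k, nprobe, refine_factor = 10, 20, 5

    ds = dataset.to_table(
        nearest={
            "column": "vector",
            "q": query,
            "k": k,
            "nprobes": nprobe,
            "refine_factor": refine_factor,
        }
    )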

Casting the query with query = query.astype(np.float64) avoids the segfault. It is probably worth casting implicitly or at least issuing a warning.
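The "cast implicitly, but warn" suggestion could look roughly like the sketch below. prepare_query is a hypothetical helper, not part of the Lance API; it assumes the search should receive a 1-D float64 array, since that is the dtype the reporter found to work. A real fix might instead cast to the vector column's own dtype.

```python
import warnings

import numpy as np


def prepare_query(query, target_dtype=np.float64):
    """Hypothetical helper (not a Lance API): coerce a query vector to target_dtype.

    Casts implicitly when the dtype differs, and emits a UserWarning so the
    caller knows a conversion happened.
    """
    arr = np.asarray(query)
    if arr.ndim != 1:
        raise ValueError(f"query must be 1-D, got shape {arr.shape}")
    if arr.dtype != np.dtype(target_dtype):
        warnings.warn(
            f"casting query from {arr.dtype} to {np.dtype(target_dtype)}",
            UserWarning,
            stacklevel=2,
        )
        arr = arr.astype(target_dtype)
    return arr


# Usage: a float32 query is cast to float64 and a warning is recorded
query = np.random.rand(128).astype(np.float32)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    safe_query = prepare_query(query)

# An already-float64 query passes through without a warning
with warnings.catch_warnings(record=True) as caught_f64:
    warnings.simplefilter("always")
    same_query = prepare_query(np.zeros(4, dtype=np.float64))
```

The resulting safe_query would then be passed as the "q" entry of the nearest dict.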

(cemoody, Feb 19 '23 05:02)

Hi @cemoody, what is the data type of the vector column: float32 or float64?

(eddyxu, Feb 19 '23 05:02)

The vector column in the Lance file is f32, but the query must be cast to f64 or I get segfaults.

(cemoody, Feb 19 '23 14:02)

After the cast, what are the scores? The L2 distance score we compute is actually the squared L2 distance (slightly faster, and it doesn't change the ordering), but is it possible that the score overflows f32?
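Both points can be checked in a small NumPy sketch that is independent of Lance's internals. First, because sqrt is monotonic on non-negative values, ranking by squared L2 distance gives non-decreasing true L2 distances. Second, a squared distance can exceed the float32 maximum (about 3.4e38) even when every vector component fits comfortably in float32, and then it becomes inf. The random data and the 1e20 component values are arbitrary choices for illustration; whether this overflow actually occurs inside Lance is the open question in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 8)).astype(np.float32)
query = rng.standard_normal(8).astype(np.float32)

# Squared L2 distance (skips the sqrt) versus the true L2 distance
diff = vectors - query
sq_dist = np.sum(diff * diff, axis=1)
dist = np.sqrt(sq_dist)

# Sorting by squared distance also sorts the true distances, because sqrt is monotonic
order_sq = np.argsort(sq_dist, kind="stable")
ranking_preserved = bool(np.all(np.diff(dist[order_sq]) >= 0))

# Overflow: 1e20 fits in float32, but its square (1e40) does not
big = np.full(4, 1e20, dtype=np.float32)
zero = np.zeros(4, dtype=np.float32)
with np.errstate(over="ignore"):
    sq32 = np.sum((big - zero) ** 2)  # computed in float32: overflows to inf
sq64 = np.sum((big.astype(np.float64) - zero) ** 2)  # float64: about 4e40, finite
```

The float64 computation stays finite here, which is consistent with the idea that a wider accumulator avoids overflow, though it does not by itself explain a segfault.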

(changhiskhan, Feb 20 '23 06:02)