Expose index stats for debugging

Open changhiskhan opened this issue 2 years ago • 1 comments

We want to expose vector index stats to the user (in Python) for debugging purposes:

  1. index name, id, and file location
  2. index type
  3. if ivf_pq: num ivf partitions, num pq sub vectors, num pq bits
  4. if ivf_pq: for each partition, the centroid vector for the partition and the number of vectors in that partition.

On the Python side, the API could look like:

dataset.list_index -> [Index]

dataset.get_index("name") -> Index

Index should have:

  • index_type
  • name
  • index_id
  • uri

IvfPqIndex (subclass of Index) should have:

  • num ivf partitions
  • num pq sub vectors
  • num pq bits
  • the centroid vector of each ivf partition
  • num vectors in each partition
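
The proposal above could be sketched roughly as follows. This is only an illustration of the shape of the API, not Lance's actual implementation: the attribute names (`num_partitions`, `num_sub_vectors`, `num_bits`, `centroids`, `partition_sizes`), the in-memory `Dataset` stand-in, and the choice to make `list_index` a method are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Index:
    """Common metadata proposed for every index (hypothetical names)."""
    index_type: str
    name: str
    index_id: str
    uri: str


@dataclass
class IvfPqIndex(Index):
    """IVF_PQ-specific stats proposed in the issue (hypothetical names)."""
    num_partitions: int = 0
    num_sub_vectors: int = 0
    num_bits: int = 8
    # One centroid vector per IVF partition.
    centroids: List[List[float]] = field(default_factory=list)
    # Number of vectors assigned to each partition, same order as centroids.
    partition_sizes: List[int] = field(default_factory=list)


class Dataset:
    """In-memory stand-in for a dataset; the real one would read index files."""

    def __init__(self, indices: List[Index]):
        self._indices = {idx.name: idx for idx in indices}

    def list_index(self) -> List[Index]:
        return list(self._indices.values())

    def get_index(self, name: str) -> Index:
        # Raises KeyError if no index has this name.
        return self._indices[name]


# Example values below are made up for illustration.
dataset = Dataset([
    IvfPqIndex(
        index_type="ivf_pq",
        name="vec_idx",
        index_id="example-id",
        uri="/tmp/example/vec_idx",
        num_partitions=2,
        num_sub_vectors=4,
        num_bits=8,
        centroids=[[0.0, 0.0], [1.0, 1.0]],
        partition_sizes=[90, 10],
    )
])

idx = dataset.get_index("vec_idx")
if isinstance(idx, IvfPqIndex):
    # A skewed partition_sizes list is the kind of thing these stats help spot.
    total = sum(idx.partition_sizes)
    largest_share = max(idx.partition_sizes) / total
```

Keeping the IVF_PQ fields on a subclass means callers that only need the name, id, and location can treat every index uniformly, while debugging code can check the type before reading partition-level stats.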

changhiskhan, Feb 22 '23 07:02

I will work on this.

Renkai, Mar 15 '23 00:03

These stats may also be useful for merging indices.

changhiskhan, Jul 02 '23 22:07