hnswlib-node

Returning the same points for every query?

Open • siliconjungle opened this issue 1 year ago • 1 comment

Hey, I don't know whether this is something I'm doing wrong or an issue with the library, but I thought I'd flag it and ask for help.

I've created vector embeddings for roughly 275,000 words from an English dictionary using ada-002, and I've added them to an index with the code below.

Whenever I search, it always returns the same set of words, regardless of the query embedding.

Is this a problem with the number of embeddings I'm supplying, or am I doing something else wrong?

Here is my code:

```javascript
import pkg from 'hnswlib-node'

const { HierarchicalNSW } = pkg

// Builds an index of `maxElements` points, fetching each embedding from `callback(i)`
// and using the loop index `i` as that point's label, then saves it to `${name}.dat`.
export const createIndexCallback = async (name, dimensions, maxElements, callback) => {
  // this needs to *get the element from the callback each time*.
  const index = new HierarchicalNSW('l2', dimensions)
  index.initIndex(maxElements)

  for (let i = 0; i < maxElements; i++) {
    const embedding = await callback(i)
    index.addPoint(embedding, i)

    console.log(`Added ${i} of ${maxElements}`)
  }

  index.writeIndexSync(`${name}.dat`)

  return index
}

// Loads the saved index from disk and returns the k nearest labels to `embedding`
// as { distances, neighbors }.
export const searchIndex = (name, embedding, k = 5) => {
  const index = new HierarchicalNSW('l2', embedding.length)

  index.readIndexSync(`${name}.dat`)

  const result = index.searchKnn(embedding, k)

  console.table(result)

  return result
}
```

siliconjungle, Nov 10 '23 16:11
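To find out whether the problem is in the index or in the data, you can compare `searchKnn` against an exact search over a sample of the embeddings. The sketch below is a hypothetical debugging helper and is not part of hnswlib-node. `bruteForceKnn` ranks every stored vector by squared L2 distance, which is the quantity hnswlib's `l2` space reports, and returns the same `{ neighbors, distances }` shape as `searchKnn`. `countDistinctVectors` checks whether the stored embeddings actually differ from each other. It assumes the sampled embeddings fit in memory as plain arrays or `Float32Array`s, indexed by the same labels that were passed to `addPoint`.

```javascript
// Hypothetical debugging helper (not part of hnswlib-node): exact k-nearest-neighbour
// search by brute force, using squared L2 distance like hnswlib's 'l2' space.
function bruteForceKnn(vectors, query, k = 5) {
  if (vectors.length === 0) throw new Error('no vectors to search')
  const dim = query.length
  const scored = vectors.map((v, label) => {
    // A length mismatch usually means the embedding was malformed or truncated.
    if (v.length !== dim) {
      throw new Error(`vector ${label} has length ${v.length}, expected ${dim}`)
    }
    let dist = 0
    for (let d = 0; d < dim; d++) {
      const diff = v[d] - query[d]
      dist += diff * diff
    }
    return { label, dist }
  })
  // Nearest first; keep only the top k.
  scored.sort((a, b) => a.dist - b.dist)
  const top = scored.slice(0, k)
  return { neighbors: top.map((s) => s.label), distances: top.map((s) => s.dist) }
}

// Hypothetical sanity check: if every stored embedding is identical, any k-NN search
// (exact or approximate) returns the same labels for every query.
function countDistinctVectors(vectors) {
  return new Set(vectors.map((v) => Array.from(v).join(','))).size
}
```

If the exact search returns different words for different queries but `searchKnn` does not, the index is the likely culprit. If the exact search also returns the same words every time, or `countDistinctVectors` reports far fewer distinct vectors than expected, the callback is probably producing repeated or malformed embeddings.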

~~Perhaps it's because I used `l2` rather than `cosine`. Since it's a high-dimensional space, that could make all of the elements appear equally far apart.~~

That made no difference.

siliconjungle, Nov 10 '23 19:11
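This null result is what you would expect if the embeddings are unit length. OpenAI's documentation states that its embeddings are normalized to length 1. For unit vectors, ||a − b||² = ||a||² + ||b||² − 2·a·b = 2 − 2·cos(a, b), so ordering by L2 distance and ordering by cosine similarity produce the same neighbours. The sketch below checks this numerically. The helper names and the 2-D example vectors are illustrative only.

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  let s = 0
  for (let i = 0; i < a.length; i++) s += a[i] * b[i]
  return s
}

// Scale a vector to unit length (assumes it is not all zeros).
function normalize(v) {
  const norm = Math.sqrt(dot(v, v))
  return v.map((x) => x / norm)
}

// Squared Euclidean distance.
function squaredL2(a, b) {
  let s = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    s += d * d
  }
  return s
}

function cosineSimilarity(a, b) {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b))
}

// Labels ordered nearest-first: smallest L2 distance, or largest cosine similarity.
function rankByL2(vectors, query) {
  return vectors
    .map((v, i) => [i, squaredL2(v, query)])
    .sort((x, y) => x[1] - y[1])
    .map(([i]) => i)
}

function rankByCosine(vectors, query) {
  return vectors
    .map((v, i) => [i, cosineSimilarity(v, query)])
    .sort((x, y) => y[1] - x[1])
    .map(([i]) => i)
}

const words = [[3, 4], [1, 0], [0, 1], [-1, 1]].map(normalize)
const query = normalize([1, 2])
console.log(rankByL2(words, query)) // [ 0, 2, 1, 3 ]
console.log(rankByCosine(words, query)) // [ 0, 2, 1, 3 ]
```

Switching between `l2` and `cosine` can change the reported distance values but not which words come back, so the repeated results are more likely coming from the data or the indexing step than from the choice of space.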