Sparse Distances

Open ashvardanian opened this issue 11 months ago • 4 comments

All existing metrics assume dense vector representations. When dealing with very high-dimensional vectors, sparse representations may provide huge space-efficiency gains.

The only operation that needs to be implemented for Jaccard, Hamming, Inner Product, L2, and Cosine is a float-weighted vectorized set-intersection. We may expect the following kinds of vectors:

  • u16 - high priority
  • u32 - high priority
  • u16f16 - medium priority
  • u32f16 - medium priority
  • u32f32 - low priority?

The last may not be practically useful. The AVX-512 backend (Intel Ice Lake and newer, AMD Genoa) and the SVE backend (AWS Graviton, Nvidia Grace, Microsoft Cobalt) will see the biggest gains. Together with a serial backend, multiplied by 4-5 input types and 5 distance functions, this may result in over 100 new kernels.
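
To make the "float-weighted set-intersection" idea concrete, here is a minimal serial sketch in Python. It assumes each sparse vector is stored as a sorted array of unique indices plus a parallel array of weights; the function name `sparse_intersect` and the sample data are hypothetical and not part of SimSIMD. The point is only that one pass yields the intersection size and the weighted dot product, from which the other metrics follow.

```python
import math

def sparse_intersect(a_idx, a_val, b_idx, b_val):
    """Two-pointer merge over sorted, unique index arrays (hypothetical helper).

    Returns (number of shared indices, sum of products of matching weights).
    """
    i = j = 0
    count = 0
    dot = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] < b_idx[j]:
            i += 1
        elif a_idx[i] > b_idx[j]:
            j += 1
        else:
            # Index present in both vectors: count it and accumulate the product.
            count += 1
            dot += a_val[i] * b_val[j]
            i += 1
            j += 1
    return count, dot

# Illustrative data only.
a_idx, a_val = [1, 4, 7, 9], [0.5, 1.0, 2.0, 1.0]
b_idx, b_val = [4, 5, 9], [3.0, 1.0, 2.0]

count, dot = sparse_intersect(a_idx, a_val, b_idx, b_val)

# Index-only metrics need just the intersection size.
union = len(a_idx) + len(b_idx) - count
jaccard_distance = 1.0 - count / union
hamming_distance = union - count  # positions set in exactly one vector

# Weighted metrics need the dot product plus per-vector squared norms.
norm_a_sq = sum(v * v for v in a_val)
norm_b_sq = sum(v * v for v in b_val)
l2_squared = norm_a_sq + norm_b_sq - 2.0 * dot
cosine_distance = 1.0 - dot / math.sqrt(norm_a_sq * norm_b_sq)
```

The squared norms depend on one vector only, so they could be precomputed, leaving the intersection as the single operation worth vectorizing.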

Any thoughts and recommendations? Someone else looking for this functionality?

ashvardanian avatar Mar 20 '24 20:03 ashvardanian

Thanks for this library!

Any updates on sparsity support? Something similar to scipy.sparse.csr_matrix, for example.

ogencoglu avatar Sep 08 '24 19:09 ogencoglu

Yes, @ogencoglu, sparsity is already implemented, as in numpy.intersect1d. What kind of functionality are you looking for?
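
For reference, this is the NumPy behavior being compared against, shown on two small arrays of sorted, unique indices. The sample values are made up, and the snippet uses only NumPy, not SimSIMD's own API.

```python
import numpy as np

# Sparse vectors represented as sorted arrays of unique u16 indices.
a = np.array([1, 4, 7, 9], dtype=np.uint16)
b = np.array([4, 5, 9], dtype=np.uint16)

# Sorted array of indices present in both inputs.
common = np.intersect1d(a, b, assume_unique=True)
shared = len(common)
```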

ashvardanian avatar Sep 08 '24 19:09 ashvardanian

In numpy/scipy, A @ B is much faster if A is sparse and converted to a scipy sparse matrix. This works both when A and B are sparse and when only A is sparse and B is dense.

Can SimSIMD improve such matrix multiplications? That's my use case.
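
As a rough illustration of why the sparse form helps, here is a pure-Python sketch of a CSR-times-dense product. It uses the same three-array layout as scipy's CSR format (`data`, `indices`, `indptr`), but the function `csr_matmul_dense` is hypothetical and is not how scipy or SimSIMD implement it. The inner loop only touches stored non-zeros of A.

```python
def csr_matmul_dense(data, indices, indptr, B):
    """Multiply a CSR matrix by a dense matrix B given as a list of rows."""
    n_rows = len(indptr) - 1
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # Visit only the non-zero entries stored for this row of A.
        for k in range(indptr[row], indptr[row + 1]):
            a = data[k]
            b_row = B[indices[k]]
            for col in range(n_cols):
                C[row][col] += a * b_row[col]
    return C

# A = [[0, 2, 0],
#      [1, 0, 3]]  stored in CSR form:
data, indices, indptr = [2.0, 1.0, 3.0], [1, 0, 2], [0, 1, 3]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = csr_matmul_dense(data, indices, indptr, B)
```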

ogencoglu avatar Sep 08 '24 20:09 ogencoglu

Cool! We will have a few related releases, but more likely in October/November. Can you please open a separate feature request for Sparse Matrix Multiplications?

And, as always, it helps if you can spread the word about the library. It helps us prioritize features and work between different projects, @ogencoglu 🤗

ashvardanian avatar Sep 08 '24 21:09 ashvardanian