Andriy Mulyar

Results 28 issues of Andriy Mulyar

Added support for Nomic Embed v1.5: https://blog.nomic.ai/posts/nomic-embed-matryoshka This lets you specify a vectorizer that generates much smaller but still performant embeddings for local use.
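For context on why smaller embeddings can stay useful: Matryoshka-style models are trained so that a prefix of the full vector is itself a usable embedding. A minimal sketch of that idea follows, assuming a plain Python list as the vector. The helper name `truncate_embedding` is hypothetical and is not part of any Nomic API; the model's own recommended post-processing may include extra steps not shown here.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm.

    Hypothetical helper for illustration only, not a library function.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a real embedding.
full = [3.0, 4.0, 12.0, 0.0]
small = truncate_embedding(full, 2)
```

Re-normalizing after truncation keeps cosine and dot-product similarity comparable across the shortened vectors.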

Ignores the ID column in all to-disk downloads, which caused issues on some datasets between Oct 30 and Oct 31.

This should be structured similarly to the existing Cohere integration. https://beta.openai.com/docs/guides/embeddings

Any updates on the timeline for distributed pretrain integration? Thanks!

The original implementation contains an excellent visualization tool. Expose it through the Python interface.

The ability to switch between impurity metrics is already implemented in the underlying implementation. Expose it in the Python interface.
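As a rough illustration of what switching impurity metrics could look like from Python, the sketch below selects a metric by name. The issue does not say which metrics the underlying implementation supports; Gini impurity and entropy are assumed here because they are the usual choices for decision trees, and the names `impurity` and `IMPURITY_METRICS` are hypothetical.

```python
import math


def gini(counts):
    """Gini impurity of a node given per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (in bits) of a node given per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical registry; a real binding would forward the choice to the
# underlying implementation instead of computing the value in Python.
IMPURITY_METRICS = {"gini": gini, "entropy": entropy}


def impurity(counts, metric="gini"):
    """Dispatch to the impurity metric selected by name."""
    if metric not in IMPURITY_METRICS:
        raise ValueError(f"unknown impurity metric: {metric!r}")
    return IMPURITY_METRICS[metric](counts)
```

For example, `impurity([5, 5], metric="entropy")` scores an evenly split two-class node with entropy instead of the default Gini impurity.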

Implement predict_prob with something like Laplace smoothing. This is trivial but will support the use of the classifier directly in scikit-learn's random forest, which averages class predictions over-top of class...
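A minimal sketch of the Laplace-smoothed estimate mentioned above, assuming the classifier can report per-class training counts at the leaf a sample falls into. The function name `laplace_proba` and the `alpha` parameter are hypothetical. Note that scikit-learn's own convention for this method name is `predict_proba`.

```python
def laplace_proba(class_counts, alpha=1.0):
    """Turn per-class leaf counts into smoothed class probabilities.

    Each class gets (count + alpha) / (total + alpha * n_classes), so a
    class with zero samples in the leaf still receives a small non-zero
    probability. Hypothetical helper for illustration only.
    """
    n_classes = len(class_counts)
    denom = sum(class_counts) + alpha * n_classes
    if denom <= 0:
        raise ValueError("need at least one class and a positive denominator")
    return [(c + alpha) / denom for c in class_counts]


# A leaf holding 3 samples of class 0, none of class 1, and 1 of class 2.
probs = laplace_proba([3, 0, 1])
```

With `alpha=1.0` the three classes above receive 4/7, 1/7 and 2/7, which sum to one and can be averaged across trees like any other probability vector.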