Benjamin Anderson issues


Expert parallelism / MoE example would be awesome :)

I loved seeing the blog post with a simple, standalone implementation of many techniques used in production to speed up LLMs. Would love to see this extended to MoE like...
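The request asks for an example rather than a specific design. As a rough illustration, here is a minimal plain-Python sketch of top-k expert routing, the core of an MoE layer. Everything in it is an assumption made for illustration: the hypothetical `moe_forward` name, the linear router, the choice to renormalize gate weights over only the selected experts, and the toy experts. It runs every expert in one process, so it does not show expert parallelism (sharding experts across devices).

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: Vector,
    router: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Route one token to its top-k experts and mix their outputs (hypothetical)."""
    # Router logits: one dot product per expert.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    # Keep the k highest-scoring experts (ties keep the lower index first).
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, top):
        y = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


# Toy experts that just scale their input by 1, 2 and 3.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(moe_forward([1.0, 0.0], router, experts, k=2))  # [2.0, 0.0]
```

Only k of the experts run per token, which is where MoE saves compute; in an expert-parallel setup each expert would live on its own device and tokens would be exchanged between devices after routing.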

[Feature Request] Support for (efficient) blockwise-diagonal attention

Block-diagonal attention lets queries attend only to keys from the same sequence, not to tokens from other sequences, even when multiple sequences are packed/concatenated together. This is really useful to...
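A minimal sketch of the masking rule, assuming each packed position carries a segment id (the `segment_ids` input and both function names are hypothetical). It materializes the full n-by-n mask, which is exactly the quadratic cost an efficient kernel would avoid by skipping the off-diagonal blocks, so treat it as a correctness reference only.

```python
import math
from typing import List


def block_diagonal_mask(segment_ids: List[int], causal: bool = False) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j."""
    n = len(segment_ids)
    return [
        [
            segment_ids[i] == segment_ids[j] and (not causal or j <= i)
            for j in range(n)
        ]
        for i in range(n)
    ]


def masked_softmax(scores: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax that gives masked-out keys exactly zero weight."""
    probs = []
    for row, keep in zip(scores, mask):
        # The diagonal is always kept, so every row has at least one valid key.
        peak = max(s for s, k in zip(row, keep) if k)
        exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(row, keep)]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


# Two sequences packed into one row of length 5: tokens 0-1 and tokens 2-4.
segments = [0, 0, 1, 1, 1]
mask = block_diagonal_mask(segments)
probs = masked_softmax([[0.0] * 5 for _ in range(5)], mask)
print(probs[0])  # [0.5, 0.5, 0.0, 0.0, 0.0]
```

Passing `causal=True` combines the block-diagonal rule with the usual causal mask, which is the variant typically needed for packed decoder training.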

Feature Request: `groups` parameter in Conv1d

Grouped convolutions are awesome! Would love to be able to use them in MLX. :)

Label: enhancement
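For reference, a grouped 1D convolution splits the input channels into `groups` slices and convolves each slice only with its own share of the output filters, so the weight shrinks to `out_channels x (in_channels / groups) x kernel_size`; `groups == in_channels` gives a depthwise convolution. The plain-Python sketch below assumes a channels-first `[channels][length]` layout, stride 1, no padding, and cross-correlation (no kernel flip) as in most deep-learning libraries. It is not MLX's implementation or API, and `grouped_conv1d` is a hypothetical name.

```python
from typing import List


def grouped_conv1d(
    x: List[List[float]],
    weight: List[List[List[float]]],
    groups: int,
) -> List[List[float]]:
    """x: [c_in][length]; weight: [c_out][c_in // groups][kernel]."""
    c_in, length = len(x), len(x[0])
    c_out, cin_per_group, k = len(weight), len(weight[0]), len(weight[0][0])
    if c_in % groups or c_out % groups:
        raise ValueError("in and out channels must be divisible by groups")
    if cin_per_group != c_in // groups:
        raise ValueError("weight must have c_in // groups input channels")
    cout_per_group = c_out // groups
    out_len = length - k + 1  # "valid" convolution, stride 1
    out = []
    for o in range(c_out):
        g = o // cout_per_group    # group this output channel belongs to
        first = g * cin_per_group  # first input channel of that group
        row = []
        for t in range(out_len):
            acc = 0.0
            for c in range(cin_per_group):
                for j in range(k):
                    acc += weight[o][c][j] * x[first + c][t + j]
            row.append(acc)
        out.append(row)
    return out


x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
w = [[[1.0, 1.0]], [[1.0, -1.0]]]  # 2 filters, 1 input channel each, kernel size 2
print(grouped_conv1d(x, w, groups=2))  # [[3.0, 5.0], [-10.0, -10.0]]
```

With `groups=2`, output channel 0 never sees input channel 1 and vice versa; with `groups=1` the same function reduces to an ordinary convolution.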

Embeddings/MLX sentence transformers

Create a sentence-transformers-like BERT model that can run MTEB with any BERT-based text embedding model, integrating directly with HuggingFace with no on-disk conversion step. Tests on bge-small to confirm...
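Sentence-transformers-style models turn a BERT encoder's per-token hidden states into one vector per text with a pooling step, typically mean pooling over non-padding tokens or taking the first ([CLS]) token, often followed by L2 normalization so that cosine similarity becomes a plain dot product. Which pooling a given checkpoint expects is part of that model's configuration, so the `mode` switch below is an assumption to set per model. This is a plain-Python sketch of the pooling step only; the encoder and the HuggingFace loading are out of scope, and `pool_embeddings` is a hypothetical name.

```python
import math
from typing import List


def pool_embeddings(
    token_states: List[List[float]],
    attention_mask: List[int],
    mode: str = "mean",
    normalize: bool = True,
) -> List[float]:
    """Collapse per-token hidden states into a single sentence vector."""
    if mode == "cls":
        # Use the first token's hidden state ([CLS] in BERT-style tokenizers).
        pooled = list(token_states[0])
    elif mode == "mean":
        # Average only real tokens; padding positions have mask value 0.
        kept = [h for h, m in zip(token_states, attention_mask) if m]
        dim = len(token_states[0])
        pooled = [sum(h[d] for h in kept) / len(kept) for d in range(dim)]
    else:
        raise ValueError(f"unknown pooling mode: {mode!r}")
    if normalize:
        norm = math.sqrt(sum(v * v for v in pooled))
        if norm > 0:
            pooled = [v / norm for v in pooled]
    return pooled


# Three token states; the last one is padding and must not affect the mean.
states = [[1.0, 0.0], [3.0, 4.0], [100.0, 100.0]]
print(pool_embeddings(states, [1, 1, 0], normalize=False))  # [2.0, 2.0]
```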

Assisted generation errors due to use_cache

Following the instructions in the blog post for assisted generation, I ran into some issues. (FYI, both the longform_model and assistant_model are fine-tuned versions of OPT, which is the exact...
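Assisted generation pairs a large target model with a small assistant: the assistant drafts a few tokens, and the target keeps the longest prefix it agrees with plus its own token at the first disagreement, so greedy output matches what the target alone would produce. The toy sketch below illustrates only that greedy accept/verify loop, using stand-in "models" that map a token prefix to the next token id. It is not the HuggingFace `transformers` implementation, it does not model the KV cache (`use_cache`) involved in the reported error, and all names are hypothetical.

```python
from typing import Callable, List

# A "model" here is any function from a token prefix to the next token id (greedy).
NextToken = Callable[[List[int]], int]


def assisted_greedy_generate(
    target: NextToken,
    assistant: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_draft: int = 4,
) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1) The cheap assistant drafts up to num_draft tokens greedily.
        draft: List[int] = []
        for _ in range(min(num_draft, goal - len(tokens))):
            draft.append(assistant(tokens + draft))
        # 2) The target checks the draft position by position. A real model
        #    would score every drafted position in a single forward pass.
        accepted: List[int] = []
        for proposed in draft:
            expected = target(tokens + accepted)
            accepted.append(expected)  # the target's token is always kept
            if expected != proposed:
                break  # first disagreement: discard the rest of the draft
        tokens.extend(accepted)
    return tokens


def greedy_generate(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10  # toy target: counts upward


def assistant(seq: List[int]) -> int:
    # Toy assistant: agrees with the target except at every third length.
    return 0 if len(seq) % 3 == 0 else (seq[-1] + 1) % 10


print(assisted_greedy_generate(target, assistant, [0], 7))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The speed-up comes from step 2: when the assistant is usually right, one expensive target pass confirms several tokens at once, while a wrong draft costs at most the discarded tokens.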

JIT compilation on serverless (i.e. Modal Labs)

I'm noticing that it takes a really long time to start up anything with ranx in a serverless setting, e.g. Modal Labs. I tried adding a step during the build...

Label: help wanted

Feature Request: Support Cerebras BTLM

BTLM is Cerebras's 3B model that matches the performance of many 7B models. Would be amazing to be able to quantize this because it would be so fast and good...

Bug: Clustering is really really slow

Describe the bug: Not sure if it's a bug, but the Usearch README led me to expect near-real-time clustering even for large indexes. However, I'm finding that at the...

Labels: bug, invalid