Benjamin Anderson

9 issues by Benjamin Anderson

I loved seeing the blog post with a simple, standalone implementation of many techniques used in production to speed up LLMs. Would love to see this extended to MoE like...

Block-diagonal attention allows keys/queries from within the same sequence to attend to each other and not other sequences, even when multiple sequences are packed/concatenated together. This is really useful to...
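
A minimal sketch of the masking idea described in this issue, in plain Python. The function name `block_diagonal_mask` and its `seq_lens` argument are illustrative and do not come from any library. The sketch assumes non-causal attention and materializes the full mask for clarity; a production kernel would likely avoid building it explicitly.

```python
def block_diagonal_mask(seq_lens):
    """Build a boolean attention mask for packed sequences.

    mask[i][j] is True only when packed positions i and j come from the
    same original sequence, so attention never crosses sequence boundaries.
    (Hypothetical helper for illustration; not a library API.)
    """
    # Label every packed position with the index of its source sequence.
    seq_ids = [sid for sid, n in enumerate(seq_lens) for _ in range(n)]
    # A query may attend to a key only if both carry the same label.
    return [[q == k for k in seq_ids] for q in seq_ids]


# Two sequences of lengths 2 and 3 packed into one row of 5 tokens.
mask = block_diagonal_mask([2, 3])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xx...
# xx...
# ..xxx
# ..xxx
# ..xxx
```

Positions where the mask is False would have their attention scores set to negative infinity before the softmax, which is what keeps each packed sequence independent.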

Grouped convolutions are awesome! Would love to be able to use them in MLX. :)

enhancement

Create a sentence-transformers-like BERT model that can run MTEB with any BERT-based text embedding model. Direct integration with HuggingFace with no on-disk conversion step. Tests on bge-small to confirm...

Following the instructions in the blog post for assisted generation, I run into some issues. (FYI, both the longform_model and assistant_model are finetuned versions of OPT, which is the exact...

I'm noticing that it takes a really long time to start up anything with ranx in a serverless setting, e.g. Modal Labs. I tried adding a step during the build...

help wanted

BTLM is Cerebras's 3B model that matches the performance of many 7B models. Would be amazing to be able to quantize this because it would be so fast and good...

Describe the bug: Not sure if it's a bug, but the Usearch README led me to expect near-real-time clustering even for large indexes. However, I'm finding that at the...

bug
invalid