neurallambda
Is this ticket dead because some other technique already exists for returning and reusing `past_key_values`? This is a killer feature.
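For context, a minimal sketch of what "returning and reusing `past_key_values`" looks like with the `transformers` forward API: compute the KV cache for a prompt once, then feed only new tokens alongside that cache. The tiny randomly-initialized GPT-2 config and the token IDs here are made up purely for illustration, not anything from this thread.

```python
# Sketch: reuse a prompt's KV cache across forward calls instead of
# recomputing attention over the prompt every time.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config).eval()  # tiny random model, illustration only

prompt_ids = torch.tensor([[1, 2, 3, 4]])
with torch.no_grad():
    out = model(prompt_ids, use_cache=True)
cache = out.past_key_values  # KV cache for the prompt; reusable later

# Feed only the new token; the cached keys/values stand in for the prompt.
next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
with torch.no_grad():
    step = model(next_id, past_key_values=cache, use_cache=True)

# Sanity check: the incremental step matches a full recompute.
with torch.no_grad():
    full = model(torch.cat([prompt_ids, next_id], dim=1)).logits[:, -1:]
assert torch.allclose(step.logits, full, atol=1e-4)
```

The same idea is what lets a server cache a long shared system prompt once and amortize it across many requests.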
The following PR is more up to date: https://github.com/huggingface/transformers/pull/25086
Same for Llama 2? (I can't seem to find an answer)
It works for me if I just set the env vars before importing bark:

```python
import os
os.environ["SUNO_USE_SMALL_MODELS"] = "True"
os.environ["SUNO_OFFLOAD_CPU"] = "True"
import bark
```

it appears...
So, it doesn't seem like much is technically required to implement this, but it's kind of hacky right now, at least when I try it with a `mixtral gptq`. If I...
This just landed: https://github.com/showlab/MotionDirector But we also really need the data pipeline stuff: https://github.com/Stability-AI/generative-models/issues/213
Thanks for the reply, and whoa, any PyTorch training setup will do? I'm just interested in next-token prediction. Does it get along with, say, the `accelerate` ecosystem for multi-node/multi-GPU?...
Geez, open source is fast. Here's a chattified version with a simple example: https://github.com/havenhq/mamba-chat/blob/main/train_mamba.py
This is not clear from the docs. So you're saying it would make sense for a 4-bit GPTQ Mistral 7B to take up >40 GB of VRAM if available, but that's not...
"code" would be a useful language to add, especially common languages like Python and JavaScript. The GTE project claims this ability: https://huggingface.co/thenlper/gte-large