neurallambda
Is this ticket dead because some other technique already exists for returning and reusing `past_key_values`? This is a killer feature.
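For context, a minimal sketch of what "returning and reusing `past_key_values`" looks like with the `transformers` forward API: compute the KV cache for a prompt once, then feed only new tokens alongside that cache. The tiny randomly-initialized GPT-2 config and the token IDs here are made up purely for illustration, not anything from this thread.

```python
# Sketch: reuse a prompt's KV cache across forward calls instead of
# recomputing attention over the prompt every time.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config).eval()  # tiny random model, illustration only

prompt_ids = torch.tensor([[1, 2, 3, 4]])
with torch.no_grad():
    out = model(prompt_ids, use_cache=True)
cache = out.past_key_values  # KV cache for the prompt; reusable later

# Feed only the new token; the cached keys/values stand in for the prompt.
next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
with torch.no_grad():
    step = model(next_id, past_key_values=cache, use_cache=True)

# Sanity check: the incremental step matches a full recompute.
with torch.no_grad():
    full = model(torch.cat([prompt_ids, next_id], dim=1)).logits[:, -1:]
assert torch.allclose(step.logits, full, atol=1e-4)
```

The same idea is what lets a server cache a long shared system prompt once and amortize it across many requests.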
The following PR is more up to date: https://github.com/huggingface/transformers/pull/25086
Same for Llama 2? (I can't seem to find an answer)
It works for me if I just set the env vars before importing bark:

```python
import os
os.environ["SUNO_USE_SMALL_MODELS"] = "True"
os.environ["SUNO_OFFLOAD_CPU"] = "True"
import bark
```

it appears...
So, it doesn't seem like much is technically required to implement this, but it's kind of hacky right now, at least when I try it with a `mixtral gptq`. If I...
This just landed: https://github.com/showlab/MotionDirector But we also really need the data pipeline stuff: https://github.com/Stability-AI/generative-models/issues/213
Thanks for the reply, and whoa, any PyTorch training setup will do? I'm just interested in next-token prediction. Does it get along with, say, the `accelerate` ecosystem for multi-node/multi-GPU?...
Geez, open source is fast. Here's a chattified version with a simple example: https://github.com/havenhq/mamba-chat/blob/main/train_mamba.py
This is not clear from the docs. So you're saying it would make sense for a 4-bit GPTQ Mistral 7B to take up >40 GB of VRAM if available, but that's not...
"code" would be a useful language to add, especially common languages like Python and JavaScript. The GTE project claims this ability: https://huggingface.co/thenlper/gte-large