DreamGenX

12 comments by DreamGenX

Of the recent techniques, SmoothQuant from MIT seems extremely promising for serving. It's W8A8 quant, so you don't need to dequantize during inference. This means that inference with SmoothQuant has...
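For concreteness, here is a minimal sketch of the smoothing step from the paper (the function names are mine, and `act_max` would come from a calibration pass over representative data):

```py
import torch

def smooth_scales(act_max: torch.Tensor, weight_max: torch.Tensor,
                  alpha: float = 0.5) -> torch.Tensor:
    # Per-input-channel smoothing factors, following the SmoothQuant paper:
    # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    return act_max.pow(alpha) / weight_max.pow(1.0 - alpha)

def apply_smoothing(weight: torch.Tensor, act_max: torch.Tensor,
                    alpha: float = 0.5):
    # weight: [in_features, out_features]; act_max: [in_features].
    # Folding the scales keeps the layer output identical:
    # X @ W == (X / s) @ (s * W), but X / s is much easier to quantize to INT8.
    weight_max = weight.abs().amax(dim=1)
    s = smooth_scales(act_max, weight_max, alpha)
    return weight * s.unsqueeze(1), s  # divide activations by s at runtime
```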

My understanding is that LoRA+ and DoRA are relatively orthogonal and likely stack.

Are you thinking of also supporting the use case where you use one existing decoder-only model as the encoder and another decoder-only model as the decoder?

The model might be able to learn to recognize the necessary patterns, such as "system message ~ start of a new example", but EOS is sometimes used inside chat templates (e.g....
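To illustrate with a Llama-2-style template (a rough sketch of the format, not the exact Jinja template), the EOS token closes every assistant turn, so it appears repeatedly inside a single conversation rather than only at the end of an example:

```py
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "Great."},
]
rendered = ""
for user, assistant in zip(messages[::2], messages[1::2]):
    # EOS (</s>) after each assistant turn, mid-conversation.
    rendered += f"<s>[INST] {user['content']} [/INST] {assistant['content']} </s>"
print(rendered)
# <s>[INST] Hi [/INST] Hello! </s><s>[INST] How are you? [/INST] Great. </s>
```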

@DarkLight1337 This sounds related to https://github.com/vllm-project/vllm/issues/4577 -- something between `0.4.0.post1` and `0.4.1` changed the way tokenization works. I am for whatever reason getting back a sequence of tokens like `

@raywanb Something worth looking into would also be the technique presented here, which might be superior in some regards: https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction It comes with a nice colab as well: https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing&authuser=1 There's a...
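The core of the technique is simple enough to sketch (variable names are mine; the colab has the full version): take the difference of mean residual-stream activations on harmful vs. harmless prompts, then project that direction out of the activations:

```py
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    # Difference-of-means over [n_prompts, d_model] activations, normalized.
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(x: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Remove the component along the refusal direction: x - (x . r) r.
    return x - (x @ direction).unsqueeze(-1) * direction
```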

This may explain: https://github.com/OpenAccess-AI-Collective/axolotl/issues/1100

What I meant by the RoPE comment -- and maybe this is already handled automatically -- is that if we just concatenate examples as with naive packing e.g. in HF...
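Concretely, what I'd expect is per-example position IDs that reset at each boundary, so RoPE sees the same positions as in unpacked training (a minimal sketch, assuming HF-style `position_ids`):

```py
import torch

def packed_position_ids(example_lengths: list[int]) -> torch.Tensor:
    # Naive packing would use positions 0..total_len-1 across the whole pack;
    # resetting per example keeps RoPE consistent with unpacked training.
    return torch.cat([torch.arange(n) for n in example_lengths])

print(packed_position_ids([3, 4, 2]))
# tensor([0, 1, 2, 0, 1, 2, 3, 0, 1])
```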

This message seems to go away when using bfloat16, FWIW.

You can implement this as a logit processor as far as I can tell:

```py
import torch

def _get_min_p_fn(min_p: float):
    def _fn(logits: torch.Tensor) -> torch.Tensor:
        probs = torch.softmax(logits, dim=-1)
        # Probability of the most likely token.
        top_prob = probs.max(dim=-1, keepdim=True).values
        # Drop tokens whose probability falls below min_p * top_prob.
        return logits.masked_fill(probs < min_p * top_prob, float("-inf"))
    return _fn
```