DreamGenX

12 comments by DreamGenX

Of the recent techniques, SmoothQuant from MIT seems extremely promising for serving. It's W8A8 quant, so you don't need to dequantize during inference. This means that inference with SmoothQuant has...
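For concreteness, here is a minimal sketch of the smoothing step from the paper (the function names are mine, and `act_max` would come from a calibration pass over representative data):

```py
import torch

def smooth_scales(act_max: torch.Tensor, weight_max: torch.Tensor,
                  alpha: float = 0.5) -> torch.Tensor:
    # Per-input-channel smoothing factors, following the SmoothQuant paper:
    # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    return act_max.pow(alpha) / weight_max.pow(1.0 - alpha)

def apply_smoothing(weight: torch.Tensor, act_max: torch.Tensor,
                    alpha: float = 0.5):
    # weight: [in_features, out_features]; act_max: [in_features].
    # Folding the scales keeps the layer output identical:
    # X @ W == (X / s) @ (s * W), but X / s is much easier to quantize to INT8.
    weight_max = weight.abs().amax(dim=1)
    s = smooth_scales(act_max, weight_max, alpha)
    return weight * s.unsqueeze(1), s  # divide activations by s at runtime
```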

My understanding is that LoRA+ and DoRA are relatively orthogonal and likely stack.

Are you thinking of also supporting the use case where you use one existing decoder-only model as the encoder and another decoder-only model as the decoder?

The model might be able to learn to recognize the necessary patterns, such as "system message ~ start of a new example", but EOS is sometimes used inside chat templates (e.g....
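To illustrate with a Llama-2-style template (a rough sketch of the format, not the exact Jinja template), the EOS token closes every assistant turn, so it appears repeatedly inside a single conversation rather than only at the end of an example:

```py
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "Great."},
]
rendered = ""
for user, assistant in zip(messages[::2], messages[1::2]):
    # EOS (</s>) after each assistant turn, mid-conversation.
    rendered += f"<s>[INST] {user['content']} [/INST] {assistant['content']} </s>"
print(rendered)
# <s>[INST] Hi [/INST] Hello! </s><s>[INST] How are you? [/INST] Great. </s>
```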

@DarkLight1337 This sounds related to https://github.com/vllm-project/vllm/issues/4577 -- something between `0.4.0.post1` and `0.4.1` changed the way tokenization works. I am for whatever reason getting back a sequence of tokens like `

@raywanb Something worth looking into would also be the technique presented here, which might be superior in some regards: https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction It comes with a nice colab as well: https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing&authuser=1 There's a...
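The core of the technique is simple enough to sketch (variable names are mine; the colab has the full version): take the difference of mean residual-stream activations on harmful vs. harmless prompts, then project that direction out of the activations:

```py
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    # Difference-of-means over [n_prompts, d_model] activations, normalized.
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(x: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Remove the component along the refusal direction: x - (x . r) r.
    return x - (x @ direction).unsqueeze(-1) * direction
```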

This may explain: https://github.com/OpenAccess-AI-Collective/axolotl/issues/1100

What I meant by the RoPE comment -- and maybe this is already handled automatically -- is that if we just concatenate examples as with naive packing e.g. in HF...
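Concretely, what I'd expect is per-example position IDs that reset at each boundary, so RoPE sees the same positions as in unpacked training (a minimal sketch, assuming HF-style `position_ids`):

```py
import torch

def packed_position_ids(example_lengths: list[int]) -> torch.Tensor:
    # Naive packing would use positions 0..total_len-1 across the whole pack;
    # resetting per example keeps RoPE consistent with unpacked training.
    return torch.cat([torch.arange(n) for n in example_lengths])

print(packed_position_ids([3, 4, 2]))
# tensor([0, 1, 2, 0, 1, 2, 3, 0, 1])
```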

This message seems to go away when using bfloat16, FWIW.

You can implement this as a logit processor as far as I can tell:

```py
import torch

def _get_min_p_fn(min_p: float):
    def _fn(logits: torch.Tensor) -> torch.Tensor:
        probs = torch.softmax(logits, dim=-1)
        # Probability of the most likely token.
        top_prob = probs.max(dim=-1, keepdim=True).values
        # Drop tokens whose probability falls below min_p * top_prob.
        return logits.masked_fill(probs < min_p * top_prob, float("-inf"))
    return _fn
```