Void Main
Thanks @ictzyqq, that indeed answers my question. In short, the paper counts FLOPs as MACs, so I should remove the 2 (multiply and addition) from `seq_len * 2 *...
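For illustration, here is a tiny sketch of the factor-of-2 difference between FLOPs and MACs for a single matmul (the helper names below are just for this example, not from the paper or the PR):

```python
def matmul_macs(m: int, k: int, n: int) -> int:
    # One multiply-accumulate per output element per reduction step.
    return m * k * n

def matmul_flops(m: int, k: int, n: int) -> int:
    # Each MAC is one multiply plus one add, hence the factor of 2.
    return 2 * matmul_macs(m, k, n)

# e.g. Q @ K^T for one attention head: (seq_len x head_dim) x (head_dim x seq_len)
seq_len, head_dim = 2048, 128
print(matmul_macs(seq_len, head_dim, seq_len))   # MAC count (the paper's convention)
print(matmul_flops(seq_len, head_dim, seq_len))  # FLOP count, exactly 2x the MACs
```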
Hey @michaelroyzen @cameronfr @Anychnn @jinluyang , I got a self-tested working version and opened a pull request with it. Could you guys please take a look? Any chance we could...
Hey community, here are some updates:
- supported bf16
- supported Triton decoupled mode
- verified that LLaMA 65B is working
> Hey, a tutorial on how to run LLaMA with the FasterTransformer backend would be really helpful! Would be happy to contribute.

Sure, will provide a step-by-step tutorial...
> This implementation for llama is very meaningful. Did you test the performance of this? How fast can it be compared with the vanilla transformers API?

I've been...
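For anyone who wants a rough baseline number, here is a minimal sketch of timing the vanilla transformers API (the checkpoint name, prompt, and token counts are placeholders, not the settings used in this PR):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "decapoda-research/llama-7b-hf"  # placeholder LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).cuda()

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.time()
outputs = model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s with the HF transformers baseline")
```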
Hi @byshiue, now that we've made a lot of progress and verified the implementation on many models, would it be possible to get this PR reviewed / merged?
> I noticed that llama_example.cpp generates correct outputs in FP16, while triton does not. Does anyone know why?

@michaelroyzen looks like the root cause is what @yinghai pointed out. merge...
> @void-main Hi, I'm also in Beijing and I'm a developer in AI inference. Could I have your wechat?

Sure, try sending me an email. :-)
Hi @CN-COTER, thanks for the contribution! Really appreciate it! I've checked your code and started a review; could you please take a look? 🎉