
Results: 40 comments by ThisisBillhe

It works fine for me. Please make sure you use the correct NAR checkpoint, t2i_XL_stage2_512.pt, rather than the LlamaGen one. ![Image](https://github.com/user-attachments/assets/a9cdfbe8-3d28-45c6-a87d-7b397f45e999)

Hi, thanks for your interest. Please make sure you download the correct checkpoint from [here](https://huggingface.co/collections/chenfeng1271/nar-67d13fa93fe913b2e187ee1f) rather than hywang66/LARP-L-long-AR.

Thanks for your reply! I tried your latest commit and sadly it did not run well in my case: the program gets stuck. I think the reason is...

Perhaps we should send cu_seqlens_k and max_seqlen_in_batch_k along with k and v to the other ranks.
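A rough sketch of what I mean, with the actual distributed send (e.g. via torch.distributed) left out. The helper names here are hypothetical; the point is that the flash-attn varlen metadata has to travel with the packed k/v, since the receiving rank cannot recover batch boundaries from the tensors alone:

```python
# Hedged sketch (helper names are made up for illustration): bundle the
# flash-attn varlen metadata with k/v before shipping them to a peer rank.
from typing import Any, Dict, List

def build_varlen_metadata(seqlens_k: List[int]) -> Dict[str, Any]:
    """Derive cu_seqlens_k (cumulative key offsets) and max_seqlen_in_batch_k
    from per-sample key lengths, in the form flash-attn's varlen kernels expect."""
    cu_seqlens_k = [0]
    for n in seqlens_k:
        cu_seqlens_k.append(cu_seqlens_k[-1] + n)
    return {
        "cu_seqlens_k": cu_seqlens_k,
        "max_seqlen_in_batch_k": max(seqlens_k),
    }

def pack_kv_payload(k: Any, v: Any, seqlens_k: List[int]) -> Dict[str, Any]:
    """Bundle k/v with the metadata so the receiving rank can run varlen attention."""
    payload = {"k": k, "v": v}
    payload.update(build_varlen_metadata(seqlens_k))
    return payload
```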

Hello alumni. I think it is plausible, since torch_quantizer only provides a basic framework for running quantized models within PyTorch. It is not well-optimized; for instance, one could fuse quant/dequant with the GEMM...
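To illustrate the kind of fusion I mean, here is a toy sketch (plain-Python matmuls standing in for real kernels, not torch_quantizer's actual code): instead of dequantizing the int8 weights to floats and then running the GEMM, which materializes an extra float weight copy, the dequantization scale can be folded into the GEMM epilogue:

```python
# Hedged toy example: naive dequant-then-GEMM vs. GEMM with the scale
# fused into the epilogue. Both produce the same output.
from typing import List

def quantize(w: List[List[float]], scale: float) -> List[List[int]]:
    """Symmetric per-tensor quantization: q = round(w / scale)."""
    return [[round(x / scale) for x in row] for row in w]

def gemm(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Naive matmul: a (m x k) @ b (k x n)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

def naive_path(x, qw, scale):
    """Dequantize the whole weight matrix first, then GEMM (extra memory pass)."""
    w = [[q * scale for q in row] for row in qw]
    return gemm(x, w)

def fused_path(x, qw, scale):
    """GEMM directly on the quantized weights; scale applied in the epilogue."""
    y = gemm(x, qw)
    return [[v * scale for v in row] for row in y]
```

A real implementation would do the integer GEMM in an optimized kernel and apply the scale per output tile, but the arithmetic equivalence is the same.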

Same here. The listed weights will be newly initialized and lead to NaNs, so the warning can't just be ignored.

Hi, I will look into this when I am available; I am working on another project now. BTW, have you successfully trained a W4A8 or W4A4 model with EfficientDM? Just want to make sure...

What about using more steps and epochs during training, e.g., 250 DDIM steps and more epochs?

This is exactly the motivation of our paper: due to the strong locality of images, we do not need these tokens. As for the last token in the previous row,...

In the next-token prediction paradigm, we need to input the last token as a **query** so that we can generate the next token. As for the rest of the prior...
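A toy sketch of why the last token is the only query needed: with a KV cache holding the keys/values of all prior tokens, feeding just the last token predicts exactly what a full recompute over the whole prefix would. The "attention" below is a deterministic integer stand-in, not a real transformer:

```python
# Hedged toy example: full-prefix recompute vs. KV-cached decoding where
# only the last token acts as the query. Both generate the same sequence.
from typing import List

def attend(q: int, keys: List[int], values: List[int]) -> int:
    """Toy attention: score each cached key against the query, mix the values."""
    return sum((q * k) * v for k, v in zip(keys, values))

def next_token(q: int, keys: List[int], values: List[int]) -> int:
    """Toy 'logits -> token' step."""
    return attend(q, keys, values) % 7

def generate_full(prompt: List[int], n: int) -> List[int]:
    """Recompute attention over the whole prefix at every step (no cache)."""
    seq = list(prompt)
    for _ in range(n):
        keys, values = seq[:], seq[:]      # toy: key = value = token id
        seq.append(next_token(seq[-1], keys, values))
    return seq

def generate_cached(prompt: List[int], n: int) -> List[int]:
    """Feed only the last token as the query; reuse cached keys/values."""
    seq = list(prompt)
    keys, values = [], []
    for t in seq:                          # prefill the cache with the prompt
        keys.append(t)
        values.append(t)
    for _ in range(n):
        t = next_token(seq[-1], keys, values)
        seq.append(t)
        keys.append(t)                     # cache the new token's key/value
        values.append(t)
    return seq
```

The two paths agree step for step, which is exactly why the query for the rest of the prefix never needs to be recomputed.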