Mateusz Piotrowski
I think the problem you're seeing is caused by the prompt formatting and not by implementation differences. I compared the TL model to the HF model and while there are some small...
Thanks for clarifying! From the issue description, I assumed that the problem is the generated tokens being in Chinese and that behavior is the same for the HF implementation for...
@yeutong the issue is caused by a different attention scale being used (~14.96 vs 16). The HF implementation also disables the attention logits soft capping for inference, but that is less...
@microsoft-github-policy-service agree company="Anthropic"