Mateusz Piotrowski

Results: 4 comments by Mateusz Piotrowski

I think the problem you're seeing is caused by the prompt formatting, not by implementation differences. I compared the TL model to the HF model, and while there are some small...

Thanks for clarifying! From the issue description, I assumed the problem was that the generated tokens were in Chinese, and that the behavior was the same for the HF implementation for...

@yeutong the issue is caused by a different attention scale being used (~14.96 vs. 16). The HF implementation also disables attention-logit soft capping for inference, but that is less...

@microsoft-github-policy-service agree company="Anthropic"