Alexander Wettig

3 issues by Alexander Wettig

## 🐛 Bug

The `target_tokens` variable in the forward call of the Data2VecTextEncoder model contains only the tokens at masked positions and padding tokens otherwise. In the method described in...

bug
needs triage

Hey! I'm a big fan of the flash attention varlen kernels, and they are fantastic for saving the memory and compute otherwise spent on pad tokens. When training with fixed batches of...
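For readers unfamiliar with the varlen layout mentioned above, here is a minimal pure-Python sketch of the packing idea: pad tokens are dropped, the remaining tokens are concatenated into one flat sequence, and cumulative sequence lengths mark where each example starts and ends. The helper name `pack_varlen`, the `pad_id` value, and the toy batch are hypothetical and not taken from the issue. The sketch also assumes `pad_id` never occurs as a real token.

```python
from itertools import accumulate


def pack_varlen(batch, pad_id=0):
    """Drop pad tokens and return (packed_tokens, cu_seqlens, max_seqlen).

    Hypothetical helper for illustration only; assumes pad_id is never
    a real token inside a sequence.
    """
    # Keep only the non-pad tokens of each row.
    rows = [[tok for tok in row if tok != pad_id] for row in batch]
    lengths = [len(row) for row in rows]
    # Concatenate all sequences into one flat token list.
    packed = [tok for row in rows for tok in row]
    # Cumulative lengths: sequence i occupies packed[cu[i]:cu[i + 1]].
    cu_seqlens = [0] + list(accumulate(lengths))
    return packed, cu_seqlens, max(lengths, default=0)


# Toy padded batch of three sequences (pad_id = 0).
batch = [[5, 6, 7, 0, 0], [8, 9, 0, 0, 0], [1, 2, 3, 4, 5]]
packed, cu_seqlens, max_seqlen = pack_varlen(batch)
```

In the flash-attn library, the varlen attention functions take cumulative sequence lengths and a maximum sequence length of this kind as arguments, so the 5 pad tokens in the toy batch never reach the kernel.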

## Environment

- mosaicml-streaming==0.7.5

## To reproduce

Steps to reproduce the behavior:

1. Use `StreamingDataset` in distributed training with the same seed and set `replication` either to None or an...

bug