Alexander Wettig
## 🐛 Bug

The `target_tokens` variable in the forward call of the Data2VecTextEncoder model contains only the tokens at masked positions and padding tokens otherwise. In the method described in...
Hey! I'm a big fan of the flash attention varlen kernels, and they are fantastic for saving the memory & compute of pad tokens. When training with fixed batches of...
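The saving comes from packing: pad tokens are dropped, the remaining tokens from all sequences are concatenated into one flat buffer, and the varlen kernel is told where each sequence starts via cumulative sequence lengths. A minimal sketch of that bookkeeping, assuming a pad token id of `0` (illustrative; the helper names are mine, not from flash-attn, though the `cu_seqlens` layout matches what its varlen kernels expect):

```python
PAD_ID = 0  # assumed pad token id for this sketch

def unpad_batch(padded_batch):
    """Drop pad tokens from a padded batch; return the packed token
    stream plus per-sequence lengths."""
    packed, seq_lens = [], []
    for row in padded_batch:
        toks = [t for t in row if t != PAD_ID]
        packed.extend(toks)
        seq_lens.append(len(toks))
    return packed, seq_lens

def build_cu_seqlens(seq_lens):
    """Cumulative sequence lengths: entry i / i+1 bracket sequence i
    inside the packed buffer (the layout varlen kernels consume)."""
    cu = [0]
    for n in seq_lens:
        cu.append(cu[-1] + n)
    return cu

batch = [[5, 6, 7, 0, 0], [8, 9, 0, 0, 0], [1, 2, 3, 4, 0]]
packed, seq_lens = unpad_batch(batch)
print(packed)                       # [5, 6, 7, 8, 9, 1, 2, 3, 4]
print(build_cu_seqlens(seq_lens))   # [0, 3, 5, 9]
```

Attention over `packed` then touches 9 tokens instead of the 15 slots in the padded batch, which is where the memory and compute savings come from.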
## Environment

- mosaicml-streaming==0.7.5

## To reproduce

Steps to reproduce the behavior:

1. Use `StreamingDataset` in distributed training with the same seed and set `replication` either to None or an...