Said Taghadouini

Results 12 comments of Said Taghadouini

Sorry for the late response. The code is compatible with HF model implementations, so you can start from any HF BERT model. You can achieve that by defining the model...

In general, I think FA2 support on Windows is not well tested (https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features). We only ever used Linux machines for the pre-training part.
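
If you want to guard against this in your own script, here is a minimal sketch, not something taken from our code. It assumes you load the model through Hugging Face Transformers, where `attn_implementation` accepts values such as `"flash_attention_2"` and `"sdpa"`. The helper name `pick_attn_implementation` is hypothetical, and treating Linux as the only platform where FA2 is selected is an assumption based on the comment above.

```python
import importlib.util
import platform


def pick_attn_implementation(system=None, has_flash_attn=None):
    """Choose an attention backend name (hypothetical helper).

    Returns "flash_attention_2" only on Linux with the `flash_attn`
    package installed; otherwise falls back to PyTorch SDPA ("sdpa").
    """
    if system is None:
        # platform.system() returns e.g. "Linux", "Windows" or "Darwin".
        system = platform.system()
    if has_flash_attn is None:
        # find_spec returns None when the package is not installed.
        has_flash_attn = importlib.util.find_spec("flash_attn") is not None

    if system == "Linux" and has_flash_attn:
        return "flash_attention_2"
    return "sdpa"


print(pick_attn_implementation())
```

The returned string could then be passed as `attn_implementation=...` to `from_pretrained`, assuming the model class you use supports that backend.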