Aman Karmani
Ah, I think you need to call the `replace...` method to monkey patch *before* the model is instantiated, i.e. before `AutoModelForCausalLM.from_pretrained`.
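To illustrate why the order can matter, here is a minimal, self-contained sketch. It does not use the real patch function or `transformers`; `lib`, `Attention`, `PatchedAttention`, `build_model`, and `replace_attention` are all hypothetical stand-ins. It assumes the patch works by swapping the class that the library looks up at construction time, which may not be how the actual patch is implemented.

```python
import types

# Hypothetical stand-in for a library module that exposes an attention class.
lib = types.SimpleNamespace()


class Attention:
    def forward(self, x):
        return f"default({x})"


class PatchedAttention(Attention):
    def forward(self, x):
        return f"patched({x})"


lib.Attention = Attention


def build_model():
    # Stand-in for model construction: the class is looked up when called.
    return lib.Attention()


def replace_attention():
    # Hypothetical monkey patch: swap the class the "library" will instantiate.
    lib.Attention = PatchedAttention


model_built_first = build_model()   # constructed before the patch
replace_attention()
model_built_after = build_model()   # constructed after the patch

result_built_first = model_built_first.forward("x")
result_built_after = model_built_after.forward("x")
print(result_built_first)
print(result_built_after)
```

Under that assumption, only the object built after the patch picks up the new behavior, which is why calling the patch first is the safe order.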
try this? https://github.com/artidoro/qlora/commit/1b5641913914a48ad15eadb96a0dba6452aa0ac1
Try this: `pip install -U wheel`
> Try this: `pip install -U wheel`
Someone reported this worked in their environment, but when we tried in a fresh docker/conda env, it's not working. nor...
Use: `pip install -U flash-attn --no-build-isolation`
The issue here is that once you add a `pyproject.toml`, pip will pick it up and build with build isolation. To make isolation work, we would need to add to the toml:...
fixed by https://github.com/Dao-AILab/flash-attention/commit/73bd3f3bbb6775c5286e4b095efbc62d9fd4e5dd
The fix is out. Try: `pip install -v flash-attn==2.1.1`
@tridao this issue can be closed. If you want to give me issue maintenance privileges, I can help keep things tidy.
Weird, can you run with `-v` and post the output?