Carlos Mocholí
Argh good point. This needs to be solved at the CLI level, but I'm not sure of the best way to do it. Opened https://github.com/omni-us/jsonargparse/issues/479 to ask.
Isn't this a duplicate of your https://github.com/Lightning-AI/litgpt/issues/1084?
Note that this needs to be done by defining two different base classes and having each file use only one of them in its type signatures.
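Purely as an illustration of that split (all names below are hypothetical, not litgpt's):

```python
# Hypothetical sketch of the two-base-class idea; names are made up for illustration.
class CLIBase:
    """Base for classes the CLI is allowed to expose."""

class InternalBase:
    """Base for classes that are only constructed programmatically."""

class Finetune(CLIBase):
    ...

# Each file then annotates against only one of the two bases in its signatures.
def main(entrypoint: CLIBase) -> None:
    ...
```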
This is feasible now that we get the list of bins or safetensors before running the download: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/scripts/download.py#L54

What would you do if:
- The bins/safetensors exist but the lit_model.pth...
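To make the question concrete, a hypothetical sketch of the kind of check involved (`missing_files` and the file names are illustrative, not what `download.py` actually does):

```python
# Hypothetical helper, not the actual download.py logic: figure out which of the
# expected weight files still need to be fetched, so a partial download can resume.
from pathlib import Path

def missing_files(checkpoint_dir: Path, expected: list[str]) -> list[str]:
    return [name for name in expected if not (checkpoint_dir / name).exists()]

# The open question above: the .bin/.safetensors files may all be present while
# lit_model.pth (the converted weights) is not, or vice versa.
to_fetch = missing_files(Path("checkpoints/org/model"), ["model.safetensors"])
```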
Sounds good to me!
Hi @FlimFlamm! Thanks for working on this. I had this partially implemented but never pushed it and I might have lost it because I cannot find it in my stashes...
This would require adding support for ALiBi. This is not a priority at the moment. A related feature request is https://github.com/Lightning-AI/lit-gpt/issues/199
We don't support flash attention from `flash-attn`. Supporting ALiBi would warrant an entirely new `model_alibi.py` definition.
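For reference, a minimal sketch (not taken from litgpt) of the ALiBi bias such a definition would need to add to the attention scores, assuming a power-of-two number of heads:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8),
    # the standard choice when the number of heads is a power of two.
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])
    # Relative key-query distances; past positions get an increasingly negative bias.
    positions = torch.arange(seq_len)
    distances = positions[None, :] - positions[:, None]  # (seq_len, seq_len), j - i
    # Shape (num_heads, seq_len, seq_len); added to the raw attention scores before softmax.
    return slopes[:, None, None] * distances[None, :, :].float()
```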
We support flash attention via PyTorch's `scaled_dot_product_attention`, just not via Tri Dao's `flash-attn`. The former uses one of the latter's implementations internally.
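For illustration, a minimal sketch (not litgpt code, needs a CUDA GPU) that restricts PyTorch's SDPA dispatch to its flash backend, which is how you can check that the flash kernel is being used without installing `flash-attn`:

```python
import torch
import torch.nn.functional as F

q, k, v = (torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16) for _ in range(3))

# Restrict dispatch to the flash backend; if this GPU/dtype combination is unsupported,
# the call raises instead of silently falling back to the math backend.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```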
Two more which probably need to be fixed in PyTorch:

```python
/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py:1132: UserWarning: Please use DTensor instead and we are deprecating ShardedTensor.
  warnings.warn(DEPRECATE_MSG)
```

From (`print_stack` added by me):

```python
...
```
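As an aside, an alternative way to locate where such a warning is emitted without editing the PyTorch sources by hand (the trace above came from a manually added `print_stack`) is to escalate just that warning to an error so it arrives with a full traceback:

```python
import warnings

# Turn only the ShardedTensor deprecation warning into an exception; the resulting
# traceback points at the call site that still goes through the deprecated path.
warnings.filterwarnings("error", message="Please use DTensor instead")
```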