Joe Cummings
@Nayef211 Is this still relevant? If so, might be worth cleaning up quickly and merging for the speedup.
Covered by current functionality, but it might be a good idea to document how to get a token.
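For anyone landing here, a minimal sketch of what that could look like, assuming the token in question is a Hugging Face access token for gated model downloads:

```python
# Minimal sketch, assuming the token is a Hugging Face access token.
# Create one at https://huggingface.co/settings/tokens, then authenticate:
from huggingface_hub import login

login(token="hf_...")  # or export HF_TOKEN in your environment instead
```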
@rohan-varma I could totally be missing something here, but why can't we include `embedding` in the modules to wrap within the config for Llama3, rather than tie this directly to...
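Roughly what I have in mind, as a hedged sketch: drive the wrap set from config using PyTorch's stock `ModuleWrapPolicy`, with stock module classes standing in for the actual Llama3 layer types (the config plumbing here is assumed, not torchtune's real schema):

```python
# Sketch: build the FSDP auto-wrap policy from a configurable set of module
# classes instead of hardcoding it per-model. nn.Embedding and
# nn.TransformerDecoderLayer are stand-ins for the real model classes.
import torch.nn as nn
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

# Hypothetical: these class names would come from the recipe config.
modules_to_wrap = {nn.Embedding, nn.TransformerDecoderLayer}
auto_wrap_policy = ModuleWrapPolicy(modules_to_wrap)
```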
@maximegmd This is awesome! Can you post some loss curves for the finetune you ran?
> Are the hyperparams similar to llama-2 instruct model's training? Otherwise, we can maybe also change some default hyperparams too? such as LR. I see its set as 2e-5 for...
Obviously, we'll need to clearly document it as it differs from distributed but this sounds good to me! Curious - why would distributed have hardcoded this as a default in...
Yes, but it's low priority.
@rohan-varma I think we can close this, no?
Thanks for reaching out @Titus-von-Koeller! We love how easily BnB unlocks our lowest-memory use cases. We'll definitely open a PR on your docs page featuring the integration. Also, if there's...
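For context, the kind of usage we lean on is a paged 8-bit optimizer. A minimal sketch (the linear layer is a stand-in for a full model, and the learning rate is illustrative):

```python
# Minimal sketch of a bitsandbytes paged 8-bit optimizer, the piece that
# drives the lowest-memory configs. The model here is a stand-in.
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(4096, 4096)
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-5)
```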
@Prakyathkantharaju Thanks for the contribution! Can you tell me more about your motivation for adding this integration to the torchtune library specifically? I'm not super familiar with ClearML Logger. Is...
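To check my understanding, is the idea something like the sketch below? This is a hedged guess at what a ClearML logger could look like if it followed the same `log(name, data, step)` shape as the existing metric loggers; the class name and interface are assumptions on my part, though `Task.init` and `report_scalar` are ClearML's actual API:

```python
# Hedged sketch of a ClearML-backed metric logger (class name and the
# log(name, data, step) interface are assumptions, not an existing API).
from clearml import Task

class ClearMLLogger:
    def __init__(self, project: str, task_name: str):
        self._task = Task.init(project_name=project, task_name=task_name)
        self._logger = self._task.get_logger()

    def log(self, name: str, data: float, step: int) -> None:
        # report_scalar(title, series, value, iteration) is ClearML's real API
        self._logger.report_scalar(title=name, series=name, value=data, iteration=step)

    def close(self) -> None:
        self._task.close()
```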