Joe Cummings comments

Results 278 comments of


                                            Joe Cummings

[WIP] using std::array in vocab for additional speed-ups

@Nayef211 Is this still relevant? If so, might be worth cleaning up quickly and merging for the speedup.

Helper function for Vocab object

Covered by current functionality, but might be a good idea to include in the documentation how to get a token.

FSDP Llama3 wrapping improvements for full finetune

@rohan-varma I could totally be missing something here, but why can't we include `embedding` in the modules to wrap within the config for Llama3, rather than tie this directly to...

MPS support

@maximegmd This is awesome! Can you post some loss curves for the finetune you ran?

Default to llama3-8b-instruct

> Are the hyperparams similar to llama-2 instruct model's training? Otherwise, we can maybe also change some default hyperparams too? such as LR. I see its set as 2e-5 for...

Can't run 2 finetunes at the same time

Obviously, we'll need to clearly document it as it differs from distributed but this sounds good to me! Curious - why would distributed have hardcoded this as a default in...

Remove `datasets` requirement and instead rely on download from `huggingface_hub`

Yes, but low in priorities.

Initialize kv cache w/num_kv_heads instead of num_heads

@rohan-varma I think we can close this, no?

document integration with bitsandbytes?

Thanks for reaching out @Titus-von-Koeller! We love how easily BnB unlocks our lowest memory use-cases. We'll definitely open a PR on your docs page featuring the integration. Also, if there's...

[Feature addition] Clearml logger integration

@Prakyathkantharaju Thanks for the contribution! Can you tell me more about your desire to add this integration to the torchtune library specifically? I'm not super familiar with ClearML Logger. Is...