ebsmothers
Hi, thanks for your CoCa implementation! I have a question on the multimodal transformer: typically in a decoder layer I would expect to see self-attention, then cross-attention, then an MLP....
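For concreteness, here's a minimal sketch of the layer ordering I mean (self-attention, then cross-attention, then MLP, each with a residual connection). The pre-norm placement, dimensions, and `nn.MultiheadAttention` usage are just illustrative assumptions on my part, not necessarily how CoCa structures its multimodal decoder:

```python
import torch
import torch.nn as nn


class DecoderLayerSketch(nn.Module):
    """Illustrative decoder layer: self-attention -> cross-attention -> MLP."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # 1) self-attention over the text tokens (causal mask omitted for brevity)
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # 2) cross-attention: text queries attend to the image tokens
        h = self.norm2(x)
        x = x + self.cross_attn(h, image_tokens, image_tokens, need_weights=False)[0]
        # 3) position-wise MLP
        return x + self.mlp(self.norm3(x))
```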
Creating this issue to track gaps in our current testing. ### Tests to write - [ ] Add gradient accumulation test for LoRA recipe (ideally also testing LR scheduler) -...
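For the gradient accumulation item above, here's a rough sketch of the equivalence check such a test could perform; it uses a toy `nn.Linear` stand-in rather than the actual LoRA recipe, so the recipe wiring is left to the real test:

```python
import torch
import torch.nn as nn


def test_grad_accumulation_matches_full_batch():
    # Two identical copies of a toy model (the real test would build the LoRA model).
    torch.manual_seed(0)
    model_full = nn.Linear(8, 2)
    model_accum = nn.Linear(8, 2)
    model_accum.load_state_dict(model_full.state_dict())

    data, labels = torch.randn(4, 8), torch.randn(4, 2)
    loss_fn = nn.MSELoss()

    # Full-batch backward pass.
    loss_fn(model_full(data), labels).backward()

    # Accumulated backward over two equal micro-batches, scaled to match the mean reduction.
    for chunk_x, chunk_y in zip(data.chunk(2), labels.chunk(2)):
        (loss_fn(model_accum(chunk_x), chunk_y) / 2).backward()

    # Gradients should match (up to numerical tolerance) before the optimizer step.
    for p_full, p_accum in zip(model_full.parameters(), model_accum.parameters()):
        torch.testing.assert_close(p_full.grad, p_accum.grad)
```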
Main changes: log on every step, accumulate metrics correctly over iterations, scrap `log_memory_stats_every_n_steps` and consolidate it with the existing `log_every_n_steps`. Still need to test that I didn't break anything. If we like this...
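To make the intended behavior concrete, here's a sketch of the accumulate-then-log pattern; the variable names and the `logger.log_dict` call are assumptions for illustration, not the recipe's actual code:

```python
def train_loop(model, dataloader, loss_fn, optimizer, logger,
               gradient_accumulation_steps: int, log_every_n_steps: int) -> None:
    """Accumulate the loss over gradient-accumulation iterations, then log the
    accumulated value once per optimizer step (gated by log_every_n_steps)."""
    running_loss, global_step = 0.0, 0
    for idx, batch in enumerate(dataloader):
        logits = model(batch["tokens"])
        loss = loss_fn(logits, batch["labels"]) / gradient_accumulation_steps
        loss.backward()
        running_loss += loss.item()
        if (idx + 1) % gradient_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            global_step += 1
            if global_step % log_every_n_steps == 0:
                logger.log_dict({"loss": running_loss}, step=global_step)
            running_loss = 0.0
```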
Updating our default PR template to hopefully make it easier and clearer which testing/sanity checks to run when opening a PR. New template pasted below:
There are some gotchas around using the base llama3 fine-tuned models with respect to special tokens. While we should smooth these out and make it easy to use for...
Based on https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L91-L94 and https://github.com/meta-llama/llama3/blob/main/llama/generation.py#L197, we should support stopping on more than one token during generation. This PR adds this field to our tokenizers and integrates it into the generation...
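For context, here's a minimal sketch of what stopping on any of several tokens looks like in a greedy decoding loop; the `model` interface (token tensor in, logits out) and the stop-token ids are placeholders, not torchtune's actual generation utilities:

```python
from typing import Set

import torch


def generate(model, prompt: torch.Tensor, stop_tokens: Set[int], max_new_tokens: int = 256) -> torch.Tensor:
    """Greedy decoding that halts as soon as *any* stop token is produced."""
    tokens = prompt.clone()  # shape [1, seq_len]
    for _ in range(max_new_tokens):
        logits = model(tokens)  # assumed shape [1, seq_len, vocab_size]
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)  # [1, 1]
        tokens = torch.cat([tokens, next_token], dim=-1)
        if next_token.item() in stop_tokens:  # e.g. the ids for <|eot_id|> and <|end_of_text|>
            break
    return tokens
```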
Right now we have e.g. `tests/torchtune/models/test_llama3.py` and `tests/torchtune/models/test_lora_llama2.py` test files for our models. This is not in line with our implementation code, which is organized into directories like `torchtune/models/llama2`, `torchtune/models/mistral`, etc. We should...
This is a PR for integration with PEFT to allow continued fine-tuning of checkpoints from torchtune. We save a file `adapter_config.json`, along with `adapter_model.bin`, to match the format expected by...
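For reference, this is roughly the shape of `adapter_config.json` that PEFT's `LoraConfig` serializes; the exact set of keys varies by PEFT version, and the values below are placeholders:

```python
import json

adapter_config = {
    "peft_type": "LORA",
    "base_model_name_or_path": "meta-llama/Meta-Llama-3-8B",  # placeholder
    "task_type": "CAUSAL_LM",
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["q_proj", "v_proj"],
}

with open("adapter_config.json", "w") as f:
    json.dump(adapter_config, f, indent=2)
```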
### Describe the bug Hi there, thanks for the great library! We have been using it a lot in torchtune and it's been a huge help for us. Regarding the...
### Context Based on #1001, it's clear that our generation recipe is not the most flexible when it comes to the different tokenizers/formats/templates we support. Right now this stuff is inherently...
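One possible direction (names here are hypothetical, just to illustrate the idea): have the recipe depend only on a small prompt-template interface, so swapping tokenizers/formats doesn't require touching the generation loop itself:

```python
from typing import Optional, Protocol


class PromptTemplate(Protocol):
    """Anything that turns a raw user prompt into the string the model should see."""

    def format(self, prompt: str) -> str: ...


class Llama3ChatTemplate:
    # Illustrative only; in practice the special-token handling lives in the tokenizer.
    def format(self, prompt: str) -> str:
        return (
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
            f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        )


def build_model_input(prompt: str, template: Optional[PromptTemplate]) -> str:
    # The recipe only sees the PromptTemplate interface, so adding a new
    # format/template doesn't require changes to the generation loop.
    return template.format(prompt) if template is not None else prompt
```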