torchtitan
Process got stuck when trying to optimize different groups of parameters using different types of data
Hi,
I'm adding a new linear projection layer (nn.Linear) to the original Llama 3 architecture to process a new type of data. During training, I use two types of data: language-only and multimodal. With language-only data, all Llama 3 parameters are fine-tuned. With multimodal data, all Llama 3 parameters plus the parameters of the added linear layer are fine-tuned. Each setup works well on its own.
However, when I combine the two types of data for multi-task learning, the process hangs without printing any further information. Does torchtitan currently not support this kind of setup? Thanks.
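
For reference, here is a minimal single-process sketch of the setup described above, in plain PyTorch rather than torchtitan. One possible cause of a silent hang in distributed training (not confirmed in this thread) is that ranks disagree on which parameters took part in a step, so their collective communication calls no longer line up. The sketch assumes that is the problem and shows one way to keep the added layer in the graph for every batch: language-only batches feed it zeros and scale its output by 0. `ToyMultimodalModel`, `extra_proj`, `train_step`, and all sizes are hypothetical stand-ins, not torchtitan or Llama 3 names.

```python
import torch
import torch.nn as nn


class ToyMultimodalModel(nn.Module):
    """Hypothetical stand-in for Llama 3 plus one added projection layer."""

    def __init__(self, vocab_size=32, dim=16, extra_dim=8):
        super().__init__()
        self.extra_dim = extra_dim
        self.embed = nn.Embedding(vocab_size, dim)
        self.backbone = nn.Linear(dim, dim)          # stand-in for the Llama 3 layers
        self.extra_proj = nn.Linear(extra_dim, dim)  # the newly added nn.Linear
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens, extra=None):
        h = self.embed(tokens)  # (batch, seq, dim)
        if extra is None:
            # Language-only batch: still run extra_proj, on zeros, and scale
            # its output by 0 so it cannot change the result.
            extra = h.new_zeros(tokens.shape[0], tokens.shape[1], self.extra_dim)
            scale = 0.0
        else:
            scale = 1.0
        h = h + scale * self.extra_proj(extra)
        return self.head(torch.relu(self.backbone(h)))


def train_step(model, optimizer, tokens, labels, extra=None):
    """One optimizer step; the same code path runs for both data types."""
    optimizer.zero_grad(set_to_none=True)
    logits = model(tokens, extra)
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), labels.flatten())
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyMultimodalModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    tokens = torch.randint(0, 32, (2, 5))

    train_step(model, optimizer, tokens, tokens)  # language-only batch
    train_step(model, optimizer, tokens, tokens, extra=torch.randn(2, 5, 8))  # multimodal batch
```

With this pattern the added layer receives a gradient tensor (all zeros) on language-only batches instead of no gradient at all, so every batch type touches the same set of parameters. Whether this resolves the hang in torchtitan depends on the parallelism configuration in use and would need to be verified there.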