Yangyi Chen

8 comments by Yangyi Chen

Hi Zhuosheng, nice work! I'd like to follow up on this work, and for a fair comparison, could you please provide some information about the train/dev/test split, since I need to locate...

Hi Vishaal, thanks for your interest. I would personally really like to release all the code and pretrained models from this paper. However, this work was conducted during my internship at...

For further context: I use single-node, multi-GPU distributed training. After waiting for a long time, I received the following message: `[rank0]: return Variable._execution_engine.run_backward( # Calls into the...`

Hi, thanks for the follow-up question. I basically use the default settings from the `./train_configs/llama3_8b.toml` file:

```toml
[training]
batch_size = 1
seq_len = 8192  # 8192 # 16384
warmup_steps =...
```

Yes, it can happen (one data-parallel rank uses the linear layer while the others do not). So it seems the current implementation doesn't support such a function, right? Yes...

Just one quick question: when we run the dummy input through the added linear layer, do we need to compute the gradient of the linear layer for this dummy part?...
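The dummy-input idea above can be sketched in plain PyTorch. This is a minimal, single-process illustration (the layer name `extra` and the shapes are hypothetical, not from the original thread): a rank that does not actually route data through the added linear layer runs a zero-scaled dummy forward through it, so the layer still participates in autograd and a gradient all-reduce across ranks would not hang, while the dummy pass contributes exactly zero gradient.

```python
import torch
import torch.nn as nn

# Hypothetical extra linear layer that only some data-parallel ranks use.
extra = nn.Linear(4, 4)

x = torch.randn(2, 4)
loss = x.sum()  # this rank's real loss does not involve `extra`

# Dummy forward keeps `extra` in the autograd graph; scaling by 0.0
# guarantees it adds no gradient signal.
dummy = extra(torch.zeros(1, 4))
loss = loss + 0.0 * dummy.sum()
loss.backward()

# Gradients are populated (so collective sync matches the ranks that
# really used the layer), but they are all zero.
assert extra.weight.grad is not None
assert torch.all(extra.weight.grad == 0)
assert torch.all(extra.bias.grad == 0)
```

So the gradient for the dummy part is computed, but it is identically zero; its only role is to keep the parameter's gradient buffer in step with the other ranks.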