
A question about the detail of data preprocessing

Open zhengzzj opened this issue 1 year ago • 0 comments

Hello!

I would like to fine-tune the model. While working through the data preprocessing step, I saw that line 33 of the file https://github.com/salesforce/jaxformer/blob/main/preprocess/1_split_raw.py sets args.data_bucket_path = '/tmp/dataset_v1/ 0_raw/train.txt'.

What kind of data does train.txt contain? Should all of the code used for training be placed in this single train.txt file?

zhengzzj, Jun 28 '23 08:06