CodeGen
A question about the details of data preprocessing
Hello!
I would like to fine-tune the model, and I am currently working on the data preprocessing step. I noticed that line 33 of the file https://github.com/salesforce/jaxformer/blob/main/preprocess/1_split_raw.py sets args.data_bucket_path = '/tmp/dataset_v1/0_raw/train.txt'.
What kind of data does train.txt contain? Should all of the code I want to train on be put into this single train.txt file?