style-transfer-paraphrase

Add example scripts for training CDS models

Open • martiansideofthemoon opened this issue 3 years ago • 11 comments

Additionally, add dict.txt files for these dataset folders

martiansideofthemoon, Oct 27 '21 13:10

Hello @martiansideofthemoon, I hope you are doing well. I am trying to set up a custom dataset for training. I have converted the text file into BPE format, but I am now getting an error saying that dict.txt was not found. I have attached a screenshot as well. Please have a look and let me know. Thank you.
[screenshot: dict.txt file not found]

TufailAhmadSiddiq, Feb 08 '23 18:02

Hi @TufailAhmadSiddiq, could you share the output of ls datasets/new_dataset/*/*?

martiansideofthemoon, Feb 08 '23 18:02

Hi, I have executed that command, and here is the result:
[screenshot: ls datasets/new_dataset]

TufailAhmadSiddiq, Feb 08 '23 19:02

Please follow the instructions here, especially the first paragraph: https://github.com/martiansideofthemoon/style-transfer-paraphrase#custom-datasets

You need the .txt and .label files to get started. The first script will create the input0.bpe files for you.

martiansideofthemoon, Feb 08 '23 19:02
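As a sketch of the layout described above, the following Python snippet checks that a custom dataset folder has a .txt and a .label file for each split, with one label line per text line. The split names (train, dev, test) and the path datasets/new_dataset are assumptions for illustration; the README's "Custom datasets" section is the authority on the exact file names its first script expects.

```python
from pathlib import Path

# Assumed split names; check the README's "Custom datasets" section for the real ones.
SPLITS = ("train", "dev", "test")


def check_custom_dataset(folder, splits=SPLITS):
    """Return a list of problems found in a custom dataset folder (empty if none)."""
    folder = Path(folder)
    problems = []
    for split in splits:
        txt = folder / f"{split}.txt"
        label = folder / f"{split}.label"
        missing = [p.name for p in (txt, label) if not p.is_file()]
        if missing:
            problems.append(f"{split}: missing {', '.join(missing)}")
            continue
        # Every line of text needs exactly one matching label line.
        n_txt = len(txt.read_text(encoding="utf-8").splitlines())
        n_label = len(label.read_text(encoding="utf-8").splitlines())
        if n_txt != n_label:
            problems.append(f"{split}: {n_txt} text lines but {n_label} label lines")
    return problems


if __name__ == "__main__":
    for problem in check_custom_dataset("datasets/new_dataset"):
        print(problem)
```

An empty result means every assumed split has both files with matching line counts; anything else points at the file to fix before running the preprocessing scripts.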

I have created .txt and .label files for training, validation, and testing and placed them inside new_dataset. Here is the screenshot:

TufailAhmadSiddiq, Feb 08 '23 19:02

[screenshot: contents of new_dataset]

TufailAhmadSiddiq, Feb 08 '23 19:02

Hi @martiansideofthemoon, I have sent you a screenshot of my directory. Can you please tell me why this problem occurs?

TufailAhmadSiddiq, Feb 09 '23 16:02

What's the error you get with this directory in place?

martiansideofthemoon, Feb 09 '23 16:02

The following error occurs:
[screenshot: error message]
However, the file dict.txt is present in datasets/new_dataset-bin.

HassanBinAli, Feb 09 '23 17:02

I suspect this error is coming from fairseq preprocessing. I think it creates the dict files for you (in fact, the entire bin folder). Maybe try running the code after temporarily renaming the bin folder to something else?

martiansideofthemoon, Feb 09 '23 19:02
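For context on the dict.txt files discussed here: a fairseq dictionary file is plain text with one "symbol count" pair per line. The sketch below builds a file in that format from a .label file by counting each label, most frequent first. It approximates the format rather than reproducing fairseq's own code (fairseq-preprocess may, for example, append padding entries), and build_label_dict is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def build_label_dict(label_path, dict_path):
    """Write a fairseq-style dict.txt ("symbol count" per line) for a .label file.

    Hypothetical helper: approximates the file format, not fairseq's implementation.
    """
    # Labels are whitespace-separated tokens, normally one per line.
    labels = Path(label_path).read_text(encoding="utf-8").split()
    counts = Counter(labels)
    # Most frequent first; ties broken alphabetically (an arbitrary choice here).
    entries = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = [f"{symbol} {count}" for symbol, count in entries]
    Path(dict_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

With a single style, a train.label containing 15720 lines of custom_style yields the single entry "custom_style 15720", which is why a one-style label dictionary looks so short.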

The error persists. However, it now creates a folder named "new_dataset-bin". Below is a screenshot of what is inside the datasets/new_dataset-bin folder:
[screenshot: contents of new_dataset-bin]
There are two folders and one file. The input0 folder is empty, while the label folder contains the following:
[screenshot: contents of the label folder]
I am also attaching a screenshot of what is in dict.txt:
[screenshot: dict.txt]
I have some articles on which I am trying to fine-tune this model so that it can learn the writing style used in them. I named this style "custom_style". The number 15720 is the number of entries in my train set. These files seem fine to me, so my question is: can I proceed to the fine-tuning step with input0 empty?

HassanBinAli, Feb 11 '23 16:02
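Assuming the training step reads binarized text from input0 (as fairseq's RoBERTa-style classification setup does), an empty input0 folder suggests the text side was never binarized, which is worth checking before fine-tuning. The sketch below reports, for each subfolder of a *-bin directory, which expected files are missing. The expected names (dict.txt plus <split>.bin and <split>.idx for train and valid) are an assumption modeled on the naming fairseq-preprocess uses with --only-source; adjust them to whatever the repository's scripts actually produce.

```python
from pathlib import Path

# Assumed layout, modeled on fairseq-preprocess --only-source output.
EXPECTED_SPLITS = ("train", "valid")
SUBFOLDERS = ("input0", "label")


def missing_binarized_files(bin_dir, subfolders=SUBFOLDERS, splits=EXPECTED_SPLITS):
    """Map each subfolder of bin_dir to the list of expected files it lacks."""
    expected = ["dict.txt"] + [f"{s}{ext}" for s in splits for ext in (".bin", ".idx")]
    report = {}
    for sub in subfolders:
        folder = Path(bin_dir) / sub
        missing = [name for name in expected if not (folder / name).is_file()]
        if missing:
            report[sub] = missing
    return report


if __name__ == "__main__":
    for sub, missing in missing_binarized_files("datasets/new_dataset-bin").items():
        print(f"{sub}: missing {', '.join(missing)}")
```

In the situation described above, this would flag every expected file under input0 while leaving label clean, pointing back at the text-side preprocessing rather than the label dictionary.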