style-transfer-paraphrase
Add example scripts for training CDS models
Additionally, add dict.txt files for these dataset folders
Hello @martiansideofthemoon
I hope you are doing well. I am trying to set up a custom dataset for training. I have converted the text files into BPE format, but I am now getting a "dict.txt not found" error. I have attached a screenshot as well. Please have a look and let me know. Thank you.
Hi @TufailAhmadSiddiq, could you share the output of ls datasets/new_dataset/*/*?
Hi, I have executed that command, and here is the result
Please follow instructions here, especially the first paragraph: https://github.com/martiansideofthemoon/style-transfer-paraphrase#custom-datasets
You need .txt and .label files to get started. The first script will create the input0.bpe files for you.
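Before running the first script, a quick sanity check of the dataset folder can rule out a missing file. The following is only a rough sketch: the split names train, dev, and test and the one-label-per-line convention are assumptions taken from the linked custom-datasets instructions, and check_dataset_dir is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count the lines in a UTF-8 text file."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_dataset_dir(root, splits=("train", "dev", "test")):
    """Return a list of problems found; an empty list means the layout looks fine.

    Assumption: each split needs <split>.txt and <split>.label with the
    same number of lines. Adjust `splits` if your files are named differently.
    """
    root = Path(root)
    problems = []
    for split in splits:
        txt = root / f"{split}.txt"
        label = root / f"{split}.label"
        missing = [p.name for p in (txt, label) if not p.is_file()]
        if missing:
            problems.append(f"{split}: missing {', '.join(missing)}")
            continue
        n_txt, n_label = count_lines(txt), count_lines(label)
        if n_txt != n_label:
            problems.append(
                f"{split}: {n_txt} lines in .txt but {n_label} lines in .label"
            )
    return problems
```

Calling check_dataset_dir("datasets/new_dataset") and printing the returned list should show at a glance whether any split is incomplete.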
I have created .txt and .label files for training, validation, and testing and placed them inside new_dataset. Here is the screenshot
Hi @martiansideofthemoon, I have sent you a screenshot of my directory. Can you please tell me why this problem is occurring?
What's the error you get with this directory in place?
The following error is occurring
However, the file dict.txt is present in datasets/new_dataset-bin.
I suspect this error is coming from fairseq preprocessing. I think it creates the dict files for you (the entire bin folder, in fact). Maybe try running the code after temporarily renaming the bin folder to something else?
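One way to try that without losing the existing files is to move the folder aside under a timestamped name, so it can be restored later. This is only a minimal sketch; move_aside is a hypothetical helper, and the path datasets/new_dataset-bin is taken from this thread.

```python
import time
from pathlib import Path


def move_aside(folder):
    """Rename `folder` to '<name>.bak-<unix time>' and return the new path.

    Returns None if the folder does not exist, so the call is safe to repeat.
    """
    folder = Path(folder)
    if not folder.exists():
        return None
    backup = folder.with_name(f"{folder.name}.bak-{int(time.time())}")
    folder.rename(backup)
    return backup
```

For example, move_aside("datasets/new_dataset-bin") leaves the path free for preprocessing to recreate, while the old contents stay available in the backup folder.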
The error persists. However, it does create a folder named "new_dataset-bin". I am attaching a screenshot below of what is inside the datasets/new_dataset-bin folder
There are two folders and one file. The input0 folder is empty; however, the label folder has the following
I am also attaching a screenshot below of what I have in dict.txt
I have some articles on which I am trying to fine-tune this model so that it can learn the writing style used in them. I named this style "custom_style". 15720 is the number of entries in my training set. These files seem fine to me. So my question is: can I proceed to the fine-tuning step with input0 having nothing in it?
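A small script can make the state of the binarized folder explicit before fine-tuning. The sketch below lists the files in each subfolder and flags the empty ones; the helper names are hypothetical. An empty input0 most likely means the text side was never binarized, though that is an inference from the description above, not something confirmed in this thread.

```python
from pathlib import Path


def summarize_bin_dir(bin_dir):
    """Map each subfolder name in `bin_dir` to a sorted list of its entries."""
    bin_dir = Path(bin_dir)
    summary = {}
    for child in sorted(bin_dir.iterdir()):
        if child.is_dir():
            summary[child.name] = sorted(p.name for p in child.iterdir())
    return summary


def empty_subfolders(bin_dir):
    """Return the names of subfolders of `bin_dir` that contain nothing."""
    return [name for name, files in summarize_bin_dir(bin_dir).items() if not files]
```

With the layout described above, empty_subfolders("datasets/new_dataset-bin") would be expected to report input0, which is worth resolving before starting the fine-tuning run.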