EditNTS
How to run the model on new data
Hi! I successfully trained the model and now I would like to use it on new data. I assumed the new data would need to be preprocessed as well, but the preprocessing expects the simplified sentences as input, which I don't have.
How should I proceed? Is there a quick way to run the model on new data, or do I need to write the preprocessing and the prediction myself?
Hi,
I think as long as you have the source and the target in txt files, you can use data_preprocessing.py to convert them into a data frame, then train and run the model on the new data. Just make sure the data frame has these columns: ['comp_tokens', 'simp_tokens', 'comp_ids', 'simp_ids', 'comp_pos_tags', 'comp_pos_ids', 'edit_labels', 'new_edit_ids']
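A minimal sketch of how the column requirement above could be checked before training. The helper name `missing_columns` is hypothetical and not part of the repository; it takes any iterable of column names, so with pandas it could be called as `missing_columns(df.columns)`.

```python
# Column names listed in the reply above.
REQUIRED_COLUMNS = [
    "comp_tokens", "simp_tokens", "comp_ids", "simp_ids",
    "comp_pos_tags", "comp_pos_ids", "edit_labels", "new_edit_ids",
]


def missing_columns(columns):
    """Return the required column names absent from `columns`, in listed order.

    Hypothetical helper for illustration; not part of the EditNTS code.
    """
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Example with a plain list standing in for df.columns.
example = ["comp_tokens", "simp_tokens", "comp_ids", "simp_ids"]
print(missing_columns(example))
```

An empty result means every expected column is present.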
Sorry, I think I didn't explain myself properly. I meant data for which I don't have the target simple sentences.
I have my own set of complex sentences that I want to try to simplify using your model.
I'd also be interested in how to do this!
Hello, thought I would follow up on this. @cece95 did you manage to get it working?
Hi,
The model needs supervised training. I will retrain the model and release the pre-trained model later, which can be used for other datasets without the target.
Hello. I'm interested in that model too since I'm working on simplifying text for my PhD
Thank you very much
Hello, I tried to reproduce the work but didn't succeed. I have a question about this: "In the paper, we filtered out the rows where the source sentence and the target sentence are identical to encourage editing, you can do this by adding a line at line 41 in data_processing.py:
comp_txt,simp_txt=unzip([(i[0],i[1]) for i in zip(comp_txt,simp_txt)] if i[0] != i[1]])."
By adding this line I get the following error
How did you solve it?
I also can't find the unzip function used in that line of code
@oskrmiguel I forked the repo and corrected a couple of syntax errors. Have a look if you want, it might be useful.
@cece95 @yuedongP Hi, I'm also trying to use the model on new data. Did you get this solved? Thanks
Regarding the previous post, this worked for me:
comp_txt, simp_txt = zip(*[(i[0], i[1]) for i in zip(comp_txt, simp_txt) if i[0] != i[1]])
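A small self-contained sketch of the same filtering, on toy sentences. The function name `drop_identical_pairs` and the example sentences are made up for illustration. It also handles the case where every pair is identical, in which `zip(*[])` yields nothing and the two-name unpacking would fail.

```python
def drop_identical_pairs(comp_txt, simp_txt):
    """Keep only the pairs whose complex and simple sentences differ.

    Hypothetical helper; it does the same filtering as the one-liner above.
    """
    pairs = [(c, s) for c, s in zip(comp_txt, simp_txt) if c != s]
    if not pairs:
        # zip(*[]) is empty, so unpacking into two names would raise.
        return [], []
    comp_kept, simp_kept = zip(*pairs)  # "unzip" back into two sequences
    return list(comp_kept), list(simp_kept)


# Toy data: the first and third pairs are identical and get dropped.
comp_txt = ["The cat sat .", "He departed hastily .", "It rained ."]
simp_txt = ["The cat sat .", "He left quickly .", "It rained ."]

comp_txt, simp_txt = drop_identical_pairs(comp_txt, simp_txt)
print(comp_txt)
print(simp_txt)
```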
And for preprocessing data:
data_comp = open(comp, "r", encoding="utf-8").read().splitlines()
data_simp = open(simp, "r", encoding="utf-8").read().splitlines()
df = process_raw_data(data_comp, data_simp)
editnet_data_to_editnetID(df, "{}.df.filtered.pos".format(label))
Here comp and simp are the paths to the complex and simple text files, and label should be "train" or "val", depending on the dataset.
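A hedged sketch of the file-reading step with an alignment check, since the complex and simple files are paired line by line. The name `read_parallel` and the demo file names are hypothetical; the returned lists are what would be passed on to process_raw_data.

```python
import os
import tempfile


def read_parallel(comp_path, simp_path):
    """Read two line-aligned text files and check they have equal length.

    Hypothetical helper for illustration; not part of the EditNTS code.
    """
    with open(comp_path, "r", encoding="utf-8") as f:
        data_comp = f.read().splitlines()
    with open(simp_path, "r", encoding="utf-8") as f:
        data_simp = f.read().splitlines()
    if len(data_comp) != len(data_simp):
        raise ValueError(
            "line count mismatch: {} complex vs {} simple".format(
                len(data_comp), len(data_simp)
            )
        )
    return data_comp, data_simp


# Demo on two tiny temporary files (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    comp_path = os.path.join(tmp, "demo.comp.txt")
    simp_path = os.path.join(tmp, "demo.simp.txt")
    with open(comp_path, "w", encoding="utf-8") as f:
        f.write("He departed hastily .\nIt rained heavily .\n")
    with open(simp_path, "w", encoding="utf-8") as f:
        f.write("He left quickly .\nIt rained a lot .\n")
    data_comp, data_simp = read_parallel(comp_path, simp_path)

print(len(data_comp), len(data_simp))
```

Using `with` closes the files once they are read, and the length check catches misaligned files before any edit labels are computed.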
Suggestions are welcome if any improvement is needed for this solution :)