EditNTS

How to run the model on new data

Open cece95 opened this issue 5 years ago • 11 comments

Hi! I successfully trained the model and now I would like to use it on new data. I assumed the new data would need to be preprocessed as well, but the preprocessing expects the simplified sentences as input, which I don't have.

How should I proceed? Is there a quick way to run the model on new data, or do I need to write the preprocessing and the prediction myself?

cece95 avatar Feb 25 '20 10:02 cece95

Hi,

I think as long as you have the source and the target in txt files, you can use data_preprocessing.py to turn them into a data frame, then train and run the model on the new data. Just make sure the data frame has these columns: ['comp_tokens', 'simp_tokens', 'comp_ids', 'simp_ids', 'comp_pos_tags', 'comp_pos_ids', 'edit_labels', 'new_edit_ids']

YueDongCS avatar Feb 26 '20 00:02 YueDongCS
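
A quick check on the column names can catch a malformed data frame before training. The sketch below is only an illustration and is not part of the EditNTS code: `missing_columns` is a hypothetical helper, and the column list is copied from the reply above.

```python
# Column names listed in the reply above.
REQUIRED_COLUMNS = [
    "comp_tokens", "simp_tokens", "comp_ids", "simp_ids",
    "comp_pos_tags", "comp_pos_ids", "edit_labels", "new_edit_ids",
]


def missing_columns(columns):
    """Return the required column names absent from `columns`, in listed order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

With a pandas data frame `df`, calling `missing_columns(df.columns)` should return an empty list when every expected column is present.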

Sorry, I think I didn't explain myself properly. I meant data for which I don't have the target simple sentences.

I have my own set of complex sentences that I wanted to try to simplify using your model.

cece95 avatar Mar 01 '20 12:03 cece95

I'd also be interested in how to do this!

siangooding avatar Mar 01 '20 13:03 siangooding

Hello, thought I would follow up on this. @cece95 did you manage to get it working?

siangooding avatar Mar 08 '20 14:03 siangooding

Hi,

The model needs supervised training. I will retrain the model and release the pre-trained model later, which can be used for other datasets without the target.

YueDongCS avatar Mar 08 '20 20:03 YueDongCS

Hello. I'm interested in that model too since I'm working on simplifying text for my PhD

oskrmiguel avatar Mar 09 '20 00:03 oskrmiguel

Thank you very much

cece95 avatar Mar 09 '20 12:03 cece95

Hello, I tried to reproduce the work but didn't succeed. I have a question about this: "In the paper, we filtered out the rows where the source sentence and the target sentence are identical to encourage editing, you can do this by adding a line at line 41 in data_processing.py:

comp_txt,simp_txt=unzip([(i[0],i[1]) for i in zip(comp_txt,simp_txt)] if i[0] != i[1]])."

By adding this line I get the following error: [image]

How did you solve it?

I also can't find the unzip function used in that line of code

oskrmiguel avatar Apr 16 '20 20:04 oskrmiguel

@oskrmiguel I forked the repo and corrected a couple of syntax errors. Have a look if you want, it might be useful.

cece95 avatar Apr 17 '20 11:04 cece95

@cece95 @yuedongP Hi I'm also trying to use the model on new data. Did you get this solved? Thanks

jiajinghu19 avatar Jan 07 '21 21:01 jiajinghu19

Regarding the previous post, this worked for me: zip(*[(i[0], i[1]) for i in zip(comp_txt, simp_txt) if i[0] != i[1]])

And for preprocessing data:

data_comp = open(comp, "r", encoding="utf-8").read().splitlines()
data_simp = open(simp, "r", encoding="utf-8").read().splitlines()
df = process_raw_data(data_comp, data_simp)
editnet_data_to_editnetID(df, "{}.df.filtered.pos".format(label))

Label should be "train" or "val" according to your datasets.

Suggestions are welcome if any improvement is needed for this solution :)

lmvasque avatar Jan 14 '21 18:01 lmvasque
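
The filtering one-liner quoted above can be wrapped in a small function that also copes with the case where every pair is identical. This is a minimal sketch in plain Python, not code from the EditNTS repository: `drop_identical_pairs` is a hypothetical name, and the inputs are assumed to be two equally long lists of sentences.

```python
def drop_identical_pairs(comp_txt, simp_txt):
    """Keep only the (complex, simple) pairs whose two sentences differ."""
    pairs = [(c, s) for c, s in zip(comp_txt, simp_txt) if c != s]
    if not pairs:
        # zip(*[]) yields nothing, so unpacking into two names would fail.
        return [], []
    comp_kept, simp_kept = zip(*pairs)
    return list(comp_kept), list(simp_kept)
```

It would be called as `comp_txt, simp_txt = drop_identical_pairs(comp_txt, simp_txt)` before the data frame is built. Returning lists rather than the tuples produced by `zip(*...)` is a design choice here, so that later code can treat the result like the original inputs.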