(DEV) [-mode test_text] [-task abs]: not abstracting entire document

Open STEMlib opened this issue 4 years ago • 9 comments

Command line input

python3 train.py -task abs -mode test_text -text_src ../raw_data/input_text.txt -test_from ../models/model_step_148000.pt -log_file ../logs/xsum -result_path ../results/xsum -visible_gpus -1

Problem

The result consistently consists of a single sentence extracted from the beginning of the document. I thought changing the '-max_len' or 'max_pos' argument would resolve this issue, so I tried:

-max_len 3000 
-max_len 30000

Regardless of the value, the abstractive summary is the same.

I have also tried changing 'max_pos' to the values below:

-max_pos 3000
-max_pos 30000

Whatever the value, I just get a torch error, for example: torch.Size([512, 768]) from checkpoint, the shape in current model is torch.Size([3000, 768]).

Questions

  • How do I know if I am correctly inputting the text file for analysis?
  • How should I change my code to analyze a large text document?
  • Could the result be a consequence of an incoherently written document?
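On the second question, one possible workaround can be sketched, under stated assumptions. The error message above indicates that the checkpoint stores 512 position embeddings, so a longer input cannot simply be enabled with a larger 'max_pos'. A rough alternative is to split the document into pieces that fit and summarize each piece separately. The helper name `chunk_words` and the 400-word budget below are hypothetical and not part of PreSumm. The budget is deliberately under 512 because subword tokenization usually yields more tokens than whitespace-separated words, and it may still need tuning.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words.

    Words are separated on whitespace only; this is an approximation of the
    model's real token count, not the tokenizer PreSumm uses.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    sample = "word " * 1000
    chunks = chunk_words(sample, max_words=400)
    print([len(c.split()) for c in chunks])  # prints [400, 400, 200]
```

Each chunk could then be passed to the summarizer as its own input. Splitting on sentence boundaries instead of a fixed word count would probably give more coherent summaries; the fixed count is used here only to keep the sketch short.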

STEMlib avatar May 07 '20 23:05 STEMlib

I have the same problem: I can't summarize a text file larger than the sample input provided in the repository. At first I got IndexError: tensors used as indices must be long, byte or bool tensors. Then I tried different values for -max_pos and got the same error as you.

suchanun avatar May 09 '20 14:05 suchanun

hey, I'm facing the same issue. Did anyone figure out a solution for that?

gandharvsuri avatar May 19 '20 09:05 gandharvsuri

@gandharvsuri Nothing yet. Still waiting.

STEMlib avatar May 20 '20 14:05 STEMlib

Hi, I tried this command to generate a summary for src_text.txt, but I can't find any result.

python /users/omri/workspace/Trainbert/PreSumm/src/train.py -mode test_text -task ext -test_from /users/omri/workspace/Trainbert/PreSumm/models/ext_model/model_step_39000.pt -text_src /users/omri/workspace/Trainbert/PreSumm/raw_data/src_text.txt -min_length 200 -max_length 1000 -result_path /users/omri/workspace/Trainbert/PreSumm/results/ext_bert_cnndm -visible_gpus 1

Can anyone please help me get the summary?

dhouhaomri avatar Jun 01 '20 11:06 dhouhaomri

@dhouhaomri Check the results folder for the ext_bert_cnndm.candidate file.
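A quick way to do that check, as a minimal sketch: the snippet below lists every file whose name starts with the value passed to -result_path and reports whether it is empty. The helper name `list_result_files` is hypothetical, and the example prefix is only an illustration; matching on the prefix avoids guessing whatever suffix the tool appends before the extension.

```python
import glob
import os


def list_result_files(result_prefix):
    """Return {path: size_in_bytes} for every file starting with result_prefix."""
    pattern = glob.escape(result_prefix) + "*"
    return {
        path: os.path.getsize(path)
        for path in sorted(glob.glob(pattern))
        if os.path.isfile(path)
    }


if __name__ == "__main__":
    # Example prefix only; use the same value that was given to -result_path.
    for path, size in list_result_files("../results/ext_bert_cnndm").items():
        status = "empty" if size == 0 else f"{size} bytes"
        print(f"{path}: {status}")
```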

STEMlib avatar Jun 01 '20 14:06 STEMlib

Hi, thank you. But I got two files (.candidate and .gold). The .gold file is empty; do you know why?

dhouhaomri avatar Jun 03 '20 11:06 dhouhaomri

@STEMlib Hi, I am facing the same issue. The result is one or a few sentences extracted from the beginning of the document. Did you find a solution?

cdd-grc20 avatar Oct 12 '20 14:10 cdd-grc20

Still facing this issue. Any updates would be appreciated. Thank you!

lavanaythakral avatar Oct 21 '20 12:10 lavanaythakral

@cdd-grc20 @lavanaythakral

No solution yet. We may need to look deeper into the code to understand the problem, because I suspect we won't get help anytime soon.

STEMlib avatar Oct 21 '20 16:10 STEMlib