question_generation

Generate exact number of questions

Open krrishdholakia opened this issue 5 years ago • 6 comments

Hi,

Great work on the library 🎉, it's super useful.

Is it possible to generate a specific number of questions? I know we have `num_return_sequences`, but despite specifying a high number of return sequences:

```python
model_args = {
    "max_length": 256,
    "num_beams": 12,
    "length_penalty": 1.5,
    "no_repeat_ngram_size": 3,
    "num_return_sequences": 10,
    "early_stopping": True,
}

nlp(text5, model_args)
```

I still get fewer questions than expected:

['The speed of light is slower in a medium other than what?', 'What is responsible for phenomena such as refraction?', 'The idea of light scattering from, or being absorbed and re-emitted by atoms, is both incorrect and what is not seen in nature?']

krrishdholakia avatar Aug 30 '20 07:08 krrishdholakia

Can you provide the context? It gives me exactly the number of questions I ask for.

psinha30 avatar Aug 31 '20 08:08 psinha30

Sure -

I'm using the T5 model with the e2e nlp pipeline,

and I run the call given above: `nlp(text5, model_args)`.

Any additional context I can give?
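
For reference, here's roughly what my setup looks like (a minimal sketch, assuming the `e2e-qg` pipeline shown in this repo's README; `text5` just stands in for my actual passage):

```python
# Minimal sketch of my setup (assumptions: the e2e-qg pipeline from the README,
# and text5 as a stand-in for the real input passage).
from pipelines import pipeline

nlp = pipeline("e2e-qg")

text5 = (
    "The speed of light in vacuum is a universal physical constant. "
    "Light travels slower in a medium, which is responsible for phenomena "
    "such as refraction."
)

model_args = {
    "max_length": 256,
    "num_beams": 12,
    "length_penalty": 1.5,
    "no_repeat_ngram_size": 3,
    "num_return_sequences": 10,
    "early_stopping": True,
}

print(nlp(text5, model_args))
```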

krrishdholakia avatar Sep 02 '20 11:09 krrishdholakia

Thanks @krrishdholakia!

Here `num_return_sequences` can't be used, because the number of questions generated depends on the number of answers extracted. If the answer extraction model gives only two answers, then only two questions will be generated.

`num_return_sequences` is used with beam search or top-k/top-p sampling in the `.generate` method. With beam search, in most cases it returns similar or slightly paraphrased versions of the same questions, so I'm not using `num_return_sequences`.

I'm trying out other methods for better answer extraction, but haven't got any good results yet. Will ping you if I find some other method to extract more answers.
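
To illustrate the beam search point, here's a minimal sketch (not this repo's internals; the checkpoint name and prompt format are assumptions) of what `num_return_sequences` does with a plain `transformers` `.generate` call: you get several hypotheses for the same input, which tend to be near-duplicates of one question rather than genuinely different questions.

```python
# Sketch only: num_return_sequences with beam search returns several hypotheses
# for the SAME input, which are usually close paraphrases of each other.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "valhalla/t5-small-e2e-qg"  # assumed checkpoint name, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "generate questions: The speed of light in vacuum is about 300,000 km/s. </s>"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=256,
    num_beams=12,
    num_return_sequences=10,  # must be <= num_beams; 10 beams for one input
    no_repeat_ngram_size=3,
    length_penalty=1.5,
    early_stopping=True,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```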

patil-suraj avatar Sep 06 '20 06:09 patil-suraj

hey @patil-suraj,

they mention an interesting approach using top-k sampling in this article - https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313

Thoughts on using this?

krrishdholakia avatar Sep 06 '20 07:09 krrishdholakia

I have tried sampling, but beam search results are better than sampling for this task. Feel free to give it a try though!
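
If you want to experiment, sampling is just a different set of arguments to the same `.generate` call. A rough sketch (same assumed checkpoint as above, not the pipeline's actual code):

```python
# Sketch only: top-k / top-p (nucleus) sampling instead of beam search.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "valhalla/t5-small-e2e-qg"  # assumed checkpoint name, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "generate questions: The speed of light in vacuum is about 300,000 km/s. </s>"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=256,
    do_sample=True,           # sampling instead of beam search
    top_k=50,                 # keep only the 50 most likely next tokens
    top_p=0.95,               # nucleus sampling: smallest set covering 95% of the mass
    num_return_sequences=10,  # draw 10 independent samples
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```

In practice the samples are more diverse than beams, but also noisier, which matches what I've seen for this task.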

patil-suraj avatar Sep 06 '20 07:09 patil-suraj

Hi @patil-suraj, great work! May I ask how you control the number of answers extracted? My project gets a different number of answers for different texts.

nomoreoneday avatar Jan 27 '21 02:01 nomoreoneday