
How are the distractors made in the dataset?

Cakeszy opened this issue 5 years ago · 3 comments

I want to use my own custom dataset with this project, but I don't understand how the distractors were made in the original dataset to get a grasp on how to do this. Are they randomly sampled from other conversations?

Cakeszy avatar May 25 '20 16:05 Cakeszy

I suggest looking at example_entry.py. The candidates are all possible replies to the prompt sentence. In train.py:

```python
for j, candidate in enumerate(utterance["candidates"][-num_candidates:]):
    lm_labels = bool(j == num_candidates-1)
    instance = build_input_from_segments(persona, history, candidate, tokenizer, lm_labels)
```

The last entry in candidates is taken as the gold reply; every other entry in candidates is treated as a distractor.
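For a custom dataset, one common approach is to sample distractors at random from replies belonging to *other* conversations and append the gold reply last, so that train.py's convention (final candidate = gold) is satisfied. A minimal sketch — `build_candidates` and the reply pool are hypothetical names for illustration, not part of this repo:

```python
import random

def build_candidates(gold_reply, distractor_pool, num_distractors=19, seed=0):
    """Assemble a candidates list in the order train.py expects:
    distractors first, gold reply last (the final entry is taken
    as the gold label)."""
    rng = random.Random(seed)
    # sample distractors from replies drawn from other conversations
    distractors = rng.sample(distractor_pool, num_distractors)
    return distractors + [gold_reply]

# toy usage: the pool stands in for replies pulled from other dialogues
pool = [f"unrelated reply {i}" for i in range(50)]
candidates = build_candidates("that sounds like fun!", pool, num_distractors=3)
print(candidates)
```

The only hard requirement is the ordering: however you choose the distractors, the gold reply must come last in the list.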

DamienLopez1 avatar May 27 '20 15:05 DamienLopez1

> I suggest looking at example_entry.py. The candidates are all possible replies to the prompt sentence. In train.py:

Could you please share which command you're running to train on example_entry.py?

I'm trying (without modifying example_entry.py):

```
python ./train.py --dataset_path=example_entry.py
```

but I get errors like:

```
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: Target -100 is out of bounds.
```

made-by-chris avatar Jun 10 '20 20:06 made-by-chris

Sorry for the late response.

I did not really use example_entry.py to run an example. As far as I am aware, example_entry.py is just an example of the format used in the JSON files.

If you want to see how all the distractors are being selected, I suggest adding a print statement to this code snippet from train.py:

```python
for j, candidate in enumerate(utterance["candidates"][-num_candidates:]):
    lm_labels = bool(j == num_candidates-1)
    print(candidate)
    instance = build_input_from_segments(persona, history, candidate, tokenizer, lm_labels)
```
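To see concretely which candidate ends up with lm_labels=True, here is a standalone toy version of that loop (the candidate strings are made up; only the slicing and flag logic mirror train.py):

```python
# Toy illustration: only the last num_candidates entries are used,
# and only the final one is flagged as the gold (language-model) target.
num_candidates = 2
candidates = ["distractor A", "distractor B", "gold reply"]

selected = []
for j, candidate in enumerate(candidates[-num_candidates:]):
    lm_labels = bool(j == num_candidates - 1)
    selected.append((candidate, lm_labels))

print(selected)
# [('distractor B', False), ('gold reply', True)]
```

Note that with num_candidates=2, "distractor A" is dropped entirely by the `[-num_candidates:]` slice.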

In my code it's at line 93.

DamienLopez1 avatar Jun 19 '20 10:06 DamienLopez1