LSTM-Text-Generation Regarding repeated words

Hey, Nice work!!

I just have one question. I am working on character-level text generation, using novels as the training data.

In novels, character names occur very often. Will that affect my model when it generates text?

Thanks in Advance!

Mar 16 '21 06:03 jaytimbadia

Hey - thanks!

I don't know what you mean by "too many times", but if a word is repeated very often, then it will be more likely to be generated as part of the output.

To mitigate it, you could replace each occurrence of the word with a randomly generated name.

For example, let's say you have the text

"Jaytimbadia was a knight named Jaytimbadia. Jaytimbadia lived in the city of .... " with that particular name occurring over and over.

What you could do is replace each occurrence with a unique name:

"Foobity was a knight named Buzzityfoo. Fizzbuzz lived in the city of .... "

That way, you won't bias the model toward one particular name, and you will make it more likely to generate unique names for your characters. Bear in mind, however, that the model will learn the style of the names you insert, so you will want to make sure to generate names that suit the style of the novel!
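A minimal sketch of that replacement step, in case it helps. The function name `replace_names`, the name pool, and the `seed` parameter are all made up for illustration and are not part of this repository. Note that drawing from a small pool means names can repeat, so you would want a large pool (or a name generator) to get close to one unique name per occurrence.

```python
import random
import re


def replace_names(text, names, pool, seed=None):
    """Replace every whole-word occurrence of any name in `names`
    with a name drawn at random from `pool` (hypothetical helper)."""
    rng = random.Random(seed)  # a seed makes the replacement reproducible
    # \b keeps "Ron" from matching inside a longer word such as "Ronald".
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # The lambda runs once per match, so each occurrence gets its own draw.
    return pattern.sub(lambda match: rng.choice(pool), text)


text = "Jaytimbadia was a knight named Jaytimbadia. Jaytimbadia lived in the city."
pool = ["Foobity", "Buzzityfoo", "Fizzbuzz"]  # placeholder names, pick ones that fit your novel
result = replace_names(text, ["Jaytimbadia"], pool, seed=0)
print(result)
```

You would run this over the raw text once, before building the character vocabulary and training sequences.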

Mar 16 '21 06:03 tx46

Thank you so much for the reply.

This is what I was looking for. I was training on the Harry Potter novels, which have too many occurrences of Harry, Ron, and Hermione, so I will replace them with random names as you suggested.

Mar 16 '21 09:03 jaytimbadia

Also, one more thing: the generated text has too many grammatical errors. Do you know of any method that can reduce them, apart from training for more epochs and on more data?

Mar 16 '21 09:03 jaytimbadia

No, if there are grammatical errors in the training data, then the trained model learns to "speak" with bad grammar!

Mar 16 '21 11:03 tx46