markovify
Control generated sentences randomness
I have a relatively small corpus to work with, and the generated sentences are often very similar to the ones in the corpus. Would it be possible to add a way to control the randomness of the generated sentences? Maybe a value from 0 to 1 for how random you want the output to be?
Hi @erikrakuscek, and thanks for this note. You raise a really interesting philosophical question. As you're thinking about it, what does "random" signify to you? (I ask because Markov chains depend entirely on randomness — but a particular kind of it.)
Aside from the general definition, I can think initially of two potential approaches that might fit your goal of generating sentences that differ more from those in the corpus:
- Write a function that keeps generating new sentences until one meets a certain threshold of different-ness (as judged by a metric you determine). This is essentially what `max_overlap_ratio` does (described in final bullet point here), but you may be looking for a different metric.
- Dig into the specific weights assigned to each state in `model.model`, and change them in some way (perhaps perturbing them randomly up/down).
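To make the two approaches above more concrete, here is a minimal sketch in plain Python. It does not import markovify, and every name in it (`most_similar_ratio`, `generate_until_different`, `perturb_weights`, the `max_similarity` and `amount` parameters) is hypothetical, made up for illustration. The similarity metric is `difflib.SequenceMatcher` over words, which is just one possible choice. The second function assumes the weights are available as a dict that maps each state to a dict of `{next_word: count}`; check that against what your model actually stores before relying on it.

```python
import difflib
import random


def most_similar_ratio(sentence, corpus_sentences):
    """Highest word-level similarity (0.0 to 1.0) between sentence and any corpus sentence."""
    words = sentence.split()
    return max(
        (difflib.SequenceMatcher(None, words, other.split()).ratio()
         for other in corpus_sentences),
        default=0.0,
    )


def generate_until_different(make_sentence, corpus_sentences,
                             max_similarity=0.6, tries=100):
    """Call make_sentence() until a result is dissimilar enough, or give up.

    make_sentence is any zero-argument callable that returns a string,
    or None when it fails to produce a sentence.
    """
    for _ in range(tries):
        candidate = make_sentence()
        if candidate is None:
            continue
        if most_similar_ratio(candidate, corpus_sentences) <= max_similarity:
            return candidate
    return None


def perturb_weights(transitions, amount=0.3, rng=None):
    """Return a copy of transitions with every weight nudged up or down.

    Assumed shape (hypothetical): {state: {next_word: weight}}.
    Each weight is multiplied by a random factor in [1 - amount, 1 + amount];
    amount=0 leaves the weights unchanged, amount=1 is the noisiest setting.
    """
    rng = rng or random
    return {
        state: {
            # Clamp to a tiny positive value so no transition gets weight zero.
            word: max(weight * rng.uniform(1.0 - amount, 1.0 + amount), 1e-9)
            for word, weight in followers.items()
        }
        for state, followers in transitions.items()
    }
```

With markovify, `make_sentence` could be the model's own `make_sentence` method, and `corpus_sentences` the list of sentences the model was built from. How to load perturbed weights back into a model is not covered here and would need checking against the library's source.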
Hi @erikrakuscek, just checking back on this.