Control generated sentences randomness

Open erikrakuscek opened this issue 3 years ago • 2 comments

I have a relatively small corpus to work with, and the generated sentences are often very similar to those in the corpus. Would it be possible to add a way to control the randomness of the generated sentences? Perhaps a value from 0 to 1 that sets how random the output should be?

erikrakuscek avatar Apr 17 '22 16:04 erikrakuscek

Hi @erikrakuscek, and thanks for this note. You raise a really interesting philosophical question. As you're thinking about it, what does "random" signify to you? (I ask because Markov chains depend entirely on randomness — but a particular kind of it.)

Aside from the general definition, I can think initially of two potential approaches that might fit your goal of generating sentences that differ more from those in the corpus:

  1. Write a function that keeps generating new sentences until one meets a certain threshold of different-ness (as judged by a metric you determine). This is essentially what max_overlap_ratio does (described in the final bullet point here), but you may be looking for a different metric.

  2. Dig into the specific weights assigned to each state in model.model, and change them in some way (perhaps perturbing them randomly up/down).
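
As a rough illustration of both ideas, here is a minimal sketch in plain Python that does not import markovify. It assumes the weights live in a nested dict shaped like {state: {next_word: count}}; that shape is an assumption about the library's internals, so check it against your installed version. For approach 2 it uses a deterministic flattening (raising each count to the power 1 - randomness) rather than a random perturbation, because that maps directly onto the requested 0-to-1 scale. The names flatten_weights, max_similarity, and generate_until_different are hypothetical helpers, not part of markovify, and the difflib ratio is just one possible metric.

```python
import difflib


def flatten_weights(model, randomness):
    """Return a copy of `model` with transition weights flattened.

    ASSUMPTION: `model` maps state -> {next_word: count}.
    randomness=0 keeps the original counts; randomness=1 makes every
    observed transition out of a state equally likely.
    """
    if not 0.0 <= randomness <= 1.0:
        raise ValueError("randomness must be between 0 and 1")
    exponent = 1.0 - randomness
    return {
        state: {word: float(count) ** exponent for word, count in followers.items()}
        for state, followers in model.items()
    }


def max_similarity(sentence, corpus_sentences):
    """Highest difflib ratio (0..1) between `sentence` and any corpus sentence."""
    return max(
        (difflib.SequenceMatcher(None, sentence, ref).ratio() for ref in corpus_sentences),
        default=0.0,
    )


def generate_until_different(make_sentence, corpus_sentences, threshold=0.7, tries=100):
    """Call make_sentence() until a result is dissimilar enough.

    `make_sentence` is any zero-argument callable returning a string or
    None. Returns None if no candidate passes within `tries` attempts.
    """
    for _ in range(tries):
        candidate = make_sentence()
        if candidate is None:
            continue  # the generator gave up on this attempt
        if max_similarity(candidate, corpus_sentences) <= threshold:
            return candidate
    return None
```

With markovify, the make_sentence argument could simply be the model's own make_sentence method. Writing flattened weights back into a built model is less certain: the chain may cache its starting-state choices, so anything cached would need to be recomputed after the weights change.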

jsvine avatar Apr 20 '22 22:04 jsvine

Hi @erikrakuscek, just checking back on this.

jsvine avatar Mar 22 '23 22:03 jsvine