Use heuristics when choosing which of several matches to use

Open mkremins opened this issue 7 years ago • 0 comments

We currently poemify a block of text by running several matchers over it in parallel, then picking one of the successful matches at random. I suspect we could get better results by using something other than a purely random draw to choose among them.

Some hypothesized heuristics that might produce better results:

  • Prefer longer matches (i.e. matches containing more words) over shorter matches.
  • Prefer matches containing fewer pronouns.
  • Prefer matches containing fewer repeated words.
  • Parallelism: prefer matches that are similar to previously selected matches in grammatical structure and/or word choice. (This one in particular might do a lot to make entire pages of generated poetry feel more coherent, because it would encourage repetition of a few key words and/or sentence structures throughout the page.)
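To make the heuristics above concrete, here is a rough sketch of how they might be folded into a single score. It is written in Python purely for illustration. Everything in it is an assumption rather than existing project code: a match is represented as a plain list of words, the `PRONOUNS` set and the default weights are placeholders, and "parallelism" is approximated very crudely as word overlap with previously selected matches (grammatical similarity is not attempted).

```python
# Hypothetical, deliberately incomplete pronoun list (an assumption for this sketch).
PRONOUNS = frozenset({
    "i", "me", "my", "you", "your", "he", "him", "his", "she", "her",
    "it", "its", "we", "us", "our", "they", "them", "their",
})

# Placeholder weights; real values would need tuning by eye.
DEFAULT_WEIGHTS = {"length": 1.0, "pronouns": 1.0, "repeats": 1.0, "parallel": 1.0}


def score_match(words, previous_words=(), weights=None):
    """Score one match (a list of words); higher means more preferred."""
    w = dict(DEFAULT_WEIGHTS)
    if weights:
        w.update(weights)

    words = [word.lower() for word in words]
    seen_before = {word.lower() for word in previous_words}

    length = len(words)                                # prefer longer matches
    pronouns = sum(1 for x in words if x in PRONOUNS)  # prefer fewer pronouns
    repeats = len(words) - len(set(words))             # prefer fewer repeated words
    parallel = len(set(words) & seen_before)           # crude parallelism: shared words

    return (w["length"] * length
            - w["pronouns"] * pronouns
            - w["repeats"] * repeats
            + w["parallel"] * parallel)


if __name__ == "__main__":
    earlier = ["the", "sea", "sky"]
    for candidate in (["the", "sea", "is", "calm"], ["he", "said", "he", "knew"]):
        print(candidate, score_match(candidate, previous_words=earlier))
```

A weighted sum is only one way to combine the signals; it has the advantage that each heuristic can be switched off by setting its weight to zero while experimenting.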

Maybe we could assign each match a score based on these (or similar) heuristics; sort the list of successful matches by score; and then perform a semi-random selection that’s biased towards the front of the sorted list using something like biased-rand-nth.
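The sort-then-biased-pick step could look roughly like the Python sketch below. The function here only borrows the name `biased_rand_nth`; its body is a guess at the general idea (raise a uniform draw to a power so that small indices become more likely), not a copy of any existing helper. The `bias` parameter, `choose_match`, and the `score_fn` argument are likewise hypothetical.

```python
import random


def biased_rand_nth(items, bias=2.0, rng=random):
    """Pick an item at random, favouring the front of the list.

    Assumed behaviour, not the real helper: with bias > 1 a uniform draw
    in [0, 1) is pushed towards 0, so earlier indices are chosen more often.
    bias == 1 gives a plain uniform draw.
    """
    if not items:
        raise ValueError("cannot choose from an empty list")
    if bias <= 0:
        raise ValueError("bias must be positive")
    index = int(len(items) * rng.random() ** bias)
    # Clamp in case floating-point rounding lands exactly on len(items).
    return items[min(index, len(items) - 1)]


def choose_match(matches, score_fn, bias=2.0, rng=random):
    """Sort matches by score (best first), then make a front-biased pick."""
    ranked = sorted(matches, key=score_fn, reverse=True)
    return biased_rand_nth(ranked, bias=bias, rng=rng)


if __name__ == "__main__":
    candidates = ["the sea", "the sea is calm tonight", "calm sea tonight"]
    # Stand-in score for the demo: just the word count.
    print(choose_match(candidates, score_fn=lambda m: len(m.split())))
```

Keeping some randomness means lower-scoring matches still surface occasionally, so the output should not collapse into always picking the single longest match.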

mkremins · Mar 08 '17 17:03