
where is the math?

Open randomgambit opened this issue 5 years ago • 6 comments

Hi there, and thanks for this beautiful package! I was curious to understand more about the inner workings of the package. Is there a PDF that explains exactly how you use Markov chains to create a model (with some equations)?

thanks!

randomgambit avatar Jul 26 '20 20:07 randomgambit

Thanks for the kind words, @randomgambit! I'm not a trained mathematician, so I am not the most authoritative person to answer this, but I believe that this chapter of Grinstead and Snell provides a helpful overview: https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf

Does that answer your question?

jsvine avatar Aug 02 '20 18:08 jsvine

Thank you! Yes, I know how Markov chains work, but I am not sure how you implement them in the package. What is the transition matrix, how many states are there, etc.? Can you please share more details?

thanks!

randomgambit avatar Aug 02 '20 18:08 randomgambit

Hi @randomgambit. The state size defaults to 2 but is configurable. The transition matrix is calculated in markovify.chain.Chain.

jsvine avatar Aug 02 '20 18:08 jsvine

Got it, but it is a bit hard to understand what is going on just by looking at the code. I think it would immensely help if you could write a small example of how the algorithm works (by using, say, a corpus of just two simple sentences).

randomgambit avatar Aug 02 '20 18:08 randomgambit

Would something like this satisfy your curiosity?:

For text corpora, Markovify begins the process of building its Markov models by splitting each corpus into a series of sentences, and each sentence into a series of "tokens" (i.e., words, plus placeholder tokens for the beginning and end of a sentence).

Then, walking sentence-by-sentence, it identifies every sequence of state_size (default: 2) tokens, and counts how many times each other token comes immediately afterward. This dictionary of corpus[(token_a, token_b, ...)][next_token]: count frequencies comprises the "transition matrix" of the resulting Markov model.

Thus, "Janice walked to the park. After that, Janice walked to the zoo." becomes:

{
 ('___BEGIN__', '___BEGIN__'): {'After': 1, 'Janice': 1},
 ('___BEGIN__', 'After'): {'that,': 1},
 ('___BEGIN__', 'Janice'): {'walked': 1},
 ('After', 'that,'): {'Janice': 1},
 ('Janice', 'walked'): {'to': 2},
 ('that,', 'Janice'): {'walked': 1},
 ('the', 'park.'): {'___END__': 1},
 ('the', 'zoo.'): {'___END__': 1},
 ('to', 'the'): {'park.': 1, 'zoo.': 1},
 ('walked', 'to'): {'the': 2}
}
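The counting step above can be sketched in a few lines of plain Python. This is a hypothetical re-implementation for illustration only; markovify's actual code lives in markovify.chain.Chain:

```python
from collections import defaultdict

BEGIN, END = "___BEGIN__", "___END__"
STATE_SIZE = 2

sentences = [
    "Janice walked to the park.".split(),
    "After that, Janice walked to the zoo.".split(),
]

# model[state_tuple][next_token] -> count
model = defaultdict(lambda: defaultdict(int))
for words in sentences:
    # Pad with BEGIN placeholders and a single END placeholder.
    items = [BEGIN] * STATE_SIZE + words + [END]
    for i in range(len(words) + 1):
        state = tuple(items[i : i + STATE_SIZE])
        follow = items[i + STATE_SIZE]
        model[state][follow] += 1

print(dict(model[("Janice", "walked")]))  # {'to': 2}
print(dict(model[("to", "the")]))         # {'park.': 1, 'zoo.': 1}
```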

When Markovify attempts to generate a new sentence, it randomly chooses each successive token using these frequency-based probabilities.
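That generation step can be sketched as a weighted random walk over the dictionary above; the real implementation is in markovify.chain.Chain, but the idea is roughly:

```python
import random

BEGIN, END = "___BEGIN__", "___END__"

# The transition counts from the two-sentence example above.
model = {
    (BEGIN, BEGIN): {"After": 1, "Janice": 1},
    (BEGIN, "After"): {"that,": 1},
    (BEGIN, "Janice"): {"walked": 1},
    ("After", "that,"): {"Janice": 1},
    ("Janice", "walked"): {"to": 2},
    ("that,", "Janice"): {"walked": 1},
    ("the", "park."): {END: 1},
    ("the", "zoo."): {END: 1},
    ("to", "the"): {"park.": 1, "zoo.": 1},
    ("walked", "to"): {"the": 2},
}

def generate(model, state_size=2):
    # Start from the all-BEGIN state and sample each next token
    # in proportion to its observed count, until END is drawn.
    state = (BEGIN,) * state_size
    words = []
    while True:
        choices = model[state]
        (nxt,) = random.choices(list(choices), weights=list(choices.values()))
        if nxt == END:
            return " ".join(words)
        words.append(nxt)
        state = state[1:] + (nxt,)

print(generate(model))
```

With this tiny corpus the walk can only produce four sentences, e.g. "After that, Janice walked to the park." — including two "novel" ones, since the (to, the) state can lead to either ending.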

jsvine avatar Aug 05 '20 03:08 jsvine

Super interesting! Thanks! I really think you should include this example on the main page.

I am just trying to figure out the exact size of the transition matrix in the example above... can we write it down, or are there too many rows/columns?

Thanks!

randomgambit avatar Aug 05 '20 03:08 randomgambit