course
A question about "6. The tokenizers library"
While studying "6. The tokenizers library: Unigram tokenization", I couldn't understand the following.
Why is P("pu") = 5/210? Shouldn't it be 17/210? P("g") = 20/210 because the frequency of "g" is 20, so by the same reasoning P("pu") should be the frequency of "pu" divided by 210.
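To make the counts concrete, here is a minimal sketch of how I would recompute them. It assumes the toy corpus from that chapter is hug (10), pug (5), pun (12), bun (4), hugs (5), and that the initial vocabulary is every strict substring of those words. The corpus and the names `corpus`, `substrings`, `vocab`, and `counts` are my own assumptions for illustration, not taken from the course code.

```python
from collections import Counter

# Assumed toy corpus: word -> frequency (not quoted from the course text).
corpus = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}


def substrings(word):
    """Return every contiguous substring of `word`, including the word itself."""
    return [word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)]


# Initial vocabulary: all strict substrings (a word is not a token for itself,
# but "hug" still enters the vocabulary as a strict substring of "hugs").
vocab = {s for word in corpus for s in substrings(word) if s != word}

# Count each vocabulary token wherever it occurs, weighted by word frequency.
counts = Counter()
for word, freq in corpus.items():
    for s in substrings(word):
        if s in vocab:
            counts[s] += freq

total = sum(counts.values())

print("freq('pu') =", counts["pu"])
print("freq('g')  =", counts["g"])
print("total      =", total)
print("P('pu')    =", counts["pu"] / total)
print("P('pu','g') =", (counts["pu"] / total) * (counts["g"] / total))
```

Under those assumptions "pu" occurs in "pug" and "pun", which is where the 17 comes from, and the total matches the 210 used in the chapter.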