streamly-examples

An LLM tokenizer implemented as a streamly application

Open twitu opened this issue 8 months ago • 3 comments

A greedy tokenizer breaks text into tokens based on data-driven rules it has learnt. The learning phase repeatedly finds the most common pair of adjacent tokens in the data and merges them into a new token.
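For readers unfamiliar with the learning phase, here is a minimal sketch of the merge loop in plain Python. It is not the streamly implementation from this issue. The function names (`most_common_pair`, `merge_pair`, `learn_merges`), the character-level starting tokens, and the first-seen tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def most_common_pair(tokens):
    """Return the most frequent adjacent pair of tokens, or None if there is none.

    Ties are broken in favour of the pair seen first (an assumption of this sketch).
    """
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return max(pairs.items(), key=lambda kv: kv[1])[0]


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair`, left to right, with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_common_pair(tokens)
        if pair is None:
            break
        merges.append(pair)
        tokens = merge_pair(tokens, pair)
    return merges, tokens


if __name__ == "__main__":
    merges, tokens = learn_merges("aaabdaaabac", 2)
    print(merges)  # [('a', 'a'), ('aa', 'a')]
    print(tokens)  # ['aaa', 'b', 'd', 'aaa', 'b', 'a', 'c']
```

Each pass is a count over adjacent pairs followed by a rewrite of the token sequence, which is why the problem maps naturally onto a fold (the counting) and a stream transformation (the merging).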

This is a pure text-processing application that can be re-imagined as a streaming application: a study of all three fundamental constructs of streaming (Streams, Folds and Pipes) and a demonstration of the streamly framework.

A review is welcome.

twitu, Feb 17 '25 11:02