streamly-examples
An LLM tokenizer implemented as a streamly application
A greedy tokenizer breaks text into tokens based on data-driven rules it has learnt. The learning phase repeatedly finds the most common pair of adjacent tokens in the data and merges that pair into a new token.
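To make the learning step concrete, here is a minimal sketch of one merge round on an ordinary Haskell list. It is not the streamly implementation from this repository. The names `Token`, `pairCounts`, `mostCommonPair`, `mergePair` and `learnStep` are hypothetical, tokens are assumed to be plain strings, and ties are assumed to go to the pair that sorts first.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Assumption: tokens are plain strings; a real tokenizer would likely use integer ids.
type Token = String

-- Count every adjacent pair of tokens in the sequence.
pairCounts :: [Token] -> Map.Map (Token, Token) Int
pairCounts ts = foldl' bump Map.empty (zip ts (drop 1 ts))
  where
    bump m p = Map.insertWith (+) p 1 m

-- Pick the most frequent adjacent pair, if there is one.
-- Ties go to the pair that sorts first (an arbitrary choice for this sketch).
mostCommonPair :: [Token] -> Maybe (Token, Token)
mostCommonPair ts
  | Map.null counts = Nothing
  | otherwise = Just (fst (Map.foldlWithKey' pick first counts))
  where
    counts = pairCounts ts
    first = Map.findMin counts
    pick best@(_, n) k c = if c > n then (k, c) else best

-- Replace each non-overlapping occurrence of the pair, scanning left to right.
mergePair :: (Token, Token) -> [Token] -> [Token]
mergePair (a, b) = go
  where
    go (x : y : rest)
      | x == a && y == b = (a ++ b) : go rest
    go (x : rest) = x : go rest
    go [] = []

-- One learning step: merge the most common pair, or return the input unchanged.
learnStep :: [Token] -> [Token]
learnStep ts = maybe ts (`mergePair` ts) (mostCommonPair ts)
```

For example, starting from the single-character tokens of "abababc", `learnStep` merges the pair ("a", "b") and yields `["ab","ab","ab","c"]`. Repeating the step builds progressively longer tokens.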
This is a pure text-processing application which can be re-imagined as a streaming application. It serves as a study of all three fundamental constructs of streaming (Streams, Folds and Pipes) and as a demonstration of the streamly framework.
A review is welcome.