llama2.c
Initialize Tokenizer and simplify str_lookup prototype
- added a new method to initialize the tokenizer with a given vocab_size
- removed vocab_size from the arguments of build_tokenizer
- applied the changes in run.c, runq.c, test.c
- passed the Tokenizer object to str_lookup
  - makes the code easier to follow: "oh! we will be checking for the string in the Tokenizer"
  - makes future changes to the sentencepiece code easier (I couldn't follow the TODO, so I started with something simple)
- passes: unit test (test.c)
- passes: integration test (./run stories15M.bin on MacBook M1)
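A minimal sketch of the API shape these changes describe, assuming trimmed-down versions of the `Tokenizer` and `TokenIndex` structs (only the fields used here). The names `init_tokenizer` and `build_sorted_vocab` are hypothetical, since the new method is not named above, and `free_tokenizer` here is a simplified stand-in. The body of `str_lookup` is an assumption: a binary search over a string-sorted vocab. The one point taken from the description is that `str_lookup` receives the Tokenizer itself rather than its individual fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down stand-ins for the tokenizer structs (assumed fields only). */
typedef struct {
    char *str; /* token text */
    int id;    /* token id in the vocab */
} TokenIndex;

typedef struct {
    char **vocab;             /* vocab[i] is the text of token i (strings owned by the caller here) */
    float *vocab_scores;
    TokenIndex *sorted_vocab; /* vocab sorted by string, for binary search */
    int vocab_size;
} Tokenizer;

/* Hypothetical initializer: allocates the per-token arrays for vocab_size tokens,
   so the loading step no longer has to receive vocab_size itself. */
void init_tokenizer(Tokenizer *t, int vocab_size) {
    assert(vocab_size > 0);
    t->vocab_size = vocab_size;
    t->vocab = malloc(vocab_size * sizeof(char *));
    t->vocab_scores = malloc(vocab_size * sizeof(float));
    t->sorted_vocab = NULL; /* filled in by build_sorted_vocab */
}

/* Orders TokenIndex entries by their string, for qsort/bsearch. */
static int compare_tokens(const void *a, const void *b) {
    return strcmp(((const TokenIndex *)a)->str, ((const TokenIndex *)b)->str);
}

/* Hypothetical helper: sort (string, id) pairs so str_lookup can use bsearch. */
void build_sorted_vocab(Tokenizer *t) {
    t->sorted_vocab = malloc(t->vocab_size * sizeof(TokenIndex));
    for (int i = 0; i < t->vocab_size; i++) {
        t->sorted_vocab[i].str = t->vocab[i];
        t->sorted_vocab[i].id = i;
    }
    qsort(t->sorted_vocab, t->vocab_size, sizeof(TokenIndex), compare_tokens);
}

/* Lookup that takes the Tokenizer instead of (sorted_vocab, vocab_size).
   Returns the token id, or -1 if str is not in the vocab. */
int str_lookup(const char *str, const Tokenizer *t) {
    TokenIndex key = { .str = (char *)str };
    TokenIndex *res = bsearch(&key, t->sorted_vocab, t->vocab_size,
                              sizeof(TokenIndex), compare_tokens);
    return res != NULL ? res->id : -1;
}

/* Frees the arrays allocated above (not the token strings themselves). */
void free_tokenizer(Tokenizer *t) {
    free(t->vocab);
    free(t->vocab_scores);
    free(t->sorted_vocab);
}
```

With this prototype a call site reads `str_lookup(str, t)`, which says directly that the string is being looked up in the Tokenizer, and any later change to how the vocab is stored stays inside the Tokenizer rather than in every caller's argument list.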