
Initialize Tokenizer and simplify str_lookup prototype

Open pagakarthik opened this issue 1 year ago • 0 comments

  • new method to initialize tokenizer with a given vocab_size
  • removed vocab_size from the arguments of build_tokenizer
  • applied the changes in run.c, runq.c, test.c
  • pass the tokenizer object to str_lookup
  • makes the code easier to follow: "oh! we will be checking for the string in the Tokenizer"
  • makes future changes to the sentencepiece code easier (I couldn't follow the TODO, so this starts with something simple)
  • passes: unit test (test.c)
  • passes: integration test (./run stories15M.bin on MacBook M1)
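
The shape of the change listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the struct is a trimmed-down stand-in for the one in run.c, the names init_tokenizer, sort_vocab and free_tokenizer are hypothetical, and the vocabulary is filled in by hand instead of being read from tokenizer.bin. The point it shows is that once vocab_size lives in the Tokenizer, str_lookup only needs the string and the tokenizer object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down stand-ins for the structs in run.c (assumption: only the
 * fields needed for the lookup are kept). */
typedef struct {
    char *str;
    int id;
} TokenIndex;

typedef struct {
    char **vocab;             /* token strings, indexed by token id */
    TokenIndex *sorted_vocab; /* same tokens, sorted by string */
    int vocab_size;
} Tokenizer;

static int compare_tokens(const void *a, const void *b) {
    return strcmp(((const TokenIndex *)a)->str, ((const TokenIndex *)b)->str);
}

/* Hypothetical initializer: stores vocab_size in the tokenizer and
 * allocates the vocab table. Returns 0 on success, -1 on failure. */
int init_tokenizer(Tokenizer *t, int vocab_size) {
    t->vocab_size = vocab_size;
    t->vocab = calloc((size_t)vocab_size, sizeof(char *));
    t->sorted_vocab = NULL;
    return t->vocab ? 0 : -1;
}

/* Hypothetical helper: builds the sorted index once vocab is filled in. */
int sort_vocab(Tokenizer *t) {
    t->sorted_vocab = malloc((size_t)t->vocab_size * sizeof(TokenIndex));
    if (!t->sorted_vocab) return -1;
    for (int i = 0; i < t->vocab_size; i++) {
        t->sorted_vocab[i].str = t->vocab[i];
        t->sorted_vocab[i].id = i;
    }
    qsort(t->sorted_vocab, (size_t)t->vocab_size, sizeof(TokenIndex), compare_tokens);
    return 0;
}

/* Lookup takes the tokenizer itself, so the call site no longer passes
 * sorted_vocab and vocab_size separately. Returns the token id, or -1. */
int str_lookup(const char *str, const Tokenizer *t) {
    assert(t->sorted_vocab != NULL);
    TokenIndex key = { .str = (char *)str, .id = -1 };
    TokenIndex *res = bsearch(&key, t->sorted_vocab, (size_t)t->vocab_size,
                              sizeof(TokenIndex), compare_tokens);
    return res ? res->id : -1;
}

/* Frees the tables only; the token strings are owned by the caller here. */
void free_tokenizer(Tokenizer *t) {
    free(t->vocab);
    free(t->sorted_vocab);
}
```

A caller would run init_tokenizer(&t, n), fill t.vocab, call sort_vocab(&t), and then use str_lookup("cat", &t) wherever the old three-argument form was used.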

pagakarthik, Feb 25 '24 23:02