gpt_bpe issues

Llama 3 Support

This PR updates gpt_bpe to support uint32 and Llama 3. Moving from uint16 to uint32 allows gpt_bpe to support vocab sizes greater than...

Rexwang8
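
To illustrate why the integer width matters for the Llama 3 PR above, here is a minimal Go sketch. It is not taken from gpt_bpe: the type names `Token16` and `Token32`, the helper `fitsUint16`, and the token ID 100000 are all hypothetical. It only relies on the fact that a uint16 holds values up to 65535, so narrowing a larger ID silently keeps the low 16 bits.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical type names for illustration only; not gpt_bpe's actual types.
type Token16 uint16
type Token32 uint32

// fitsUint16 reports whether a token ID can be stored in a uint16 without loss.
func fitsUint16(id uint32) bool {
	return id <= math.MaxUint16
}

func main() {
	id := uint32(100000) // hypothetical token ID from a large vocabulary

	// Converting a non-constant value to a narrower type keeps only the
	// low 16 bits, so the ID is silently corrupted rather than rejected.
	narrowed := Token16(id)

	fmt.Println(fitsUint16(id), narrowed, Token32(id))
	// prints: false 34464 100000
}
```

A check like `fitsUint16` is one way a tool could decide whether an existing 16-bit encoding is still safe for a given vocabulary; whether gpt_bpe does anything similar is an assumption.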

Update CLI docs to list new tokenizers

This sort of thing, but sprinkled over a variety of parts of the codebase:
https://github.com/wbrown/gpt_bpe/blob/534087680bf6b9fa5b7cab3e72e41d0b992fb583/cmd/tokens_transformer/tokens_transformer.go#L13-L16

Tokenizers to add:
https://github.com/wbrown/gpt_bpe/blob/534087680bf6b9fa5b7cab3e72e41d0b992fb583/gpt_bpe.go#L112-L114

Which I believe internally use the identifiers `llama`, `llama3`, and `mistral`...

dmarx


Metadata

Llama 3 Support

Update CLI docs to list new tokenizers

Update transpiler and allow creation of gobs

GLM-4.5 support

Smanor/sentinel support
