fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path

Open NanoCode012 opened this issue 1 year ago • 1 comments

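Fix the dataset hash across tokenizers. The hash should not be based on the tokenizer class name, because tokenizers that share a class can have different vocabularies. This change hashes on the tokenizer path instead and normalizes the tokenizer config.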
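As a rough illustration of the idea (not the actual code in this PR), the sketch below builds a cache key from a normalized config plus the tokenizer path. The names `normalize_config` and `dataset_hash`, the example config keys, and the `org-a/model` / `org-b/model` paths are all hypothetical.

```python
import hashlib
import json


def normalize_config(config: dict) -> str:
    """Serialize a config dict deterministically (sorted keys, fixed separators)."""
    return json.dumps(config, sort_keys=True, separators=(",", ":"), default=str)


def dataset_hash(dataset_config: dict, tokenizer_path: str) -> str:
    """Hypothetical cache key that depends on the tokenizer path, not its class name."""
    payload = normalize_config(dataset_config) + "|" + tokenizer_path
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two tokenizers that could share a class name but ship different vocabularies.
cfg = {"path": "my/dataset", "type": "alpaca"}
hash_a = dataset_hash(cfg, "org-a/model")
hash_b = dataset_hash(cfg, "org-b/model")

# Key order in the config should not change the hash once it is normalized.
hash_a_reordered = dataset_hash({"type": "alpaca", "path": "my/dataset"}, "org-a/model")
```

With a key built this way, two tokenizers loaded from different paths get separate cached datasets even if they report the same class, while a config that differs only in key order still maps to the same cache entry.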

NanoCode012, Feb 17 '24 03:02

I believe the test is failing because the config is not being normalized.

NanoCode012, Feb 17 '24 04:02