axolotl
fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path
Fix the dataset hash across tokenizers. The hash should not be based on the tokenizer class name, because two tokenizers of the same class can have different vocabularies.
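To illustrate why the class name is a poor hash key, here is a minimal sketch, not the actual axolotl code. The function names, the `dataset_desc` string, and the tokenizer paths are all hypothetical and chosen only for illustration; the sketch assumes the cache key is built by hashing a dataset description together with a tokenizer identifier.

```python
import hashlib


def hash_with_tokenizer_class(dataset_desc: str, tokenizer_class: str) -> str:
    # Hypothetical "before" behaviour: the key depends only on the class name.
    key = f"{dataset_desc}|{tokenizer_class}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def hash_with_tokenizer_path(dataset_desc: str, tokenizer_path: str) -> str:
    # Hypothetical "after" behaviour: the key depends on the tokenizer path,
    # which distinguishes tokenizers that share a class but not a vocabulary.
    key = f"{dataset_desc}|{tokenizer_path}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Two made-up tokenizers with the same class but different paths.
dataset_desc = "example-dataset:train:seq_len=2048"
tok_a = {"cls": "LlamaTokenizerFast", "path": "org-a/model-7b"}
tok_b = {"cls": "LlamaTokenizerFast", "path": "org-b/model-7b-extended-vocab"}

class_hash_a = hash_with_tokenizer_class(dataset_desc, tok_a["cls"])
class_hash_b = hash_with_tokenizer_class(dataset_desc, tok_b["cls"])
path_hash_a = hash_with_tokenizer_path(dataset_desc, tok_a["path"])
path_hash_b = hash_with_tokenizer_path(dataset_desc, tok_b["path"])

print("class-based hashes collide:", class_hash_a == class_hash_b)
print("path-based hashes differ:", path_hash_a != path_hash_b)
```

With a class-based key, data tokenized with one vocabulary could be reused from the cache for a tokenizer with another; a path-based key keeps the two apart.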
I believe the test is failing because the config is not being normalized.