LMFlow
LMFlow copied to clipboard
Add comprehensive tokenization tests, update diagram, and adjust code to handle edge cases
call stack diagram for dataset
call stack diagram for dataset
and tokenization
Code changes based on tokenization tests
- I updated ConversationTemplate.encode_conversation to drop any unpaired final message when there’s an odd count and return only the paired turns; if the first message isn’t from the user, I skip encoding and return an empty list.
- I also tweaked both hf_decoder_model.py and hf_text_regression_model.py so that if a ConversationTemplate lacks a system_formatter, I set system=None before calling encode_conversation, avoiding ValueError on unformatted system prompts.