axolotl
Fix(preprocess): Use space delimiter for debug_text_only also
@TheBloke, I couldn't find the corresponding PR that you mentioned in the earlier issue (https://github.com/OpenAccess-AI-Collective/axolotl/pull/462).
This PR fixes debug_text_only printing its output without spaces between tokens.
Before:
<s><|im_start|>system<0x0A>Youareahelpfulassistant.<|im_end|><0x0A><|im_start|>user<0x0A>LetXbeacompactconnected
This PR is currently wrong. I didn't get a chance to come back to fix it yet. It adds extra spaces sometimes. Need to check again later.
Thanks for the PR Nano, and sorry I never got around to making it myself.
I've not tried your code, so I don't know in what way it's wrong. But when I fixed this locally, I did it with a single line:
delimiter = " "
And that seemed to work fine for me?
I noticed that, when printing the tokens, there were "empty" tokens, which led to the extra spacing. Maybe all I need is a check that the string has len > 0 before including it.
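Something like this rough sketch is what I have in mind. It is not the actual axolotl code: the function name join_decoded_tokens and the sample token list are made up for illustration, and it assumes each token has already been decoded to its own string.

```python
def join_decoded_tokens(decoded_tokens, delimiter=" "):
    """Join per-token strings with a delimiter, skipping empty ones.

    Hypothetical helper for illustration only; not part of axolotl.
    """
    # Tokens that decode to "" would otherwise produce doubled delimiters.
    non_empty = [tok for tok in decoded_tokens if len(tok) > 0]
    return delimiter.join(non_empty)


# Made-up sample: two tokens decode to an empty string.
tokens = ["<s>", "", "You", "are", "", "a", "helpful", "assistant."]

print(" ".join(tokens))             # naive join keeps the extra spaces
print(join_decoded_tokens(tokens))  # filtered join has single spaces only
```

With the plain join, every empty token adds a second space next to its neighbour; filtering on len > 0 first keeps the delimiter = " " approach but avoids that.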