
Fix(preprocess): Use space delimiter for debug_text_only also

Open NanoCode012 opened this issue 1 year ago • 3 comments

@TheBloke, I couldn't find the corresponding PR that you mentioned in the earlier issue: https://github.com/OpenAccess-AI-Collective/axolotl/pull/462

This PR fixes debug_text_only printing its output without spaces between tokens.

Before:

<s><|im_start|>system<0x0A>Youareahelpfulassistant.<|im_end|><0x0A><|im_start|>user<0x0A>LetXbeacompactconnected

NanoCode012 avatar Jan 07 '24 03:01 NanoCode012

This PR is currently wrong, and I haven't had a chance to come back and fix it yet. It sometimes adds extra spaces. I need to check it again later.

NanoCode012 avatar Jan 11 '24 15:01 NanoCode012

Thanks for the PR Nano, and sorry I never got around to making it myself.

I've not tried your code, so I don't know in what way it's wrong. But when I fixed this locally, I did it with a single line:

delimiter = " "

And that seemed to work fine for me?

TheBloke avatar Jan 11 '24 15:01 TheBloke

I noticed that, when printing the tokens, there were "empty" tokens, which led to the extra spacing. Maybe all I need is a check that each string has len > 0 before including it.

NanoCode012 avatar Jan 12 '24 03:01 NanoCode012
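
For reference, the idea in the last comment could look roughly like the sketch below. This is not the actual axolotl code: `join_tokens` and the sample token list are hypothetical and only illustrate how empty token strings cause doubled spaces when joined with `delimiter = " "`, and how a len > 0 check avoids that.

```python
def join_tokens(token_texts, delimiter=" "):
    """Join decoded token strings, skipping empty ones.

    Hypothetical helper for illustration; not part of axolotl.
    """
    # Dropping empty strings keeps the delimiter from appearing twice in a row.
    return delimiter.join(text for text in token_texts if len(text) > 0)


# Made-up decoded tokens, including two "empty" ones.
tokens = ["<s>", "<|im_start|>", "system", "", "You", "are", "", "a", "helpful", "assistant."]

naive = " ".join(tokens)        # empty tokens produce doubled spaces
filtered = join_tokens(tokens)  # empty tokens are skipped

print(repr(naive))
print(repr(filtered))
```

Whether skipping empty tokens is right for every tokenizer is an open question here; it only addresses the extra spacing described above.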