
preserve indentation for code output

Open lukemarsden opened this issue 2 years ago • 2 comments

this will matter more when #54 is implemented, but we need to fix the text chunking code in the runner so that it preserves whitespace properly, which matters for e.g. code generation (it will be easier to see in the browser once we add markdown support and the model spits out ``` fences)

cases we care about:

  • split on words so user gets quick updates in each chunk
  • preserve newlines
  • preserve whitespace (e.g. paragraphs of prose are separated by \n\n, lines of poetry by \n, and code where indentation matters uses four spaces or tabs)

we can probably make the scanner split on the first whitespace character but include it in the chunk, and not skip the next 3 (in the case that the model outputs four space characters in a row). then we stop adding a space back in, because we shouldn't be assuming the whitespace is a space character

related changes:

  • https://github.com/helixml/helix/commit/82352b43136564388edcca3390a3e8975dee8f53
  • https://github.com/helixml/helix/commit/f8d630a4bd209bf18d1e89927c8cfbead04b60d8

slack thread: https://mlops-community.slack.com/archives/C0675EX9V2Q/p1705056847386509

lukemarsden, Jan 12 '24 11:01

as said above, the chunks should be split on whitespace but include all of that whitespace, and the api server should not re-add a hard-coded space between chunks

lukemarsden, Jan 12 '24 11:01

while we're in there, we should also clean up the newline that currently appears at the start and end of every response
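
a rough sketch of that cleanup, assuming it runs on the full response text and that exactly one newline at each end should go (`trimEdgeNewlines` is a hypothetical name, not an existing helper):

```go
package main

import (
	"fmt"
	"strings"
)

// trimEdgeNewlines is a hypothetical helper. It removes at most one leading
// and one trailing "\n" and leaves every other byte, including indentation
// and interior blank lines, untouched.
func trimEdgeNewlines(response string) string {
	response = strings.TrimPrefix(response, "\n")
	return strings.TrimSuffix(response, "\n")
}

func main() {
	fmt.Printf("%q\n", trimEdgeNewlines("\n    x = 1\n    y = 2\n"))
	// prints "    x = 1\n    y = 2"
}
```

`strings.TrimSpace` would be the wrong tool here, because it would also strip the indentation on the first line of a code response.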

lukemarsden, Feb 04 '24 08:02