guidance icon indicating copy to clipboard operation
guidance copied to clipboard

Terminal colors

Open hudson-ai opened this issue 1 year ago • 9 comments
trafficstars

This PR:

  1. Automatically detects whether the user is running in the command-line or in a notebook
    • When in a notebook, echo=True gives a pretty HTML display
    • When in the command-line, raw text is printed to stdout (no more <IPython.core.display.HTML object>!)
  2. Uses ANSI color codes to color generated text
    • Green by default
    • If colored package is installed (added as optional dependency), colors will depend on log-probabilities of generated tokens

Thank you @dimondjik for your implementation!

In action: terminal_color

hudson-ai avatar Jun 18 '24 23:06 hudson-ai

@Harsha-Nori @dimondjik :)

hudson-ai avatar Jun 18 '24 23:06 hudson-ai

So happy to see this! I think it looks wonderful.

Would appreciate @nopdive 's review given Sam's experience in notebook env detection

Also, just a thought -- might be fun to have the commit co-authored by @dimondjik if comfortable!

Harsha-Nori avatar Jun 19 '24 12:06 Harsha-Nori

Could this be refactored? It looks like it has a potentially fragile regex for parsing out styling information, and is sitting as another stage in the pipeline.

Should Engine acquire another property such as output_processor which would be some abstract base class (or rather, some reasonable Python facsimile thereof). Wherever our current code writes some output, it calls the appropriate method on the class. We then have implementations of the output_processor for Python Notebooks, console output etc.

This would also let us put all the styling information in the output_processor itself, rather than have it scattered across the main codebase.

riedgar-ms avatar Jun 19 '24 12:06 riedgar-ms

Could this be refactored? It looks like it has a potentially fragile regex for parsing out styling information, and is sitting as another stage in the pipeline.

Should Engine acquire another property such as output_processor which would be some abstract base class (or rather, some reasonable Python facsimile thereof). Wherever our current code writes some output, it calls the appropriate method on the class. We then have implementations of the output_processor for Python Notebooks, console output etc.

This would also let us put all the styling information in the output_processor itself, rather than have it scattered across the main codebase.

Reasonable concerns and a nice suggestion. It would be nice to encapsulate the existing pretty-output code regardless. I'll see what I can do 😎

hudson-ai avatar Jun 19 '24 16:06 hudson-ai

So happy to see this! I think it looks wonderful.

Would appreciate @nopdive 's review given Sam's experience in notebook env detection

Also, just a thought -- might be fun to have the commit co-authored by @dimondjik if comfortable!

@Harsha-Nori do you know if it's possible to add a co-author during the squash-and-merge process? Otherwise, I really think that the first commit on this PR was 99.99% @dimondjik ... maybe I can amend the history of this branch to properly credit them as an author/contributor :)

hudson-ai avatar Jun 19 '24 16:06 hudson-ai

Nice! First of all, I wasn't expecting it to get anywhere at all, I'm glad I could help! @hudson-ai I thank you for the PR from all the people working in terminal and not seeing colors like Notebook people XD

I've read the thread and I absolutely 100% sure that @riedgar-ms 's suggestion is the way.

But, I live by the motto "don't touch something that works", so from my perspective trying to rewrite something in Engine, when we can just add extra layer (yes, it is making the output somewhat slow, sadly), so two and a half programmers can see colors in terminal when debugging. I believe in production - colors in terminal is not that important. And also I am too lazy to dig inside the code.

So, I'm not here to just talk, when I was playing with the llm I found out that this approach "eats" any html-like tag, even generated by the llm. I.e. Mistral's "<s></s>".

where-the-tags-drake

So here's my local version I was updating for myself:

class ModelStateHTMLParser:
    def __init__(self):
        self.__prev_data_len = 0

        self.__ptr = (
            re.compile(r"<\|\|_html:([\S\s]+?)_\|\|>|([\S\s]+?)(?=<\|\|_html:|\Z)"))

        self.__color_param_ptr = (
            re.compile(r"rgba\(([0-9]+?\.??[0-9]*?),\s*?([0-9]+?\.??[0-9]*?),\s*?([0-9]+?\.??[0-9]*?)"))

    def feed(self, data):
        data_trunc = data[self.__prev_data_len:]
        self.__prev_data_len = len(data)

        for res in self.__ptr.finditer(data_trunc):
            if res.group(1) is not None:
                try:
                    r, g, b = [round(float(n)) for n in self.__color_param_ptr.search(res.group(1)).groups()]
                    print('\033[38;2;{};{};{}m'.format(r, g, b), end='', flush=True)
                except AttributeError:
                    print('\033[0m', end='', flush=True)
            if res.group(2) is not None:
                print(res.group(2), end='', flush=True)
  • It uses colors from the html markdown dependency-less, just ANSI escape
  • Almost bulletproof regex for tag detection
  • Last length of the prompt is stored inside the class now, otherwise what's the point not making "feed" static

@hudson-ai I'm ready to push to terminal_colors, just give me the green light XD

Anyways, you guys are awesome, best wishess

dimondjik avatar Jun 19 '24 18:06 dimondjik

Marked this as a draft, as there is clearly a good amount of work left to be done.

@dimondjik thank you for the additions! Definitely good to be careful about eating html tags produced by the model, and I like that this allows us to drop the dependency I added.

Also as a terminal user rather than a notebook user, I'm super excited about this 😆

Maybe the best way for us to collaborate is for you to open a PR to my branch? No need to be super formal or anything :) I'd just ask that you return from feed rather than printing so that we can handle I/O further up the stack.

hudson-ai avatar Jun 19 '24 20:06 hudson-ai

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 93.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 58.30%. Comparing base (6e4ee06) to head (0b13ca8). Report is 285 commits behind head on main.

Files with missing lines Patch % Lines
guidance/_utils.py 94.44% 1 Missing :warning:
guidance/models/_model.py 91.66% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #905      +/-   ##
==========================================
+ Coverage   55.54%   58.30%   +2.76%     
==========================================
  Files          63       63              
  Lines        4733     4754      +21     
==========================================
+ Hits         2629     2772     +143     
+ Misses       2104     1982     -122     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Jun 21 '24 21:06 codecov-commenter

@nopdive re: colored/colorama, @dimondjik shared some code that should let us avoid adding a dependency at all (although I want to triple check for platform differences when it comes to ansi code support)

hudson-ai avatar Jun 25 '24 18:06 hudson-ai