Included usage logging for OpenAI calls

arnavsinghvi11 opened this issue 2 years ago • 6 comments

- logging of total_tokens usage from OpenAI requests
- thread-safe with the Python logging library

arnavsinghvi11 avatar Nov 06 '23 21:11 arnavsinghvi11

Looks great. Can we just activate logging only when dsp.settings.log_openai_usage is True?
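
A minimal, self-contained sketch of the gating being requested here. FakeSettings is a hypothetical stand-in for dsp.settings (in DSP the flag would live in the Settings singleton); only the pattern is the point: the flag defaults to False, and the log call is a no-op unless it is flipped on.

```python
import logging

# Hypothetical stand-in for dsp.settings; the real flag lives in the
# Settings singleton's config dict.
class FakeSettings:
    log_openai_usage = False  # off by default, as requested

settings = FakeSettings()
logging.basicConfig(level=logging.INFO)

def log_usage(total_tokens: int) -> None:
    """Log token usage only when the flag is enabled (no-op otherwise)."""
    if getattr(settings, "log_openai_usage", False):
        logging.info(f"#Total tokens in a LLM call: {total_tokens}")

log_usage(123)                    # silent: flag is False by default
settings.log_openai_usage = True
log_usage(123)                    # now emitted via logging.info
```

Using getattr with a False default also means nothing crashes if the flag was never defined at all.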

okhat avatar Nov 07 '23 19:11 okhat

Now it will crash I think? We need the new dsp.settings flag to be set to False by default

okhat avatar Nov 08 '23 19:11 okhat

otherwise it will complain that it's unset

okhat avatar Nov 08 '23 19:11 okhat

Thanks for the feature!

Adding log_openai_usage=True here works for me.

I would suggest adding some explanation to the log line, e.g. logging.info(f'#Total tokens in a LLM call: {total_tokens}'), rather than logging the bare number; otherwise the number is hard to interpret when it is mixed with other log output. An even more useful feature would be accumulating the token counts in, say, the Settings class (I found it's a singleton class): the user could then inspect the totals at any time and do the arithmetic themselves. If you do this, a threading.Lock() may also be needed.

shaoyijia avatar Nov 08 '23 20:11 shaoyijia

This is what I changed in my dsp/utils/settings.py:

import threading  # needed for the class-level usage lock below

class Settings(object):
    """DSP configuration settings."""

    _instance = None
    _token_usage_lock = threading.Lock()

    def __new__(cls):
        """
        Singleton Pattern. See https://python-patterns.guide/gang-of-four/singleton/
        """

        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.main_tid = threading.get_ident()
            cls._instance.main_stack = []
            cls._instance.stack_by_thread = {}
            cls._instance.stack_by_thread[threading.get_ident()] = cls._instance.main_stack

            #  TODO: remove first-class support for re-ranker and potentially combine with RM to form a pipeline of sorts
            #  eg: RetrieveThenRerankPipeline(RetrievalModel, Reranker)
            #  downstream operations like dsp.retrieve would use configs from the defined pipeline.
            config = dotdict(
                lm=None,
                rm=None,
                branch_idx=0,
                reranker=None,
                compiled_lm=None,
                force_reuse_cached_compilation=False,
                compiling=False,
                skip_logprobs=False,
                trace=None,
                release=0,
                log_openai_usage=False,
                token_usage={},
            )
            cls._instance.__append(config)

        return cls._instance

    """Omit unchanged functions"""

    def increment_token_usage(self, model, prompt_tokens, completion_tokens):
        """Accumulate per-model token counts; the lock keeps this safe across threads."""
        with self._token_usage_lock:
            if model in self.token_usage:
                self.token_usage[model]['prompt_tokens'] += prompt_tokens
                self.token_usage[model]['completion_tokens'] += completion_tokens
            else:
                self.token_usage[model] = {'prompt_tokens': prompt_tokens, 'completion_tokens': completion_tokens}
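
For context, here is a runnable, self-contained sketch of the same singleton-plus-lock accumulation pattern. This Settings class is a stripped-down stand-in, not the real dsp class, and the model name and token counts are made up; it just demonstrates that concurrent updates to the shared token_usage dict stay consistent under the lock.

```python
import threading

# Stand-in for the dsp Settings singleton: one shared dict, one lock.
class Settings:
    _instance = None
    _token_usage_lock = threading.Lock()

    def __new__(cls):
        # Singleton: every call returns the same instance with one shared dict.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.token_usage = {}
        return cls._instance

    def increment_token_usage(self, model, prompt_tokens, completion_tokens):
        # The lock makes the read-modify-write below atomic across threads.
        with self._token_usage_lock:
            usage = self.token_usage.setdefault(
                model, {"prompt_tokens": 0, "completion_tokens": 0}
            )
            usage["prompt_tokens"] += prompt_tokens
            usage["completion_tokens"] += completion_tokens

settings = Settings()

# Simulate 8 concurrent LLM calls all reporting usage for the same model.
threads = [
    threading.Thread(
        target=settings.increment_token_usage, args=("gpt-3.5-turbo", 100, 10)
    )
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(settings.token_usage)
# {'gpt-3.5-turbo': {'prompt_tokens': 800, 'completion_tokens': 80}}
```

Because Settings is a singleton, any caller can later read settings.token_usage and do the cost calculation themselves, which is exactly the inspect-at-any-time workflow suggested above.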


shaoyijia avatar Nov 08 '23 21:11 shaoyijia

> Now it will crash I think? We need the new dsp.settings flag to be set to False by default

ah yes, I was originally going to leave the flag out: looking at this example of how users can set flags without including them in the settings, and since the settings __getattr__ method checks for variables in the config dictionary, I thought dsp.settings would handle this itself. But adding the flag is definitely the simpler solution :)
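
To illustrate the lookup behavior being discussed: below is a sketch (not the real dsp code) of a settings object whose __getattr__ falls back to a config dict, which is why a flag that is present in the config with a default never raises, while a truly unknown name still does.

```python
# Sketch of the __getattr__-on-config pattern: unknown attributes are
# looked up in a config dict, so a flag with a default in the dict
# resolves cleanly instead of crashing.
class ConfigBackedSettings:
    def __init__(self):
        # Assign via __dict__ to avoid recursing into __getattr__.
        self.__dict__["config"] = {"lm": None, "log_openai_usage": False}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        config = self.__dict__["config"]
        if name in config:
            return config[name]
        raise AttributeError(name)

settings = ConfigBackedSettings()
print(settings.log_openai_usage)  # False: defaulted via the config dict
```

This is why seeding log_openai_usage=False in the config dict (as in the snippet above) is enough: callers can test the flag without ever setting it explicitly.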

arnavsinghvi11 avatar Nov 09 '23 01:11 arnavsinghvi11