[Question]: How to write the stream_complete function for my custom LLM (used for Gradio)?
Question Validation
- [X] I have searched both the documentation and discord for an answer.
Question
```python
# Imports for llama-index >= 0.10; adjust the paths if you are on the legacy layout.
from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback

# model, tokenizer and model_name are assumed to be defined elsewhere in the script.


class HuatuoLLM(CustomLLM):
    dummy_response: str = "My response"

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=2048,
            # num_output=256,
            model_name=model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # only return newly generated tokens
        messages = []
        messages.append({"role": "user", "content": prompt})
        text = model.HuatuoChat(tokenizer, messages)
        # text, _ = model.chat(tokenizer, prompt, history=[])
        return CompletionResponse(text=text)

    @llm_completion_callback()
    # def stream_complete(
    #     self, prompt: str, **kwargs: Any
    # ) -> CompletionResponseGen:
    #     messages = []
    #     messages.append({"role": "user", "content": prompt})
    #     response = model.HuatuoChat(tokenizer, messages, stream=True)
    #     return response
    def stream_complete(
        self, prompt: str, **kwargs: Any
    ) -> CompletionResponseGen:
        response = ""
        for token in self.dummy_response:
            response += token
            yield CompletionResponse(text=response, delta=token)
```
What should I do in the `stream_complete` function? And how do I get the streamed output? Thanks for the help!
The `stream_complete` function in your custom LLM class is a generator that yields a `CompletionResponse` for each character of the `dummy_response` string. It is decorated with `llm_completion_callback()`, which handles event dispatching and callback management for LLM completion tasks. In the context of `stream_complete`, the decorator does roughly the following (a conceptual sketch follows the list):
- Dispatches an `LLMCompletionStartEvent` before the wrapped function runs.
- Registers the start of the event with the callback manager.
- Executes `stream_complete`, which yields `CompletionResponse` objects, and passes each yielded response through to the caller.
- Once the generator is exhausted, dispatches an `LLMCompletionEndEvent` and registers the event's end with the callback manager.
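For intuition, here is a rough, hypothetical sketch of the wrapping pattern such a decorator follows on the streaming path. This is not the actual LlamaIndex implementation (that lives in llama-index-core/llama_index/core/llms/callbacks.py); the `print` calls merely stand in for the real event dispatch and callback bookkeeping:

```python
import functools
from typing import Any, Callable


def sketch_completion_callback() -> Callable:
    """Hypothetical stand-in for llm_completion_callback(); streaming path only."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(self: Any, prompt: str, **kwargs: Any):
            # "start" side: dispatch LLMCompletionStartEvent and notify the callback manager
            print("completion started")
            gen = fn(self, prompt, **kwargs)

            def wrapped_gen():
                for item in gen:
                    # pass each yielded CompletionResponse straight through to the caller
                    yield item
                # "end" side: once the generator is exhausted, dispatch
                # LLMCompletionEndEvent and close the callback event
                print("completion ended")

            return wrapped_gen()

        return wrapper

    return decorator
```

The important point is that your `stream_complete` only has to be a generator of `CompletionResponse` objects; the decorator takes care of the event plumbing around it.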
The `stream_complete` method itself yields a `CompletionResponse` for each character of the `dummy_response` string. Each `CompletionResponse` carries the accumulated output so far (`text=response`) and the newly added piece (`delta=token`). This lets the function provide a stream of responses, each one reflecting the addition of a new token.
Here's a simplified walkthrough of how the placeholder `stream_complete` works:
- It initializes an empty string `response`.
- It iterates over each character of `dummy_response`.
- For each character, it appends the character to `response` and yields a `CompletionResponse` with the current `response` as `text` and the character as `delta`.
This design lets callers consume the output incrementally: iterate over the generator returned by `stream_complete` and read each `delta` (or the accumulated `text`) as it arrives, as shown in the sketch below.
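To get the stream output, instantiate the class and loop over the generator that `stream_complete` returns. A minimal consumption sketch, assuming the `HuatuoLLM` class from the question is in scope together with its `model`, `tokenizer`, and `model_name` globals (the prompt string is only an illustration):

```python
llm = HuatuoLLM()

# stream_complete returns a generator of CompletionResponse objects
for chunk in llm.stream_complete("Describe the common symptoms of a cold."):
    # chunk.delta is the newly generated piece, chunk.text the accumulated output
    print(chunk.delta, end="", flush=True)
print()
```

For Gradio, the same pattern works for streaming UIs: the function you bind to the interface can loop over `llm.stream_complete(...)` and `yield chunk.text` so the textbox updates as new tokens arrive.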
However, please note that the actual implementation of `stream_complete` depends on the specifics of your LLM and on how you want to handle streaming completions. The code above is a simplified example and may not cover all the complexities of a real-world LLM.
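If `model.HuatuoChat` really supports `stream=True`, as the commented-out attempt suggests, `stream_complete` can wrap that streaming output instead of the dummy string. The sketch below is a hypothetical adaptation, not tested against Huatuo: it assumes the streaming call yields the accumulated response text at each step (if it yields deltas instead, accumulate them yourself before building each `CompletionResponse`), and that `model` and `tokenizer` are the same globals used in `complete`:

```python
from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class HuatuoLLM(CustomLLM):
    # ... metadata and complete() as in the question ...

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        messages = [{"role": "user", "content": prompt}]
        previous = ""
        # ASSUMPTION: each streamed item is the full response text generated so far.
        for accumulated in model.HuatuoChat(tokenizer, messages, stream=True):
            delta = accumulated[len(previous):]  # keep only the newly generated piece
            previous = accumulated
            yield CompletionResponse(text=accumulated, delta=delta)
```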
For more information on the `llm_completion_callback()` decorator and how it works, see `llama-index-core/llama_index/core/llms/callbacks.py` in the LlamaIndex repository.
Sources
- llama-index-legacy/llama_index/legacy/langchain_helpers/streaming.py
- llama-index-core/llama_index/core/llms/callbacks.py
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.