hamilton icon indicating copy to clipboard operation
hamilton copied to clipboard

[good first issue - advanced][Example] Create a dataflow modeling information extraction using an LLM

Open skrawcz opened this issue 1 year ago • 8 comments

Write an example dataflow that uses Hamilton to model an information extraction task using an LLM.

For example:

  1. given an output schema
  2. given input text
  3. make a prompt that is sent to an LLM API (pick one)
  4. then write a function to validate the output

For inspiration you can look at Langchain's implementation.

The code for this example should end up under the /examples/LLM_Workflows directory.

skrawcz avatar Jul 30 '23 05:07 skrawcz

Hi @skrawcz , I want to work on this issue. Can you guide me a little on what exactly needs to be done here ?

raghav24agarwal avatar Jul 30 '23 11:07 raghav24agarwal

Hi @skrawcz , I want to work on this issue. Can you guide me a little on what exactly needs to be done here ?

Sure, if the following makes sense to you. If not we can see if there's another issue that would be a fit.

  1. Are you familiar with Hamilton? If not, I suggest going through the tutorial at www.tryhamilton.dev to understand what Hamilton does. That will help you understand the code to write.
  2. Are you familiar with LLM APIs? Do you know what a prompt is?
  3. A use case for this work might look like the following:

I have user feedback data about some product, and I want to extract some information from it. E.g. what the product the review is about, what the sentiment is, etc.

Does that make sense?

In terms of an analogous example - you can see LangChain's example which shows a schema, i.e. what we want to get back out, and an example sentence, and then the result.

i.e. given

Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.

it outputs:

[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},
     {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]
  1. LLMs aren't always good at following directions, so the example should show a check to parse & validate the output returned.
  2. So the task would be to encode the above into a Hamilton dataflow, or DAG. With the first deliverable being code that ends up in the examples/ directory. But longer term we'd ship this as part of the user contributed code library that will ship with Hamilton.

skrawcz avatar Jul 30 '23 22:07 skrawcz

Hi @skrawcz , thanks for explaining the use case. Let me try it on a basic level and get back to you. I am taking inference from Knowledge Retrieval example under LLM workflows and I guess the core concept remains same.

raghav24agarwal avatar Jul 31 '23 18:07 raghav24agarwal

Hi @skrawcz , thanks for suggesting LangChain, we can leverage the same API for this use case. On top of that, we can have a wrapper for encoding it into a Hamilton dataflow. But for implementing LangChain, we need an open_api_key, which I am unable to find. I am really curious where are you setting it to use OpenAi api in other examples. Can you please help me with that ?

raghav24agarwal avatar Aug 12 '23 18:08 raghav24agarwal

thanks for suggesting LangChain, we can leverage the same API for this use case. On top of that, we can have a wrapper for encoding it into a Hamilton dataflow.

Just to be clear. We don't want to wrap langchain. We want to take a "chain" and reimplement it in Hamilton.

we need an open_api_key

Yep. You need to sign up for one. It comes with a few dollars free credit to create the API. E.g. sign up via openai.com. It also doesn't have to be openai if there's an alternative LLM API.

skrawcz avatar Aug 13 '23 21:08 skrawcz

@skrawcz, can I work on it? Or is it a WIP?

flaviassantos avatar Oct 02 '23 07:10 flaviassantos

@skrawcz, can I work on it? Or is it a WIP?

Yes it is open and you could take it. But, this could be a little bit of work to figure out how to write code. The output should look similar in style to this text_summarization example. It will require you to understand the langchain code and then translate it into how it might look in Hamilton. For a quicker task I would look at https://github.com/DAGWorks-Inc/hamilton/issues/284 or #410 .

skrawcz avatar Oct 02 '23 14:10 skrawcz

@skrawcz, I am familiar with both OpenAI´s API and Langchain. Have tried them myself. So I would love to give it a try :)

flaviassantos avatar Oct 03 '23 15:10 flaviassantos