DSPy for multi-needle in a haystack

Open joshpopelka20 opened this issue 1 year ago • 2 comments

I'm trying to use Llama 3.1 70B for a "multi-needle in a haystack" search. Basically, I give the model a text and a list of terms, and ask it to find every term from the list that appears in the text. My current prompt isn't working, and I didn't see any examples for "multi-needle in a haystack". Has this or something similar been tried with DSPy?

I've done instruction fine-tuning with the 8B model, but it didn't work. The next step is the 70B model, but I'm starting to think I need some prompt engineering to figure it out. This project seems like a good place to start.

Any help would be much appreciated!

joshpopelka20 avatar Aug 21 '24 15:08 joshpopelka20

FYI, this is my current prompt:

You will be searching for cancer terms in a given text and providing context for any matches found. Follow these instructions carefully:

1. Here is the list of cancer terms to search for:
<cancer_terms>
{context_str}
</cancer_terms>
2. Here is the text to search through:
<text_to_search>
{query_str}
</text_to_search>
3. Search through the text, looking for exact matches of the cancer terms provided. Follow these rules:
- The entire term must be present (do not break up the term)
- Ignore whitespace (multiple whitespace characters)
- Ignore special characters
- The search should be case-insensitive
4. For each match found:
- Add the cancer term to a list of found terms
- Extract the full sentence containing the cancer term

5. Format your output as a JSON object with two keys: 'cancer_terms' and 'cancer_sentences'. The value for 'cancer_terms' should be a list of all found cancer terms. The value for 'cancer_sentences' should be a list of all sentences containing cancer terms.

Here's an example of how your output should be formatted:

<answer>
{
  "cancer_terms": ["Cancer Term 1", "Cancer Term 2"],
  "cancer_sentences": [
    "This is a sentence containing Cancer Term 1 in it.",
    "Here's another sentence with Cancer Term 2 present."
  ]
}
</answer>

If no cancer terms are found in the text, output an empty list for both 'cancer_terms' and 'cancer_sentences'.

Provide your answer immediately without any additional commentary.
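
The matching rules in step 3 are deterministic, so one possible way to check what the model returns is a plain-Python baseline that produces the same JSON shape. The following is only a minimal sketch, not something from this thread. The function names `find_terms` and `_normalize` are hypothetical, "ignore special characters" is assumed to mean "treat them as spaces", and sentences are assumed to end at `.`, `!`, or `?` followed by whitespace. A real sentence splitter would likely behave differently on abbreviations.

```python
import json
import re


def _normalize(s: str) -> str:
    """Lowercase, replace special characters with spaces, collapse whitespace.

    Assumption: "ignore special characters" means treating them as spaces.
    """
    s = re.sub(r"[^a-z0-9\s]", " ", s.lower())
    return " ".join(s.split())


def find_terms(terms, text):
    """Return found terms and their sentences in the prompt's JSON shape."""
    # Assumption: naive sentence split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Pad with spaces so a term only matches on whole-word boundaries.
    normalized = [f" {_normalize(s)} " for s in sentences]

    found_terms, found_sentences = [], []
    for term in terms:
        needle = _normalize(term)
        if not needle:
            continue
        for sentence, norm in zip(sentences, normalized):
            if f" {needle} " in norm:
                if term not in found_terms:
                    found_terms.append(term)
                if sentence not in found_sentences:
                    found_sentences.append(sentence)
    return {"cancer_terms": found_terms, "cancer_sentences": found_sentences}


if __name__ == "__main__":
    result = find_terms(
        ["Non-Small Cell Lung Cancer", "melanoma", "lymphoma"],
        "Patient has non-small  cell\nlung cancer. No sign of MELANOMA!",
    )
    print(json.dumps(result, indent=2))
```

Comparing the model's `cancer_terms` list against this baseline on a handful of documents gives a rough recall and precision figure, which could also serve as a metric if the prompt is later optimized.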

joshpopelka20 avatar Aug 21 '24 16:08 joshpopelka20

Thanks @joshpopelka20 !

@isaacbmiller nice problem to look into

okhat avatar Aug 22 '24 14:08 okhat

Is there any documentation for this use case? I don't see why this was closed.

joshpopelka20 avatar Oct 21 '24 14:10 joshpopelka20