dspy
DSPy for multi-needle in a haystack
I'm trying to use Llama 3.1 70B for a "multi-needle in a haystack" search: I give the model a text and a list of terms and ask it to find every term from the list that appears in the text. My current prompt isn't working, and I couldn't find any "multi-needle in a haystack" examples. Has this, or something similar, been tried with DSPy?
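One way to frame this for DSPy: its optimizers tune a prompt against a metric, and a multi-needle search scores naturally as a set-based F1 over the found terms, which penalizes both missed needles (recall) and invented ones (precision). The sketch below follows DSPy's usual metric shape `(example, pred, trace=None)`; `term_f1` is an illustrative name, not part of DSPy, and the `cancer_terms` field on both objects is an assumption borrowed from the JSON keys in the prompt.

```python
from types import SimpleNamespace


def _norm(term: str) -> str:
    # Compare terms case-insensitively, with runs of whitespace collapsed.
    return " ".join(term.lower().split())


def term_f1(example, pred, trace=None):
    """Set-based F1 between gold and predicted terms (hypothetical metric).

    Assumes `example.cancer_terms` and `pred.cancer_terms` are lists of strings.
    """
    gold = {_norm(t) for t in example.cancer_terms}
    found = {_norm(t) for t in pred.cancer_terms}
    if not gold and not found:
        score = 1.0  # correctly reported that no terms are present
    else:
        # F1 = 2|gold & found| / (|gold| + |found|), the harmonic mean of precision and recall
        score = 2 * len(gold & found) / (len(gold) + len(found))
    # For evaluation (trace is None) return the float score; when a trace is passed,
    # e.g. while bootstrapping demonstrations, return a strict pass/fail instead.
    return score if trace is None else score == 1.0


# Plain objects stand in for dspy.Example and a module's prediction here.
example = SimpleNamespace(cancer_terms=["melanoma", "Breast Cancer", "lymphoma"])
pred = SimpleNamespace(cancer_terms=["breast  cancer", "melanoma", "sarcoma"])
print(round(term_f1(example, pred), 3))  # 0.667: one term missed, one invented
```

With real DSPy objects, which expose fields as attributes, the same function could be passed as `metric=` to an evaluator or to an optimizer such as `BootstrapFewShot`.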
I tried instruction fine-tuning the 8B model, but it didn't work. The next step is the 70B model, but I'm starting to think I need prompt engineering to solve this, and this project seems like a good place to start.
Any help would be much appreciated!
FYI, this is my current prompt:
You will be searching for cancer terms in a given text and providing context for any matches found. Follow these instructions carefully:
1. Here is the list of cancer terms to search for:
<cancer_terms>
{context_str}
</cancer_terms>
2. Here is the text to search through:
<text_to_search>
{query_str}
</text_to_search>
3. Search through the text, looking for exact matches of the cancer terms provided. Follow these rules:
- The entire term must be present (do not break up the term)
- Ignore whitespace (multiple whitespace characters)
- Ignore special characters
- The search should be case-insensitive
4. For each match found:
- Add the cancer term to a list of found terms
- Extract the full sentence containing the cancer term
5. Format your output as a JSON object with two keys: 'cancer_terms' and 'cancer_sentences'. The value for 'cancer_terms' should be a list of all found cancer terms. The value for 'cancer_sentences' should be a list of all sentences containing cancer terms.
Here's an example of how your output should be formatted:
<answer>
{
"cancer_terms": ["Cancer Term 1", "Cancer Term 2"],
"cancer_sentences": [
"This is a sentence containing Cancer Term 1 in it.",
"Here's another sentence with Cancer Term 2 present."
]
}
</answer>
If no cancer terms are found in the text, output an empty list for both 'cancer_terms' and 'cancer_sentences'.
Provide your answer immediately without any additional commentary.
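Because the rules in step 3 make this an exact string-matching problem, a small deterministic matcher could label evaluation examples and serve as a baseline for the LLM. This is a minimal sketch, assuming "ignore special characters" means treating them as spaces (so "non-Hodgkin" and "non Hodgkin" match) and using a naive regex sentence splitter; `search_terms` and its helpers are hypothetical names, and the output reuses the keys from the prompt's JSON format.

```python
import json
import re


def normalize(s: str) -> str:
    # Lowercase, turn special characters into spaces, collapse runs of whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", s.lower()).split())


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def search_terms(text: str, terms: list[str]) -> dict:
    """Whole-term, case-insensitive matching, returned in the prompt's JSON shape."""
    sentences = split_sentences(text)
    # Pad with spaces so a term only matches on word boundaries ("lymph" != "lymphoma").
    padded = [f" {normalize(s)} " for s in sentences]
    found_terms, hit_idx = [], set()
    for term in terms:
        needle = normalize(term)
        if not needle:
            continue
        hits = {i for i, p in enumerate(padded) if f" {needle} " in p}
        if hits:
            found_terms.append(term)
            hit_idx |= hits
    return {
        "cancer_terms": found_terms,
        # Keep matching sentences in text order, without duplicates.
        "cancer_sentences": [sentences[i] for i in sorted(hit_idx)],
    }


text = ("The biopsy showed Non-Hodgkin   lymphoma. "
        "A prior melanoma was excised in 2019. Follow-up in 6 months.")
terms = ["non-Hodgkin lymphoma", "melanoma", "breast cancer", "lymph"]
# Finds "non-Hodgkin lymphoma" and "melanoma", but not "lymph" or "breast cancer".
print(json.dumps(search_terms(text, terms), indent=2))
```

Matching inside each sentence keeps the sentence extraction from step 4 trivial. The splitter is the weak point: abbreviations such as "Dr." or "e.g." will cut sentences early, so clinical text would likely need a proper sentence tokenizer.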
Thanks, @joshpopelka20!
@isaacbmiller, nice problem to look into.
Is there any documentation of this use case? I don't see why this issue was closed.