feat: add multi-document answers
Adds the number of documents as an optional argument. This is an important feature for databases that embed highly fragmented, short documents (like my QnA).
The number 1 is used as a default so it's fully backwards compatible.
For this to work, context is now a list instead of a string.
Open questions:
- Is `|` the right delimiter?
- I'm honestly not sure how exactly Chroma does this. It's obviously using chunks. But could all chunks be from the same document? Then we might rename it from `number_documents` to `number_chunks`. This is just a naming/correctness issue, because we can't change anything about it anyway.
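To make the list-based context and the delimiter question concrete, here is a minimal sketch of how several retrieved contexts could be joined into one prompt. The template text and the `DELIMITER` constant are assumptions modelled on the dry-run output in this thread, and `generate_prompt` is written as a free function for illustration, not as the actual method in the code base.

```python
from typing import List

# Hypothetical template, modelled on the dry-run output shown in this PR.
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the query at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "{context}\n"
    "Query: {query}\n"
    "Helpful Answer:"
)

# Assumed delimiter between contexts; whether " | " is the right choice is an open question.
DELIMITER = " | "


def generate_prompt(input_query: str, contexts: List[str]) -> str:
    """Join the retrieved contexts with the delimiter and fill in the template."""
    context_string = DELIMITER.join(contexts)
    return PROMPT_TEMPLATE.format(context=context_string, query=input_query)


prompt = generate_prompt("Who is Naval?", ["first chunk", "second chunk"])
print(prompt)
```

With a single context the delimiter never appears, so the default of one document produces the same prompt shape as before.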
Followup
- We could add a max distance setting. That way you don't query extra documents that are super far away, only costing you more tokens for a worse result
- I think it would be cool to somehow process the distance as a weight through language in the prompt. Right now all documents are treated equally (unless the LLM prioritizes what comes first). Adding weights does have negative implications though: it adds complexity and you could say that the LLM is better at determining what's important than the embedding AI.
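The max distance idea could look roughly like the sketch below. It assumes the query step can hand back `(document, distance)` pairs where a smaller distance means a closer match; the function name `filter_by_distance`, the `max_distance` parameter, and the sample values are all hypothetical and not part of this PR.

```python
from typing import List, Tuple


def filter_by_distance(
    results: List[Tuple[str, float]], max_distance: float
) -> List[str]:
    """Keep only documents whose distance is within max_distance, preserving order."""
    return [document for document, distance in results if distance <= max_distance]


# Made-up (document, distance) pairs, closest first.
results = [("closest chunk", 0.21), ("related chunk", 0.35), ("far chunk", 0.92)]
contexts = filter_by_distance(results, max_distance=0.5)
print(contexts)
```

If no document passes the threshold the list is empty, so a caller would still need to decide whether to fall back to the single closest document.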
Great PR! I left some comments
Hey, thanks for taking the time and for the one fix. I will remove the type check for list, since it's not required anymore.
I'm not sure about the typing you added. I'm a big fan, I like strong typing a lot, but it just feels weird to have typing for one function and nowhere else in the code. There should be a full refactor that adds typing. What do you think?
@candidosales I get a linting error for the typing (I'm using Python 3.8.10). And it's also throwing an error when I run it.
def generate_prompt(self, input_query: str, contexts: list[str]):
TypeError: 'type' object is not subscriptable
I did some research. The problem is that we require Python >=3.8. According to my linter and my research, your suggestion only works on Python >=3.9, so it's no longer compatible with Python 3.8.
I suggest we bump the required version to >=3.9. Then we can use the current (simple) typing that you suggested. But that's something that should not be decided in this PR. I will remove the typing changes. @taranjeet please note.
The use of `list` or `List` from `typing` in Python depends on the version you're using. Here's a general guideline:
- Python 3.9 and later: You can use the built-in list type directly for type hinting, such as `list[int]`, thanks to PEP 585. This PEP made the built-in collection types (like list and dict) usable as generic types.

      def my_function(my_list: list[int]) -> int:
          return sum(my_list)

- Python 3.5 to 3.8: In these versions, you should use `List` from the `typing` module for type hinting.

      from typing import List

      def my_function(my_list: List[int]) -> int:
          return sum(my_list)

- Python 3.7 or 3.8 (with future import): PEP 563 (Postponed Evaluation of Annotations) stops annotations from being evaluated when the function is defined. With the future import you can therefore write the built-in `list` and `dict` as generic types in annotations, similar to Python 3.9 and later. This only covers annotations: evaluating `list[int]` anywhere else at runtime (for example in a type alias) still raises a TypeError on these versions.

      from __future__ import annotations

      def my_function(my_list: list[int]) -> int:
          return sum(my_list)

- Python 3.4 and earlier: The `typing` module was only added to the standard library in Python 3.5, so these versions don't support this kind of generic type hint out of the box.

Remember, type hints are entirely optional in Python and are not enforced at runtime; they're a tool to help with development. Without the future import, however, annotations are still evaluated when the function is defined, which is why `list[str]` raises a TypeError on Python 3.8.
@cachho I'm using Python 3.10.11. In my opinion, it would be beneficial to incorporate infer types in all methods. This will enhance the overall developer experience and facilitate maintenance in the long run. Perhaps we could propose implementing these changes in a new PR.
What do you think?
> In my opinion, it would be beneficial to incorporate infer types in all methods
I agree, once again the only problem with it in this PR is that we lose compatibility with 3.8. We need to decide if that's worth it, and that's out of scope for this PR.
> Perhaps we could propose implementing these changes in a new PR.
I agree, but I would talk to @taranjeet on Discord first, so you don't go all in and then we decide to keep 3.8 compatibility after all. That's the version I use btw, and I don't think I'm the only one.
resolved the merge conflicts.
adjusted to new config
example:
import os
from embedchain import App
from embedchain.config import AddConfig, QueryConfig
naval_chat_bot = App()
add_config = AddConfig() # Currently no options
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44", add_config)
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf", add_config)
naval_chat_bot.add("web_page", "https://nav.al/feedback", add_config)
naval_chat_bot.add("web_page", "https://nav.al/agi", add_config)
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."), add_config)
query_config = QueryConfig(number_documents=1)
print(naval_chat_bot.dry_run("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))
query_config = QueryConfig(number_documents=5)
print(naval_chat_bot.dry_run("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))
returns
Use the following context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Context: alien species that also had the power to generate these good explanations, there is no explanation that they could generate that we could not understand. We are maximally capable of understanding. There is no concept out there that is possible in this physical reality that a human being, given sufficient time and resources and education, could not understand. Subscribe to Naval Related Modal body text goes here. Close Query: What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts? Helpful Answer:
versus
Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Context: alien species that also had the power to generate these good explanations, there is no explanation that they could generate that we could not understand. We are maximally capable of understanding. There is no concept out there that is possible in this physical reality that a human being, given sufficient time and resources and education, could not understand. Subscribe to Naval Related Modal body text goes here. Close | explanation. It’s parroting. It’s brilliant Bayesian reasoning. It’s extrapolating from what it already sees out there generated by humans on the web, but it doesn’t have an underlying model of reality that can explain the seen in terms of the unseen. And I think that’s critical. That is what humans do uniquely that no other creature, no other computer, no other intelligence—biological or artificial—that we have ever encountered does. And not only do we do it uniquely, but if we were to meet an | people find their way to Naval’ s wisdom. | 96 · THE ALMANACK OF NAVAL RAVIKANTThe really smart thinkers are clear thinkers. They understand the basics at a very, very fundamental level. I would rather understand the basics really well than memorize all kinds of complicated concepts I can’t stitch together and can’t rederive from the basics. If you can’t rederive concepts from the basics as you need them, you’re lost. You’re just memorizing. [4] The advanced concepts in a field are less proven. We use them to signal insider knowledge, but we’d be better off nailing the basics. [11] Clear thinkers appeal to their own authority. Part of making effective decisions boils down to dealing with reality. How do you make sure you’re dealing with reality when you’re making decisions? By not having a strong sense of self or judgments or mind presence. 
The “monkey mind” will always respond with this regurgitated emotional response to what it thinks the world should be. Those desires will cloud your reality. This happens a lot of times when | knowledge, capability, and desire nobody else in the world does, purely from the combinatorics of human DNA and development. The combinatorics of human DNA and experience are staggering. You will never meet any two humans who are substitutable for each other. Query: What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts? Helpful Answer:
@taranjeet I changed the README text to "number of documents to be retrieved as context", as asked for in #163.
Due to the custom prompt, the idea of changing the prompt wording between singular and plural was dropped. I think that's okay.