llama_index
[Question]: Using the new indexes
Question Validation
- [X] I have searched both the documentation and discord for an answer.
Question
Are there simple examples of the keyword index and the summary index?
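Hey @827648313! Good to see you again. I hope all is well.
Yes, LlamaIndex supports keyword indexes. The provided code defines a BaseKeywordTableIndex class that extracts keywords from text and maps each keyword to the corresponding nodes. The process mimics a "hash table": during index construction, the keyword table is built by extracting keywords from each node and creating an internal mapping. At query time, keywords are extracted from the query text and used to look up the keyword table and retrieve the nodes that answer the query. This is evident in the _extract_keywords, _add_nodes_to_index, and _build_index_from_nodes methods of BaseKeywordTableIndex, and in the overridden _extract_keywords method of the KeywordTableIndex class, which specifically uses a GPT model to extract keywords.
Source code:
As for a summary index, LlamaIndex may not support it yet. If you need this feature, I suggest opening a new issue in the LlamaIndex GitHub repository to request it.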
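The keyword-table mechanism described in this reply can be illustrated with a small plain-Python sketch. This is not LlamaIndex code: `extract_keywords`, `build_keyword_table`, `query_keyword_table`, and the stopword list are hypothetical stand-ins (the reply says KeywordTableIndex uses a GPT model for extraction, whereas this sketch just splits words), and ranking by shared keyword count is an assumption made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, just enough for the demo below.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
             "for", "from", "what", "how", "does", "do", "by", "on"}


def extract_keywords(text):
    """Stand-in keyword extractor: lowercase words minus stopwords."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def build_keyword_table(nodes):
    """Map each keyword to the set of node ids whose text contains it."""
    table = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table


def query_keyword_table(table, query, top_k=2):
    """Return node ids ranked by how many query keywords they share."""
    counts = defaultdict(int)
    for keyword in extract_keywords(query):
        for node_id in table.get(keyword, ()):
            counts[node_id] += 1
    # Highest count first; ties broken by node id for determinism.
    ranked = sorted(counts, key=lambda n: (-counts[n], n))
    return ranked[:top_k]


nodes = {
    "n1": "LlamaIndex builds a keyword table from each node.",
    "n2": "Summaries of documents can be embedded for retrieval.",
    "n3": "A keyword query looks up matching nodes in the table.",
}
table = build_keyword_table(nodes)
results = query_keyword_table(table, "How does the keyword table work?")
print(results)
```

Here the query shares the keywords "keyword" and "table" with nodes n1 and n3, so those two are returned and n2 is not. The real classes named in the reply handle node storage and LLM calls that this sketch leaves out.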
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
For "keywords", how about trying this script I wrote?
https://github.com/tslmy/agent/blob/main/sub_question_generating_prompt_in_keywords.py
To me, your script looks like it only builds a prompt. Where are the index construction and query steps?
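I see that the official LlamaIndex documentation does list this index (the summary index). Is it only a brief description with no actual functionality? Besides that, I see there is also a document summary index. Does that one have actual functionality? https://docs.llamaindex.ai/en/stable/api_reference/indices/summary/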
Yes, LlamaIndex does support summary indexing. The DocumentSummaryIndex object is a key component of this functionality. It is designed to store and manage summaries of documents, extending the BaseIndex class and specifically tailored to handle IndexDocumentSummary objects.
The DocumentSummaryIndex object has several functionalities:
- Summary Generation and Storage: It generates summaries for documents using a response synthesizer and stores these summaries. The summaries can be generated based on a predefined query (summary_query), which by default is set to generate a description of what the provided text is about and some questions the text can answer.
- Embedding Summaries: It can embed the summaries if embed_summaries is set to True. This is essential for running the default embedding-based retriever, indicating that the summaries are not just stored as text but can also be converted into a vectorized form for more efficient retrieval based on semantic similarity.
- Retrieval: The as_retriever method allows the DocumentSummaryIndex to function as a retriever. This method can operate in different modes, as indicated by the retriever_mode parameter, which supports at least an "embedding" mode. This implies that the index can retrieve document summaries based on their embedded vector representations, facilitating semantic search capabilities.
- Integration with Service and Storage Contexts: The constructor accepts service_context and storage_context parameters, indicating that the DocumentSummaryIndex is designed to work within a broader service architecture, possibly interacting with other components of the LlamaIndex system for services like response synthesis and data storage.
- Progress Tracking: It supports progress tracking through the show_progress parameter, which, when set to True, displays progress bars, likely during the summary generation or embedding process, enhancing the user experience during long-running operations.
- Customization and Extensibility: The class is designed with extensibility in mind, accepting additional keyword arguments (**kwargs) and allowing for optional customization of nodes, objects, and the index structure itself through its constructor parameters.
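To make the summarize-then-embed flow from the list above concrete, here is a toy plain-Python sketch. It does not call LlamaIndex: `ToyDocumentSummaryIndex`, `summarize`, `embed`, and `cosine` are hypothetical names, the "summary" is simply the first sentence (a real index would use a response synthesizer backed by an LLM), and the "embedding" is a bag-of-words count rather than a learned vector. Only the `embed_summaries` flag name is borrowed from the reply.

```python
import math
import re
from collections import Counter


def summarize(text):
    """Stand-in summarizer: keep only the first sentence."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


def embed(text):
    """Stand-in embedding: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyDocumentSummaryIndex:
    """Stores one summary per document and, optionally, its embedding."""

    def __init__(self, documents, embed_summaries=True):
        self.summaries = {d: summarize(t) for d, t in documents.items()}
        self.embeddings = (
            {d: embed(s) for d, s in self.summaries.items()}
            if embed_summaries else {}
        )

    def retrieve(self, query, top_k=1):
        """Rank documents by similarity between the query and each summary."""
        if not self.embeddings:
            raise ValueError("embedding retrieval needs embed_summaries=True")
        q = embed(query)
        ranked = sorted(self.embeddings,
                        key=lambda d: (-cosine(q, self.embeddings[d]), d))
        return ranked[:top_k]


documents = {
    "paris": "Paris is the capital of France. It hosts the Louvre.",
    "tokyo": "Tokyo is the capital of Japan. It is known for its transit system.",
}
index = ToyDocumentSummaryIndex(documents)
top = index.retrieve("What is the capital of Japan?")
print(index.summaries["tokyo"], top)
```

The sketch also shows why embedding the summaries matters for the default retriever: with `embed_summaries=False` there is nothing to compare the query against, so the toy `retrieve` raises an error.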
You can find more details about this in the DocumentSummaryIndex source code.
In addition to the DocumentSummaryIndex, there are also the classes DocumentSummaryIndexLLMRetriever and DocumentSummaryIndexEmbeddingRetriever, designed to retrieve document summaries from an index. The DocumentSummaryIndexLLMRetriever class selects relevant document summaries from an index using Language Model (LLM) calls, while the DocumentSummaryIndexEmbeddingRetriever class retrieves document summaries based on embedding similarity. You can find more details about these in the retrievers source code.
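The LLM-based selection idea can be sketched in plain Python as well. The thread does not show the prompt or the answer parsing that DocumentSummaryIndexLLMRetriever uses, so everything here is an assumption for illustration: `build_choice_prompt`, `parse_choices`, `llm_select`, the prompt wording, and `fake_llm` (a canned reply standing in for a real model call) are all hypothetical.

```python
import re


def build_choice_prompt(summaries, query):
    """Number the summaries and ask which ones are relevant to the query."""
    lines = [f"({i}) {s}" for i, s in enumerate(summaries, start=1)]
    return ("Summaries:\n" + "\n".join(lines)
            + f"\nQuestion: {query}\n"
            + "Answer with the numbers of the relevant summaries.")


def parse_choices(reply, n):
    """Pull valid, de-duplicated summary numbers (1..n) out of the reply."""
    picks = []
    for match in re.findall(r"\d+", reply):
        i = int(match)
        if 1 <= i <= n and i not in picks:
            picks.append(i)
    return picks


def llm_select(doc_ids, summaries, query, llm):
    """Ask the model to choose summaries, then map numbers back to doc ids."""
    reply = llm(build_choice_prompt(summaries, query))
    return [doc_ids[i - 1] for i in parse_choices(reply, len(doc_ids))]


def fake_llm(prompt):
    """Canned reply standing in for a real LLM call."""
    return "Relevant: 2"


doc_ids = ["paris", "tokyo"]
summaries = ["Paris is the capital of France.",
             "Tokyo is the capital of Japan."]
prompt = build_choice_prompt(summaries, "What is the capital of Japan?")
selected = llm_select(doc_ids, summaries,
                      "What is the capital of Japan?", fake_llm)
print(selected)
```

Compared with embedding similarity, this approach spends a model call per query but lets the model read the summaries directly; which trade-off suits a given workload is something to test rather than assume.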
I hope this information helps! If you have any more questions or need further clarification, feel free to ask.
Sources
- llama-index-core/llama_index/core/indices/document_summary/retrievers.py
- llama-index-legacy/llama_index/legacy/indices/document_summary/retrievers.py
- llama-index-legacy/llama_index/legacy/indices/document_summary/base.py
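@dosu-bot Please reply to @827648313 in Chinese.
@827648313 Now I understand what you mean. Sorry, I am actually not sure what these terms are called in Chinese.
Have you tried searching GitHub for "llama_index SummaryIndex"?
When I searched that way, the first page of results included an entry like this:
After opening it, I searched the page for "SummaryIndex" and immediately found how it is used: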
https://github.com/pchunduri6/rag-demystified/blob/e7b38d89ed5671675a9299a697e413483b75cfd6/llama_index_baseline.py#L149-L151
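GitHub's search has improved enormously in recent years. I highly recommend giving it a try.
"llama_index SummaryIndex": got it, thank you very much!
@827648313 If you have no follow-up questions, you can close this issue for now. Most of the people here cannot read Chinese anyway, so if it stays open many of them will not know what to do with it. (I am only a volunteer myself.)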