[Question]: Using the new indexes

Open 827648313 opened this issue 11 months ago • 9 comments

Question Validation

  • [X] I have searched both the documentation and discord for an answer.

Question

Is there a simple example of a keyword index and a summary index?

827648313 avatar Mar 22 '24 10:03 827648313

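Hey @827648313! Good to see you again; I hope all is well.

Yes, LlamaIndex supports keyword indexes. The code provided defines a BaseKeywordTableIndex class that extracts keywords from text and maps each keyword to the corresponding nodes. This mimics a "hash table": while the index is being built, keywords are extracted from each node and an internal keyword-to-node mapping (the keyword table) is created. At query time, keywords are extracted from the query text and looked up in the keyword table to retrieve the nodes that answer the query. This behavior is visible in the _extract_keywords, _add_nodes_to_index, and _build_index_from_nodes methods of BaseKeywordTableIndex, and in the _extract_keywords override in KeywordTableIndex, which specifically uses a GPT model to extract keywords.

Source code:

As for a summary index, LlamaIndex may not support it yet. If you need this feature, I suggest opening a new issue in the LlamaIndex GitHub repository to request it.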
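
The keyword-table mechanism described in this reply can be sketched in a few lines of plain Python. This is only an illustration of the idea, not LlamaIndex's code: ToyKeywordTable, extract_keywords, and STOP_WORDS are hypothetical names, and a simple regular expression stands in for the LLM-based keyword extraction.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "are", "from"}


def extract_keywords(text: str) -> set[str]:
    """Stand-in keyword extractor: lowercase words minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


class ToyKeywordTable:
    """Maps each keyword to the ids of the nodes (text chunks) containing it."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}
        self.table: dict[str, set[str]] = defaultdict(set)

    def add_node(self, node_id: str, text: str) -> None:
        # Index build step: extract keywords and record keyword -> node id.
        self.nodes[node_id] = text
        for keyword in extract_keywords(text):
            self.table[keyword].add(node_id)

    def retrieve(self, query: str) -> list[str]:
        # Query step: look up each query keyword and count matches per node.
        hits: dict[str, int] = defaultdict(int)
        for keyword in extract_keywords(query):
            for node_id in self.table.get(keyword, ()):
                hits[node_id] += 1
        # Most shared keywords first; ties broken by node id.
        return sorted(hits, key=lambda node_id: (-hits[node_id], node_id))


index = ToyKeywordTable()
index.add_node("n1", "Llamas are domesticated camelids from South America.")
index.add_node("n2", "A keyword table maps keywords to nodes.")
index.add_node("n3", "Llamas graze on grass in the Andes.")

print(index.retrieve("Where do llamas graze?"))
```

Here the query shares two keywords with n3 and one with n1, so n3 is ranked first. A real index would then pass the retrieved node text to an LLM to synthesize the answer.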

Sources

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] avatar Mar 22 '24 10:03 dosubot[bot]

For “keywords”, maybe try this one I wrote?

https://github.com/tslmy/agent/blob/main/sub_question_generating_prompt_in_keywords.py

tslmy avatar Mar 23 '24 22:03 tslmy

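> For “keywords”, maybe try this one I wrote?
>
> https://github.com/tslmy/agent/blob/main/sub_question_generating_prompt_in_keywords.p

This looks to me like it builds a prompt, though. It doesn't build an index. And where is the query flow?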

827648313 avatar Mar 25 '24 01:03 827648313

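I see that the official LlamaIndex documentation does list this index (the summary index). Is it just a brief description with no actual functionality? Besides that, I also see there is a document summary index. Does that one have actual functionality? https://docs.llamaindex.ai/en/stable/api_reference/indices/summary/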

827648313 avatar Mar 25 '24 02:03 827648313

Yes, LlamaIndex does support summary indexing. The DocumentSummaryIndex object is a key component in this functionality. It is designed to store and manage summaries of documents, extending the BaseIndex class and specifically tailored to handle IndexDocumentSummary objects.

The DocumentSummaryIndex object has several functionalities:

  1. Summary Generation and Storage: It generates summaries for documents using a response synthesizer and stores these summaries. The summaries can be generated based on a predefined query (summary_query), which by default is set to generate a description of what the provided text is about and some questions the text can answer.

  2. Embedding Summaries: It can embed the summaries if embed_summaries is set to True. This is essential for running the default embedding-based retriever, indicating that the summaries are not just stored as text but can also be converted into a vectorized form for more efficient retrieval based on semantic similarity.

  3. Retrieval: The as_retriever method lets the DocumentSummaryIndex act as a retriever. The retriever_mode parameter selects how summaries are matched against the query. In "embedding" mode, summaries are ranked by the similarity of their vector representations to the query, which enables semantic search.

  4. Integration with Service and Storage Contexts: The constructor accepts service_context and storage_context parameters, indicating that the DocumentSummaryIndex is designed to work within a broader service architecture, possibly interacting with other components of the LlamaIndex system for services like response synthesis and data storage.

  5. Progress Tracking: When the show_progress parameter is set to True, progress bars are displayed during long-running steps such as summary generation and embedding.

  6. Customization and Extensibility: The class is designed with extensibility in mind, accepting additional keyword arguments (**kwargs) and allowing for optional customization of nodes, objects, and the index structure itself through its constructor parameters.

You can find more details about this in the DocumentSummaryIndex source code.

In addition to the DocumentSummaryIndex, two retriever classes fetch document summaries from the index. DocumentSummaryIndexLLMRetriever selects relevant summaries by asking a large language model (LLM), while DocumentSummaryIndexEmbeddingRetriever ranks summaries by embedding similarity. You can find more details about these in the retrievers source code.

I hope this information helps! If you have any more questions or need further clarification, feel free to ask.
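
The summarize, embed, and retrieve steps listed above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the library's implementation: ToyDocumentSummaryIndex, summarize, embed, and cosine are hypothetical names, the "summary" is just the first sentence (standing in for the response synthesizer), and a bag-of-words vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def summarize(text: str) -> str:
    """Stand-in for the LLM summary step: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyDocumentSummaryIndex:
    """Stores one summary per document and retrieves documents via summaries."""

    def __init__(self, documents: dict[str, str]) -> None:
        self.documents = documents
        # Build step 1: generate and store a summary for every document.
        self.summaries = {d: summarize(t) for d, t in documents.items()}
        # Build step 2: embed the summaries (the embed_summaries=True case).
        self.summary_vectors = {d: embed(s) for d, s in self.summaries.items()}

    def retrieve(self, query: str, top_k: int = 1) -> list[str]:
        # Rank documents by query-to-summary similarity, return full texts.
        query_vector = embed(query)
        ranked = sorted(
            self.summary_vectors,
            key=lambda d: cosine(query_vector, self.summary_vectors[d]),
            reverse=True,
        )
        return [self.documents[d] for d in ranked[:top_k]]


docs = {
    "llamas": "Llamas are pack animals of the Andes. They can carry heavy loads.",
    "indexes": "An index organizes documents for retrieval. Index types differ in cost.",
}
index = ToyDocumentSummaryIndex(docs)
print(index.summaries["llamas"])
print(index.retrieve("Which animals live in the Andes?"))
```

The design point is that the query is matched against the short summaries, while the result returned is the full document text behind the best-matching summary.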

dosubot[bot] avatar Mar 25 '24 02:03 dosubot[bot]

@dosu-bot Please reply to @827648313 in Chinese.

tslmy avatar Mar 25 '24 06:03 tslmy

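@827648313 Now I understand what you mean. Sorry, I don't really know what these terms are called in Chinese.

Have you tried searching GitHub for “llama_index SummaryIndex”?

[Screenshot: GitHub search for “llama_index SummaryIndex”]

When I searched that way, the first page of results included an entry like this:

[Screenshot: one of the search results]

After opening it, I searched the page for “SummaryIndex” and immediately found how it is used:

https://github.com/pchunduri6/rag-demystified/blob/e7b38d89ed5671675a9299a697e413483b75cfd6/llama_index_baseline.py#L149-L151

GitHub's search has improved enormously in recent years; I highly recommend giving it a try.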

tslmy avatar Mar 25 '24 06:03 tslmy

llama_index SummaryIndex. OK, thanks a lot!

827648313 avatar Mar 27 '24 02:03 827648313

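@827648313 If you have no follow-up questions, you can go ahead and close this issue. Most of the people here can't read Chinese anyway, so if it stays open many of them won't know what to do with it. (I'm only a volunteer myself.)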

tslmy avatar Mar 27 '24 04:03 tslmy