llama_index doc_id missing from source_nodes in GPTQdrantIndex queries

Open Mikkolehtimaki opened this issue 2 years ago • 2 comments

If I query against GPTSimpleVectorIndex, the response has source nodes that can be looked at to determine the original document id. Very useful behavior.

If I query against GPTQdrantIndex, source nodes don't have this information.

Looking at the GPTQdrantIndex creation code, doc_id does seem to enter the index as part of the payload, though.

The problem is probably here:

@dataclass
class SourceNode(DataClassJsonMixin):
    """Source node.

    User-facing class containing the source text and the corresponding document id.

    """

    source_text: str
    doc_id: Optional[str]
    extra_info: Optional[Dict[str, Any]] = None
    node_info: Optional[Dict[str, Any]] = None

    # distance score between node and query, if applicable
    similarity: Optional[float] = None

    @classmethod
    def from_node(cls, node: Node, similarity: Optional[float] = None) -> "SourceNode":
        """Create a SourceNode from a Node."""
        return cls(
            source_text=node.get_text(),
            doc_id=node.ref_doc_id,
            extra_info=node.extra_info,
            node_info=node.node_info,
            similarity=similarity,
        )

SourceNode.doc_id is populated from node.ref_doc_id, not from node.doc_id.

Why does the behavior differ between indexes?

Feb 15 '23 12:02 Mikkolehtimaki

@kacperlukawski are you able to help with this by any chance? Actually, I took a look at GPTQdrantIndexQuery and I think it's a matter of swapping

node = Node(
    doc_id=payload.get("doc_id"),
    text=payload.get("text"),
)

for

node = Node(
    ref_doc_id=payload.get("doc_id"),
    text=payload.get("text"),
)
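To make the effect of the swap concrete, here is a minimal sketch. It does not import llama_index: the Node and SourceNode classes below are simplified, hypothetical stand-ins that keep only the fields discussed in this thread, and the payload values are made up for illustration. It assumes from_node reads ref_doc_id, as in the snippet quoted in the issue.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Node:
    """Hypothetical stand-in for the library's Node (only the fields used here)."""

    text: Optional[str] = None
    doc_id: Optional[str] = None
    ref_doc_id: Optional[str] = None


@dataclass
class SourceNode:
    """Hypothetical stand-in mirroring the from_node mapping quoted in the issue."""

    source_text: str
    doc_id: Optional[str]

    @classmethod
    def from_node(cls, node: Node) -> "SourceNode":
        # doc_id is read from ref_doc_id, not from node.doc_id
        return cls(source_text=node.text or "", doc_id=node.ref_doc_id)


# Illustrative payload, shaped like the one read back from the vector store.
payload: Dict[str, Any] = {"doc_id": "doc-1", "text": "hello"}

# Before the swap: the payload's doc_id lands in Node.doc_id.
before = Node(doc_id=payload.get("doc_id"), text=payload.get("text"))
# After the swap: the payload's doc_id lands in Node.ref_doc_id.
after = Node(ref_doc_id=payload.get("doc_id"), text=payload.get("text"))

before_source = SourceNode.from_node(before)
after_source = SourceNode.from_node(after)

print(before_source.doc_id)  # None
print(after_source.doc_id)   # doc-1
```

With the first construction the source node loses the document id because ref_doc_id was never set; with the second it carries the id through.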

Feb 15 '23 21:02 jerryjliu

That seems to be it! I made a small PR :)

Feb 16 '23 06:02 Mikkolehtimaki

thanks @Mikkolehtimaki !!

Feb 17 '23 06:02 jerryjliu
