
Aligning LlamaIndex Metadata Structure with Underlying Database Capabilities to Support Arrays of Objects

Open • TYRONEMICHAEL opened this issue 11 months ago • 1 comment

Description:

Issue Summary:

We use LlamaIndex as an interface to several vector database implementations, including ChromaDb. ChromaDb supports a flexible metadata structure that allows arrays of objects, enabling rich and complex metadata associations, but we've found a limitation in how LlamaIndex handles metadata. The current Record<string, any> type definition for metadata in LlamaIndex restricts us to a flat key-value pair structure, which does not take full advantage of the underlying databases' capabilities, in particular ChromaDb's ability to handle arrays of objects within metadata.
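For illustration, the following TypeScript sketch spells out what a "flat key-value pair structure" means here: every metadata value is a scalar (string, number, boolean, or null). The names `FlatValue`, `isFlatValue`, and `isFlatMetadata` are hypothetical helpers, not part of the LlamaIndexTS API. Note that `Record<string, any>` on its own also type-checks nested values (the `nested` constant below compiles), so a flat requirement would presumably be enforced at runtime, for example by a check like this one.

```typescript
// Hypothetical illustration; these helpers are not part of LlamaIndexTS.
// The metadata type as described in this issue.
type Metadata = Record<string, any>;

// Scalar values permitted in a flat key-value structure.
type FlatValue = string | number | boolean | null;

// True when the value is a scalar rather than an object or array.
function isFlatValue(value: unknown): value is FlatValue {
  return (
    value === null ||
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  );
}

// True when no value in the metadata object is a nested object or array.
function isFlatMetadata(metadata: Metadata): metadata is Record<string, FlatValue> {
  return Object.values(metadata).every(isFlatValue);
}

const flat: Metadata = { chapter: "3", verse: "16" };
// Compiles against Record<string, any>, even though it is not flat.
const nested: Metadata = { references: [{ chapter: "3", verse: "16" }] };

console.log(isFlatMetadata(flat));   // true
console.log(isFlatMetadata(nested)); // false
```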

ChromaDb's Metadata Capabilities:

ChromaDb allows for a diverse range of metadata structures, as demonstrated by the following usage pattern:

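import { ChromaClient } from "chromadb";

// Connect to a Chroma server (default local URL assumed) and open a collection.
// "my_collection" is a hypothetical collection name.
const client = new ChromaClient();
const collection = await client.getOrCreateCollection({ name: "my_collection" });

// Each entry in `metadatas` is the metadata for the record at the same index in `ids`.
await collection.upsert({
  ids: ["id1", "id2", "id3"],
  embeddings: [[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2]],
  metadatas: [
    { "chapter": "3", "verse": "16" },
    { "chapter": "3", "verse": "5" },
    { "chapter": "29", "verse": "11" }
  ],
  documents: ["doc1", "doc2", "doc3"]
});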

This flexible metadata structure lets users associate several related attributes (here, chapter and verse) with a single document, making the metadata more expressive and useful.
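In the `upsert` call above, `metadatas` is positionally aligned with `ids`: the i-th metadata object belongs to the i-th record. The following self-contained TypeScript sketch does not call Chroma; `UpsertPayload` and `metadataById` are hypothetical names. It mirrors the payload's shape and pairs each id with its metadata to show which attributes end up attached to each document.

```typescript
// Hypothetical sketch; mirrors the shape of the upsert payload above
// without using the chromadb client.
type FlatMetadata = Record<string, string | number | boolean>;

interface UpsertPayload {
  ids: string[];
  embeddings: number[][];
  metadatas: FlatMetadata[];
  documents: string[];
}

// Pairs each id with the metadata object at the same index,
// rejecting payloads whose arrays are not the same length.
function metadataById(payload: UpsertPayload): Map<string, FlatMetadata> {
  const { ids, metadatas } = payload;
  if (ids.length !== metadatas.length) {
    throw new Error(`Expected ${ids.length} metadata objects, got ${metadatas.length}`);
  }
  return new Map(ids.map((id, i) => [id, metadatas[i]] as [string, FlatMetadata]));
}

const payload: UpsertPayload = {
  ids: ["id1", "id2", "id3"],
  embeddings: [[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2]],
  metadatas: [
    { chapter: "3", verse: "16" },
    { chapter: "3", verse: "5" },
    { chapter: "29", verse: "11" },
  ],
  documents: ["doc1", "doc2", "doc3"],
};

const byId = metadataById(payload);
console.log(byId.get("id2")); // { chapter: '3', verse: '5' }
```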

Proposed Enhancement for LlamaIndex:

To close this gap and align LlamaIndex more closely with the capabilities of ChromaDb, and potentially other databases, I propose changing the metadata type definition in LlamaIndex from Record<string, any> to Record<string, any>[]. This would allow an array of metadata objects, each keeping a flat structure, so the underlying databases' constraints are still respected while metadata definitions become more flexible and expressive.
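A minimal sketch of what the proposed type could look like, assuming existing single-object metadata should keep working: the hypothetical `normalizeMetadata` helper wraps a single object into a one-element array. The second helper, `encodeForFlatStore`, is an assumption about how an array of flat objects might still be stored in a backend that only accepts flat key-value metadata, by prefixing each key with its array index. Neither helper exists in LlamaIndexTS or ChromaDb.

```typescript
// Hypothetical sketch of the proposal; these names are not part of LlamaIndexTS.
type Metadata = Record<string, any>;           // current definition
type ProposedMetadata = Record<string, any>[]; // proposed definition

// Accepts both shapes so existing single-object metadata keeps working.
function normalizeMetadata(metadata: Metadata | ProposedMetadata): ProposedMetadata {
  return Array.isArray(metadata) ? metadata : [metadata];
}

// Assumed encoding: merges an array of flat objects into one flat object
// by prefixing each key with the index of the object it came from.
function encodeForFlatStore(entries: ProposedMetadata): Record<string, any> {
  const flat: Record<string, any> = {};
  entries.forEach((entry, index) => {
    for (const [key, value] of Object.entries(entry)) {
      flat[`${index}.${key}`] = value;
    }
  });
  return flat;
}

const references = normalizeMetadata([
  { chapter: "3", verse: "16" },
  { chapter: "29", verse: "11" },
]);

console.log(encodeForFlatStore(references));
// { '0.chapter': '3', '0.verse': '16', '1.chapter': '29', '1.verse': '11' }
console.log(normalizeMetadata({ chapter: "3" })); // [ { chapter: '3' } ]
```

Index-prefixed keys are only one possible encoding; a real implementation would also need a matching decoder and a convention for filtering across array entries.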

Potential Benefits:

  • Enhanced Metadata Expressiveness: Allows for more complex and nuanced metadata associations, akin to what is already possible in ChromaDb.
  • Increased Flexibility and Usability: Makes LlamaIndex more adaptable for a variety of use cases where complex metadata is essential.
  • Alignment with Underlying Databases: Ensures that LlamaIndex can fully leverage the features and capabilities of the databases it interfaces with, like ChromaDb.

Seeking Input and Suggestions:

I am keen to hear the community's thoughts on this proposal, any potential challenges it might pose, and how it might be implemented effectively. Suggestions for alternative approaches that could resolve the issue are also highly welcome.

TYRONEMICHAEL, Mar 21 '24 14:03