David S. Batista

94 comments by David S. Batista

I have a new proposal regarding the MetaDataBuilder/DocumentBuilder. We can merge both this issue and https://github.com/deepset-ai/haystack/issues/5702 into a single issue, with the goal of creating a MetadataBuilder, whose purpose is...

That's a very good point - maybe there's a better way to handle this, but my idea was to have a component that uses any other component to extract metadata...

> I don't see any reason to nest the extractor component inside the metadata builder?

It doesn't need to be nested; I'm just after a way to leverage existing components...

> `a component that takes the output of an LLM and returns the output of the LLM as the content of the document`

NOTE: the content of the document is...

I think in many cases you don't need an LLM to do metadata extraction, and would rather use other components from Haystack, like the NER module, or a custom component...

> Ok, I’m just a bit confused why we decided to drop the use case in this issue (originally it’s about adding LLM output to document content) and only do...

@sjrl would this suit your needs? A component that, given documents and new metadata, updates the metadata of the documents?

```python
@component
class MetadataBuilder:
    """
    A component that allows updating...
```

> Hey @davidsbatista I think yes, just to clarify the type of metadata should be List[Dict[str, Any]] right?

Sorry, yes, my mistake - it should only have one `List[]`...

> just make our own custom component that can fully handle the use case

I'm happy to do that, and I can adapt some of the ideas from the [MetaDataExtractor](https://www.notion.so/deepsetai/Advanced-Use-Case-Automatic-Metadata-Enrichment-8fdfc56e82434459963beaa7a9dc5069?pm=c)...

@sjrl I think if I had the component:

```python
@component
class MetadataBuilder:
    """
    A component that allows updating a Document's metadata.
    """

    @component.output_types(documents=List[Document])
    def run(self, metadata: List[Dict[str, Any]], documents: List[Document]):
        ...
```
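To make the proposed `run(metadata, documents)` signature more concrete, here is a minimal sketch of how such a component might behave. It is not the Haystack implementation: the `Document` class below is a hypothetical stand-in so the snippet runs without Haystack installed, the `@component` decorators are left out for the same reason, the `meta` field name is an assumption, and pairing each metadata dict with the document at the same position is only one possible reading of the proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a Haystack Document (only the fields used here)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class MetadataBuilder:
    """Sketch: merges one metadata dict into each document, matched by position."""

    def run(
        self, metadata: List[Dict[str, Any]], documents: List[Document]
    ) -> Dict[str, List[Document]]:
        # Assumption: exactly one metadata dict per document.
        if len(metadata) != len(documents):
            raise ValueError(
                f"Got {len(metadata)} metadata entries for {len(documents)} documents"
            )

        updated = []
        for doc, new_meta in zip(documents, metadata):
            # Build a new Document so the input is not mutated;
            # new keys override existing ones on conflict.
            merged = {**doc.meta, **new_meta}
            updated.append(Document(content=doc.content, meta=merged))
        return {"documents": updated}


docs = [
    Document("Lisbon is in Portugal."),
    Document("Berlin is in Germany.", meta={"lang": "en"}),
]
new_meta = [
    {"entities": ["Lisbon", "Portugal"]},
    {"entities": ["Berlin", "Germany"]},
]
result = MetadataBuilder().run(metadata=new_meta, documents=docs)
```

Returning new `Document` objects rather than updating them in place keeps the inputs unchanged, which may or may not be what the final component should do.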