Using IngestionPipeline for content not originating from the file system

Open · f2bo opened this issue 1 month ago · 9 comments

I'm beginning to look at Microsoft.Extensions.DataIngestion pipelines. As a test, I considered using an IngestionPipeline to ingest content stored in a CMS SQL database and build a vector store for use with RAG. However, it's unclear to me how to implement this when the data to be ingested is stored in a database rather than in files.

Currently, both overloads of the ProcessAsync method require file system objects.

https://github.com/dotnet/extensions/blob/15ffd76a9ed12213f9299c9b94ccf2f86eea1b62/src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs#L80-L81

and

https://github.com/dotnet/extensions/blob/15ffd76a9ed12213f9299c9b94ccf2f86eea1b62/src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs#L107-L108

Perhaps I misunderstand its purpose or how it's meant to be used, but it appears that it can only ingest data that originates from files. Is that the case?

f2bo · Nov 24 '25 21:11