kernel-memory
Sharing an improvement: a Highly Customizable Text Extractor.
Hey Guys!
Attached is a file that lets you override the text extraction method when customizing a new pipeline. I originally wrote it for my own use, but I think it may be useful to you as well. Here is an example that registers a custom decoder for MS Word files:
var mbuilder = new MemoryClientBuilder();
var memory = mbuilder.Build();
var orchestrator = mbuilder.GetOrchestrator();
// Replacing the default MsWordDecoder
var textExtractor = new TextExtractionHandler("extraction", orchestrator);
textExtractor.AddExtractor(
    (pipeline, file, content, ctoken) => {
        // return new MsWordDecoder().DocToText(content);
        return new MyDecoder().DocToText(content);
    },
    MimeTypes.MsWord
);
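For anyone reading without the attachment, the idea behind AddExtractor can be sketched roughly as below. This is only a minimal illustration under my own assumptions, not the attached file and not the kernel-memory API: ExtractorRegistry and the ExtractText delegate are hypothetical names, the content is passed as a plain byte array, and the MIME type is a string literal. It shows one way to keep a map from MIME type to extraction function, where registering a second function for the same type replaces the first.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

// Hypothetical delegate: turns the raw bytes of a file into plain text.
public delegate string ExtractText(string fileName, byte[] content, CancellationToken cancellationToken);

// Hypothetical registry (not part of kernel-memory): maps a MIME type to an extractor.
public sealed class ExtractorRegistry
{
    // MIME types are compared case-insensitively.
    private readonly Dictionary<string, ExtractText> _extractors =
        new Dictionary<string, ExtractText>(StringComparer.OrdinalIgnoreCase);

    // Registers an extractor; a later call for the same MIME type replaces the earlier one.
    public void AddExtractor(ExtractText extractor, string mimeType)
    {
        if (extractor == null) throw new ArgumentNullException(nameof(extractor));
        if (mimeType == null) throw new ArgumentNullException(nameof(mimeType));
        _extractors[mimeType] = extractor;
    }

    // Runs the extractor registered for the given MIME type, or fails if there is none.
    public string Extract(string fileName, string mimeType, byte[] content,
                          CancellationToken cancellationToken = default)
    {
        if (!_extractors.TryGetValue(mimeType, out var extractor))
            throw new NotSupportedException($"No extractor registered for '{mimeType}'.");
        return extractor(fileName, content, cancellationToken);
    }
}

public static class Program
{
    public static void Main()
    {
        var registry = new ExtractorRegistry();

        // Stand-in for a default decoder.
        registry.AddExtractor(
            (file, content, ct) => "default: " + Encoding.UTF8.GetString(content),
            "application/msword");

        // Stand-in for a custom decoder; this registration overrides the one above.
        registry.AddExtractor(
            (file, content, ct) => "custom: " + Encoding.UTF8.GetString(content),
            "application/msword");

        byte[] bytes = Encoding.UTF8.GetBytes("hello");
        Console.WriteLine(registry.Extract("a.doc", "application/msword", bytes));
        // prints "custom: hello"
    }
}
```

Keying the map by MIME type means that overriding a single format, as in the MS Word example above, leaves the extractors for all other formats untouched.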
Best Regards, Sandro Bihaiko.