
Sharing an improvement: a Highly Customizable Text Extractor

Open sbihaiko opened this issue 9 months ago • 0 comments

Hey guys!

Attached is a file that makes it easier to override the extraction method when customizing a new pipeline. I originally wrote it for my own use, but I think it may be useful to you as well. Here is an example:

var mbuilder = new MemoryClientBuilder();
var memory = mbuilder.Build();
var orchestrator = mbuilder.GetOrchestrator();

// Replace the default MsWordDecoder with a custom decoder for MS Word files
var textExtractor = new TextExtractionHandler("extraction", orchestrator);
textExtractor.AddExtractor(
    (pipeline, file, content, ctoken) =>
    {
        // Default behavior: return new MsWordDecoder().DocToText(content);
        // MyDecoder stands in for your own decoder implementation
        return new MyDecoder().DocToText(content);
    },
    MimeTypes.MsWord
);

Best regards, Sandro Bihaiko.

TextExtractionHandler.cs.txt

sbihaiko, Sep 17 '23 22:09