Kevin Miller

81 comments by Kevin Miller

Thanks for creating this issue and looking into exposing this potentially useful feature of Tika and Tesseract.

Would you like to submit a PR with this? I can work with you to get this capability into the text extractor.

I'd like to discuss this feature addition a bit. @Sicos1997 was nice enough to roll this feature into PR #72, creating a separate `ITextExtractor` implementation which works with Tesseract to...

Thanks, it is useful to see how you got it working.

Sorry you are having problems. That part of Tika (Office document extraction) is controlled by [POI](https://poi.apache.org/). I'd take a look over there to see if they support the desired capability.

Sorry, I am not familiar with XHTML Extractor. If it is packaged with Tika, you can likely do it.

@njss, did you figure this out?

Thanks for checking us out. Take a look at the Developer guide. It should help you get going. https://github.com/KevM/tikaondotnet/blob/master/Developers.md

Hmm, the guide should have helped you through the `SolutionInfo.cs` problem. ☹️

This project is not so much dead as limited by the unsupported IKVM.