tikaondotnet
Use the Java Tika text extraction library on the .NET platform
This is a proof-of-concept TikaOnDotnet build that multi-targets .NET Framework 4.6.2 and .NET Core 3.1. There are some major changes to the build process in this fork.... Using [IKVM-Revived](https://github.com/ikvm-revived/ikvm)...
I was trying to run a simple project with the package installed from NuGet. My code looked like this: `TextExtractor textExtractor = new TextExtractor(); var pdfContents = textExtractor.Extract(@"files\sample.pdf"); Console.WriteLine(pdfContents.Text);` I...
It looks like Tika will require Java 8 after version 1.17. I believe IKVM v.Current has support for Java 8, but who knows how well this is going to work....
Tika is crashing on a PDF (which has confidential information, so sorry, I can't post it) at line 30 of StreamTextExtractor.cs while attempting to extract text from the PDF. `var textExtractor = new`...
Hey! I'm trying to extract text from [this](https://github.com/KevM/tikaondotnet/files/6923576/arabic.pdf) file using `tikaondotnet.extraction`. The code is really basic: `public static string Extract(string path) { var te = new TextExtractor(); return te.Extract(path).Text;`...
I'm sorry, this is a question, but I didn't know where else to post it. I get a lot of `ERROR Can't read the embedded Type1C font EX_CFF_TradeGothic_CondEighteenBold WARN Using fallback...
I have a lot of PDFs which look like this in Acrobat Reader: "(кроме ипотеки) **в размере: 34 139.33** р. в валюте по ОКВ: 643, в отношении должника (тип должника: физическое лицо):...
Hi! Which version of Tika is included in the packages? Is it time to upgrade? Best, Christian
Hello. Is there any chance of supporting .NET Core?
I am searching for a text extraction library to extract text from PDF, DOC, DOCX, DWG, TIFF, TIF, DGN, PLT, TXT, XLS, XLSX, and HTM files in .NET. Tikaondotnet is not working for .tiff and .dgn. Is there a way to use Tikaondotnet...