tikaondotnet

Use the Java Tika text extraction library on the .NET platform

Results: 46 tikaondotnet issues

This is a proof-of-concept TikaOnDotnet build that multi-targets .NET Framework 4.6.2 and .NET Core 3.1. There are some major changes to the build process in this fork... Using [IKVM-Revived](https://github.com/ikvm-revived/ikvm)...

I was trying to run a simple project with the package installed from NuGet. My code was roughly: `TextExtractor textExtractor = new TextExtractor(); var pdfContents = textExtractor.Extract(@"files\sample.pdf"); Console.WriteLine(pdfContents.Text);` I...

It looks like Tika will require Java 8 after version 1.17. I believe the current version of IKVM supports Java 8, but who knows how well this is going to work...

Labels: blocked, ready

Tika is crashing on a PDF (it contains confidential information, so sorry, I can't post it) at line 30 of StreamTextExtractor.cs while attempting to extract text from the PDF. ```c# var textExtractor = new...

Hey! I'm trying to extract text from [this](https://github.com/KevM/tikaondotnet/files/6923576/arabic.pdf) file using `tikaondotnet.extraction`. The code is really basic: ` public static string Extract(string path) { var te = new TextExtractor(); return te.Extract(path).Text;...

I'm sorry, this is a question, but I didn't know where else to post it. I get a lot of `ERROR Can't read the embedded Type1C font EX_CFF_TradeGothic_CondEighteenBold WARN Using fallback...

I have a lot of PDFs (in Russian) that look like this in Acrobat Reader: "(excluding mortgages) **in the amount of: 34 139.33** RUB, in the currency per OKV: 643, in respect of the debtor (debtor type: individual):...

Hi! Which version of Tika is included in the packages? Is it time to upgrade? Best, Christian

Hello. Is there any chance of supporting .NET Core?

I am searching for a text extraction library to extract text from PDF, DOC, DOCX, DWG, TIFF, TIF, DGN, PLT, TXT, XLS, XLSX, and HTM files in .NET. TikaOnDotnet is not working for .tiff and .dgn files. Is there a way to use Tikaondotnet...