henkei
Suppress INFO messages?
Hi there,
First of all: Thank you for forking Yomu and bringing it back to life. Absolutely amazing work.
Second: Any idea how I might stop INFO messages from showing up? They appear when I parse a PDF document. My Rails logger level is set to WARN, so I'm guessing they show because they come directly from Apache Tika.
INFO To get higher rendering speed on JDK8 or later,
INFO use the option -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
INFO or call System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")
Cheers, Finn
Yeah, unfortunately that's coming from PDFBox, as used by Tika (due to a change in Java 8, where the default color management module is LittleCMS instead of KCMS). According to their own documentation:
KCMS is the unmaintained, legacy provider and is far faster than the newer replacement.
However, there are stability and security risks with using the unmaintained legacy provider.
So why they feel it necessary to spout all of that 'information' about it is beyond me.
The info itself is coming from: https://github.com/apache/pdfbox/blob/f83bcc1fe60502759024a3b51983b29c7de66327/pdfbox/src/main/java/org/apache/pdfbox/rendering/PDFRenderer.java#L394
I did look into it a while back, and as far as I could tell there wasn't a clean way to suppress this message without also suppressing ALL INFO output. I've just been putting up with it.
If you feel the need, you can override the config for the pdfbox logger and send its output somewhere else.
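A different way to "send it somewhere else", done from the Ruby side instead of through the PDFBox logger config, is to run the Tika command with Ruby's Open3 so the extracted text (stdout) and the JVM's stderr are captured separately, then hand stderr to your own logger at DEBUG level. This is a minimal sketch that assumes the INFO lines are written to the child process's stderr (not verified here); extract_with_separate_stderr is a hypothetical helper, not part of Henkei's API.

```ruby
require "open3"
require "logger"

# Hypothetical helper (not Henkei API): runs `command` (an argv array, e.g. a
# `java -jar tika-app.jar --text` invocation), writes `data` to its stdin and
# returns only stdout. Whatever the child writes to stderr goes to `logger`
# at DEBUG level instead of appearing in the terminal.
def extract_with_separate_stderr(command, data, logger: Logger.new($stderr))
  stdout, stderr, status = Open3.capture3(*command, stdin_data: data, binmode: true)

  # With a Rails logger set to WARN, these DEBUG entries are simply dropped.
  logger.debug(stderr) unless stderr.empty?

  raise "command failed (exit #{status.exitstatus}): #{stderr}" unless status.success?

  stdout
end
```

Passing Rails.logger as the logger keeps the messages available when debugging while hiding them at WARN. Because this only changes where stderr ends up, it leaves stdout (the extracted text) untouched.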
Thanks for the fast reply! Much appreciated!
Since you've used Apache Tika much longer than I have, do you think I would lose anything important if I filtered out all 'INFO' messages from what io.read returns?
I would imagine any issue of real concern would have an ERROR or WARNING status. It could even become a Henkei setting, e.g. Henkei.log_info = true/false.
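The filtering idea above can be sketched as a small module with an on/off setting. This is purely illustrative: InfoFilter and its log_info accessor are hypothetical names standing in for the proposed Henkei.log_info setting, and the sketch assumes the INFO lines really do end up in the string returned by io.read.

```ruby
# Hypothetical module mirroring the proposed Henkei.log_info setting.
module InfoFilter
  class << self
    # true  => return output untouched
    # false => drop every line that starts with "INFO "
    attr_accessor :log_info
  end
  self.log_info = true

  def self.apply(output)
    return output if log_info

    # each_line keeps the trailing newlines, so join restores the layout.
    # Caveat: this is a plain text filter, so a line of the extracted
    # document that happens to start with "INFO " is dropped as well.
    output.each_line.reject { |line| line.start_with?("INFO ") }.join
  end
end
```

The caveat in the comment is the main weakness of this approach: the filter cannot tell log lines apart from document text that happens to look the same.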
Hmm, that sounds a bit dangerous (i.e. you could filter out non-INFO things you didn't mean to). I think the more reliable solution would be to override the config for the pdfbox logger and simply change its log level.
Hmmm, fair enough.
I'm pretty unfamiliar with Java, which is why I tried to avoid touching the pdfbox logger config :sweat_smile: Is that something I would do in jar/tika-config.xml?
It's been a while since I've looked at Java.
The pdfbox library uses the Apache Commons Logging library so I think that'd be the place to start: https://commons.apache.org/proper/commons-logging/guide.html#Quick_Start
It appears to be more of a wrapper around other logging systems, and I have no idea which one would actually be used. It seems to depend on what you have installed.
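If Commons Logging turns out to be using its own built-in SimpleLog implementation, the PDFBox loggers can be raised to WARN with JVM system properties, leaving other loggers at their usual level. That is an assumption: the actual backend depends on what is on the classpath, and if a bridge or another logging framework is in use, these properties are simply ignored. The property names are SimpleLog's documented ones. tika_command is a hypothetical helper, and the jar path and --text switch are placeholders.

```ruby
# JVM system properties for Apache Commons Logging (assumption: SimpleLog
# is, or may be forced to be, the active backend).
PDFBOX_QUIET_FLAGS = [
  # Ask Commons Logging to use its built-in SimpleLog implementation.
  "-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog",
  # SimpleLog looks up levels hierarchically, so this covers every logger
  # under org.apache.pdfbox; INFO messages from those loggers are suppressed.
  "-Dorg.apache.commons.logging.simplelog.log.org.apache.pdfbox=warn"
].freeze

# Hypothetical helper: builds an argv array for a one-off Tika run.
# JVM options must come before -jar; anything after the jar path is
# passed to Tika rather than to the JVM.
def tika_command(jar_path, switch = "--text", java: "java")
  [java, *PDFBOX_QUIET_FLAGS, "-jar", jar_path, switch]
end
```

The returned array can be passed as an argument list to IO.popen or Open3, which also avoids shell-quoting issues with the -D values.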