spark-lucenerdd

Typesafe config is generating the error UTFDataFormatException: encoded string too long

Open yeikel opened this issue 4 years ago • 2 comments

I noticed that we are using Typesafe config, and it seems to be introducing serialization issues: jobs are failing with the following exception:

Caused by: java.io.UTFDataFormatException: encoded string too long: 72887 bytes

The issue is hard to replicate, and all I can provide at the moment are the stack traces. I will update the issue if I find a way to replicate it.

Do you have any recommendations for dealing with this issue?

Similar issue: https://stackoverflow.com/questions/41505599/task-not-serializable-in-spark-caused-by-utfdataformatexception-encoded-string
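
For context on the exception itself: `java.io.DataOutputStream.writeUTF` stores the encoded length in a two-byte prefix, so it rejects any string whose modified UTF-8 encoding is longer than 65535 bytes. The 72887 bytes in the message above is over that limit. Below is a minimal Java sketch of the limit, independent of LuceneRDD and Typesafe config. The class and method names are illustrative only, and the sketch does not show which string in the job is the one being written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

class WriteUtfLimit {
    /** Returns true if writeUTF accepts a string of the given number of ASCII characters. */
    static boolean fitsInWriteUtf(int asciiLength) {
        // Each ASCII character encodes to exactly one byte in modified UTF-8.
        StringBuilder sb = new StringBuilder(asciiLength);
        for (int i = 0; i < asciiLength; i++) {
            sb.append('a');
        }
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(sb.toString());
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form does not fit the two-byte length prefix.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInWriteUtf(65535)); // largest size that fits
        System.out.println(fitsInWriteUtf(72887)); // size reported in this issue
    }
}
```

A stack trace showing which caller invokes `writeUTF` would help narrow down where the oversized string comes from.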

yeikel, Feb 25 '20 05:02

This looks like a weird issue.

AFAIR, the Typesafe configs for LuceneRDD do not need to be serializable. If you use Typesafe config in your application, make sure you access it from within a Scala object, so that it is initialized on both the driver and the executors rather than serialized with the task.

You can extend this trait https://github.com/zouzias/spark-lucenerdd/blob/master/src/main/scala/org/zouzias/spark/lucenerdd/config/Configurable.scala#L24 to get it working.
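
The idea of keeping the config in an object can be sketched in plain Java, where static state plays the role of a Scala object. This is a rough illustration under stated assumptions, not the code of the linked trait: `AppConfig`, `ExampleTask`, `ConfigHolderDemo`, and the key `example.key` are all hypothetical, and a plain map stands in for the loaded configuration. The task reads the setting when it runs instead of capturing the config in a field, so the config is never part of the serialized task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical config holder: static state is initialized per JVM and never serialized. */
final class AppConfig {
    private static final Map<String, String> SETTINGS = load();

    private AppConfig() {}

    // Assumption: a real job would call its config library's loader here.
    private static Map<String, String> load() {
        Map<String, String> settings = new HashMap<>();
        settings.put("example.key", "example-value"); // hypothetical setting
        return settings;
    }

    static String get(String key) {
        return SETTINGS.get(key);
    }
}

/** Serializable task with no config field; it looks the value up when it runs. */
final class ExampleTask implements Supplier<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public String get() {
        return AppConfig.get("example.key");
    }
}

class ConfigHolderDemo {
    /** Serializes and deserializes the task, roughly as a cluster framework would. */
    static ExampleTask roundTrip(ExampleTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ExampleTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The deserialized task still resolves the setting through the static holder.
        System.out.println(roundTrip(new ExampleTask()).get());
    }
}
```

Whether this pattern addresses the exception reported here is not established in this thread; it only illustrates the recommendation above.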

zouzias, Feb 26 '20 10:02

It really does.

I am not using Typesafe config in my own application. The exception is coming from LuceneRDD itself.

I made another build with all references to it removed from LuceneRDD, and that build works fine. However, it loses the ability to add dynamic configuration, so it is not a good solution.

yeikel, Feb 26 '20 14:02