spark-lucenerdd

Serialization Issue with org.apache.lucene.facet.FacetsConfig

Status: Open. yeikel opened this issue 4 years ago (4 comments).

I am facing the following serialization issue:

Job aborted due to stage failure: Task 144.0 in stage 25.0 (TID 2122) had a not serializable result: org.apache.lucene.facet.FacetsConfig
Serialization stack:
	- object not serializable (class: org.apache.lucene.facet.FacetsConfig, value: org.apache.lucene.facet.FacetsConfig@53a75ca4)
	- field (class: org.zouzias.spark.lucenerdd.partition.LuceneRDDPartition, name: FacetsConfig, type: class org.apache.lucene.facet.FacetsConfig)
	- object (class org.zouzias.spark.lucenerdd.partition.LuceneRDDPartition, org.zouzias.spark.lucenerdd.partition.LuceneRDDPartition@30e83579)
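The stack trace follows a standard Java serialization pattern: the outer object's class is serializable, but one of its non-transient fields holds an instance of a class that is not. Below is a minimal, self-contained Java sketch of that pattern with no Spark or Lucene involved. It assumes the task result is going through plain Java serialization, which the `Serialization stack` output suggests. `PartitionLike` and `NonSerializableConfig` are hypothetical stand-ins for `LuceneRDDPartition` and `FacetsConfig`, not the library's classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NotSerializableDemo {

    // Hypothetical stand-in for a Lucene helper such as FacetsConfig:
    // it does NOT implement Serializable.
    static class NonSerializableConfig {
        final String name = "facets";
    }

    // Hypothetical stand-in for a partition object: the class itself is
    // Serializable, but one of its non-transient fields is not.
    static class PartitionLike implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final NonSerializableConfig config = new NonSerializableConfig();

        PartitionLike(String id) {
            this.id = id;
        }
    }

    // Serializes an object with plain Java serialization. Returns null on
    // success, or the NotSerializableException message (the offending
    // class name) on failure.
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Fails on the config field, not on PartitionLike itself.
        System.out.println(trySerialize(new PartitionLike("p0")));
    }
}
```

The failure names the field's class rather than the outer class, which matches the `field (class: ...LuceneRDDPartition, name: FacetsConfig, ...)` line in the stack above.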

It is hard to reproduce, and I am not sure what triggers it, since indexing sometimes works just fine.

Do you have any idea? @zouzias

Example jobs, all running the same code; some failed and some succeeded:

[Screenshot: list of job runs]

yeikel, Feb 25 '20 15:02

Hi,

Do you use faceted search at all? I would like to remove the faceted search feature, since DataFrames backed by Parquet files are superior for that.

See: https://github.com/zouzias/spark-lucenerdd/pull/171

zouzias, Feb 26 '20 10:02

I am not using that feature, but indexing seems to use FacetsConfig anyway.
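One plausible explanation for this: a field that the constructor always initializes becomes part of the object's serialized state whether or not the feature behind it is ever used. A common Java pattern for such helpers is to mark the field `transient` and rebuild it lazily after deserialization. The sketch below is illustrative only; `PartitionLike`, `NonSerializableConfig`, and `roundTrip` are hypothetical names, and this is not how spark-lucenerdd itself handles `FacetsConfig`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {

    // Hypothetical non-serializable helper, standing in for FacetsConfig.
    static class NonSerializableConfig {
        final String name = "facets";
    }

    // Hypothetical partition-like class. The helper is transient, so Java
    // serialization skips it; it is (re)created on first use.
    static class PartitionLike implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        private transient NonSerializableConfig config;

        PartitionLike(String id) {
            this.id = id;
        }

        NonSerializableConfig config() {
            if (config == null) {
                config = new NonSerializableConfig(); // lazy (re)initialization
            }
            return config;
        }
    }

    // Serializes and then deserializes an object, returning the copy.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        PartitionLike original = new PartitionLike("p0");
        original.config(); // initialized before serialization, yet skipped
        PartitionLike copy = roundTrip(original);
        System.out.println(copy.id + " " + copy.config().name);
    }
}
```

The trade-off is that every access must go through the lazy getter, because the field is `null` right after deserialization.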

yeikel, Feb 26 '20 14:02

This looks very suspicious. Can you share some code that would help me reproduce the error?

	- object (class org.zouzias.spark.lucenerdd.partition.LuceneRDDPartition, org.zouzias.spark.lucenerdd.partition.LuceneRDDPartition@30e83579)

It seems that the LuceneRDDPartition object itself is being serialized, which should never happen. Are you using the cartesianlinker method?
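The underlying principle is that partition objects holding an index should stay on the executor where they were built, and only small, serializable linkage results should be shipped back. Below is a minimal Java sketch of that separation with no Spark involved. `LocalIndex`, `Match`, `linkPartition`, and `isSerializable` are hypothetical names, not spark-lucenerdd's API, and the loop only mimics pairing a set of queries with one partition.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ResultsOnlyDemo {

    // Hypothetical in-memory "index" standing in for a Lucene-backed
    // partition. Deliberately not Serializable: it should stay local.
    static class LocalIndex {
        private final Map<String, Integer> docs;

        LocalIndex(Map<String, Integer> docs) {
            this.docs = docs;
        }

        Integer lookup(String key) {
            return docs.get(key);
        }
    }

    // Small, serializable result record: this is what should leave the executor.
    static final class Match implements Serializable {
        private static final long serialVersionUID = 1L;
        final String query;
        final int docId;

        Match(String query, int docId) {
            this.query = query;
            this.docId = docId;
        }
    }

    // Pairs every query with this one partition and keeps only plain results.
    static List<Match> linkPartition(LocalIndex index, List<String> queries) {
        List<Match> out = new ArrayList<>();
        for (String q : queries) {
            Integer hit = index.lookup(q);
            if (hit != null) {
                out.add(new Match(q, hit));
            }
        }
        return out;
    }

    // True if plain Java serialization succeeds for the object.
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LocalIndex index = new LocalIndex(Map.of("alice", 1, "bob", 2));
        List<Match> matches = linkPartition(index, List.of("alice", "carol", "bob"));
        System.out.println(matches.size() + " " + isSerializable(matches) + " " + isSerializable(index));
    }
}
```

Running `main` prints `2 true false`: the results serialize, and the index does not. If the values that reach the driver include the partition objects rather than results like `Match`, serialization fails in the same way as the reported error.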

zouzias, Feb 26 '20 17:02

Yes, I am using the cartesianlinker.

I am not sure what triggers it, but I will update the issue if I can find a reproducible example.

yeikel, Feb 27 '20 02:02