
Protos with recursive fields fail with stack overflow

Open drewrobb opened this issue 8 years ago • 9 comments

Adding a recursive field to a proto breaks things; see https://github.com/drewrobb/sparksql-scalapb-test/commit/4cfc436c5a3a9f75d4218a0695ff7e9c2b8300e3 for a reproduction. I'm happy to help address this if you have a recommended approach to solving it.

Exception in thread "main" java.lang.StackOverflowError
	at shadeproto.Descriptors$FieldDescriptor.getName(Descriptors.java:881)
	at com.trueaccord.scalapb.spark.ProtoSQL$.com$trueaccord$scalapb$spark$ProtoSQL$$structFieldFor(ProtoSQL.scala:65)
	at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
	at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	......

	at com.trueaccord.scalapb.spark.ProtoSQL$.com$trueaccord$scalapb$spark$ProtoSQL$$structFieldFor(ProtoSQL.scala:62)
	at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
	at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)

drewrobb avatar Dec 03 '16 01:12 drewrobb

@drewrobb, were you able to find a resolution for this? We are facing what looks like a similar issue with a highly nested schema.

dbkegley avatar Mar 20 '17 23:03 dbkegley

@dbkegley we have not found a resolution to this, nor do we even have a proposed way to fix it.

drewrobb avatar Mar 20 '17 23:03 drewrobb

Schemas in Spark must be known ahead of time. A possible workaround would be to set a limit on the recursion depth when generating a schema. Would that be useful?
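The depth limit could be applied during schema generation itself. Here is a minimal sketch of the idea on a toy descriptor model — `FieldDesc`, `MessageDesc`, and `schemaFor` are hypothetical stand-ins, not the real protobuf or ProtoSQL API; the point is only the cutoff that makes the traversal terminate on a cyclic descriptor graph:

```scala
// Toy stand-ins for message/field descriptors. `fields` is a thunk so a
// message can refer to itself without blowing up at construction time.
case class FieldDesc(name: String, message: Option[MessageDesc])
case class MessageDesc(name: String, fields: () => Seq[FieldDesc])

object DepthLimitedSchema {
  // Build a schema string, cutting recursion off at maxDepth by
  // replacing deeper occurrences with a NULL placeholder.
  def schemaFor(msg: MessageDesc, maxDepth: Int): String =
    if (maxDepth <= 0) "NULL"
    else {
      val cols = msg.fields().map { f =>
        val t = f.message match {
          case Some(m) => schemaFor(m, maxDepth - 1)
          case None    => "STRING"
        }
        s"${f.name}: $t"
      }
      s"STRUCT<${cols.mkString(", ")}>"
    }
}

// A self-referential "Person" message: this is exactly the shape that
// overflows the stack without the depth limit.
lazy val person: MessageDesc = MessageDesc(
  "Person",
  () => Seq(FieldDesc("name", None), FieldDesc("friend", Some(person))))

// Terminates despite the cycle:
// STRUCT<name: STRING, friend: STRUCT<name: STRING, friend: NULL>>
println(DepthLimitedSchema.schemaFor(person, 2))
```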

thesamet avatar Mar 21 '17 04:03 thesamet

That sounds like it would fix my use case. We aren't storing arbitrarily deep trees or anything-- mostly just single level recursion like in the example in this issue.

drewrobb avatar Mar 21 '17 04:03 drewrobb

FWIW, for a single level, you could do something like this:

message Person { ... }

message PersonWithOtherPerson {
  optional Person main = 1;
  optional Person other_person = 2;
}

The downside is that this pushes the parent Person into a field, rather than leaving it at the top level. One way to get around this is to have an implicit conversion between PersonWithOtherPerson and Person.
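That implicit conversion could look roughly like this. The case classes below are hand-written stand-ins for what ScalaPB would generate from the proto above (the real generated classes carry more machinery), and `toPerson` is a hypothetical name:

```scala
// Simplified stand-ins for ScalaPB-generated case classes.
case class Person(name: String = "")
case class PersonWithOtherPerson(
    main: Option[Person] = None,
    otherPerson: Option[Person] = None)

object Conversions {
  import scala.language.implicitConversions

  // Let a PersonWithOtherPerson be used wherever a Person is expected,
  // unwrapping the `main` field (falling back to an empty Person).
  implicit def toPerson(p: PersonWithOtherPerson): Person =
    p.main.getOrElse(Person())
}

import Conversions._

def greet(p: Person): String = s"hello, ${p.name}"

// The wrapper is accepted where a Person is expected: prints "hello, Ann"
println(greet(PersonWithOtherPerson(main = Some(Person("Ann")))))
```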

thesamet avatar Mar 21 '17 05:03 thesamet

@thesamet I think this would work for us as well. Unfortunately, we are only consumers, so we don't have access to update the schema. We can advise against recursive fields, but there's no guarantee the producers will follow our recommendation.

dbkegley avatar Mar 21 '17 18:03 dbkegley

@thesamet, I'm in the same boat where I, as a consumer, cannot control the source. It would be great if the ProtoSQL driver could take a recursion-depth limit as a parameter. As a workaround, I'm looking into a way to flatten this out before it hits Spark. The recursion is at most 10 levels deep, if that helps.

I'm using Scala 2.11.12, Spark 2.4.4, sparksql-scalapb 0.9.2, sbt-protoc 0.99.28, scalapb compilerplugin 0.9.7.
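Since the maximum depth is known, one way to sketch that flattening is to unroll the recursion into a fixed set of columns before handing rows to Spark. `Node` and `FlatNode` below are hypothetical types (capped at three levels for brevity rather than ten), not anything from sparksql-scalapb:

```scala
// Hypothetical recursive structure, standing in for a proto message
// that has a field of its own type.
case class Node(id: String, child: Option[Node])

// Flattened, Spark-friendly row: one column per level of nesting,
// capped at a fixed depth (mirroring a recursion-depth limit).
case class FlatNode(id0: String, id1: Option[String], id2: Option[String])

object Flatten {
  // Unroll the recursion into fixed columns; anything nested deeper
  // than the cap is simply dropped.
  def apply(n: Node): FlatNode =
    FlatNode(
      id0 = n.id,
      id1 = n.child.map(_.id),
      id2 = n.child.flatMap(_.child).map(_.id))
}

val chain = Node("a", Some(Node("b", Some(Node("c", None)))))
println(Flatten(chain)) // FlatNode(a,Some(b),Some(c))
```

A Dataset of `FlatNode` then has a fixed, non-recursive schema, so the schema derivation never sees the cycle.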

colinlouie avatar May 05 '20 17:05 colinlouie

Just wondering if this issue was addressed in a newer release of ScalaPB? We are facing a similar issue.

anjshrg avatar Aug 26 '21 17:08 anjshrg

Hi @anjshrg , the issue is still not resolved. PRs will be welcome!

thesamet avatar Aug 26 '21 18:08 thesamet