CodeGen fails when case class fields are reserved Java keywords

Open imarios opened this issue 7 years ago • 5 comments

It is possible to define a case class whose field names are reserved words by using back-ticks.

case class Foo(a: String, `if`: Int)
val t = TypedDataset.create(Seq(Foo("a",2), Foo("b",2)))

Fails with the following error:

17/06/01 00:45:54 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 49, Column 44: Unexpected selector 'if' after "."
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificUnsafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private UnsafeRow result;
/* 009 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
/* 011 */
/* 012 */
/* 013 */   public SpecificUnsafeProjection(Object[] references) {
/* 014 */     this.references = references;
/* 015 */     result = new UnsafeRow(2);
/* 016 */     this.holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
/* 017 */     this.rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 2);
/* 018 */   }
/* 019 */
/* 020 */   // Scala.Function1 need this
/* 021 */   public java.lang.Object apply(java.lang.Object row) {
/* 022 */     return apply((InternalRow) row);
/* 023 */   }
/* 024 */
/* 025 */   public UnsafeRow apply(InternalRow i) {
/* 026 */     holder.reset();
/* 027 */
/* 028 */     rowWriter.zeroOutNullBytes();
/* 029 */
/* 030 */
/* 031 */     $line48.$read$$iw$$iw$$iw$$iw$$iw$$iw$Foo value2 = ($line48.$read$$iw$$iw$$iw$$iw$$iw$$iw$Foo)i.get(0, null);
/* 032 */
/* 033 */     boolean isNull1 = false;
/* 034 */     final java.lang.String value1 = isNull1 ? null : (java.lang.String) value2.a();
/* 035 */     isNull1 = value1 == null;
/* 036 */     boolean isNull = isNull1;
/* 037 */     final UTF8String value = isNull ? null : org.apache.spark.unsafe.types.UTF8String.fromString(value1);
/* 038 */     isNull = value == null;
/* 039 */     if (isNull) {
/* 040 */       rowWriter.setNullAt(0);
/* 041 */     } else {
/* 042 */       rowWriter.write(0, value);
/* 043 */     }
/* 044 */
/* 045 */
/* 046 */     $line48.$read$$iw$$iw$$iw$$iw$$iw$$iw$Foo value4 = ($line48.$read$$iw$$iw$$iw$$iw$$iw$$iw$Foo)i.get(0, null);
/* 047 */
/* 048 */     boolean isNull3 = false;
/* 049 */     final int value3 = isNull3 ? -1 : value4.if();
/* 050 */     if (isNull3) {
/* 051 */       rowWriter.setNullAt(1);
/* 052 */     } else {
/* 053 */       rowWriter.write(1, value3);
/* 054 */     }
/* 055 */     result.setTotalSize(holder.totalSize());
/* 056 */     return result;
/* 057 */   }
/* 058 */ }
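
To make the failure at generated line 49 concrete, here is a minimal sketch (the object name `AccessorNameDemo` is made up for illustration and is not part of frameless or Spark). It only relies on plain Java reflection to show that the compiled accessor for the back-ticked field is literally named `if`, which is why the generated Java source ends up containing `value4.if()`.

```scala
object AccessorNameDemo {
  // Same shape as the case class in the report above.
  case class Foo(a: String, `if`: Int)

  // Names of the zero-argument methods declared on Foo, as the JVM sees them.
  val accessorNames: Set[String] =
    classOf[Foo].getDeclaredMethods
      .filter(_.getParameterCount == 0)
      .map(_.getName)
      .toSet

  def main(args: Array[String]): Unit = {
    // Scala source can call the accessor by escaping the name with back-ticks.
    println(Foo("a", 2).`if`)
    // The compiled accessor is named "if"; Java source has no way to spell that call.
    println(accessorNames.contains("if"))
  }
}
```

Java has no equivalent of Scala's back-tick escaping, so any code generator that emits `value.<fieldName>()` as Java source will presumably hit the same problem for every field whose name is a Java keyword.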

imarios, Jun 01 '17 07:06

How does Spark escape that?

OlivierBlanvillain, Jun 01 '17 16:06

Good question. It seems that Spark does not escape it at all; it rejects the field name at runtime:

java.lang.UnsupportedOperationException: `if` is a reserved keyword and cannot be used as field name
- root class: "Foo"
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:585)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:355)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
  at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
  at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
  at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
  ... 42 elided

imarios, Jun 03 '17 17:06

@OlivierBlanvillain, maybe we can have a type class that uses shapeless to check whether the case class contains any reserved keywords and fail at compile time?
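
As a rough illustration of what such a check has to decide, here is a sketch of a simpler runtime variant. It is not the compile-time shapeless type class proposed above and it is not Spark's implementation; it uses Java reflection plus the JDK's `javax.lang.model.SourceVersion.isKeyword`. The names `ReservedFieldCheck`, `reservedFieldNames` and `Fine` are hypothetical, and the sketch assumes the case classes are declared at the top level or inside an object, so that no synthetic outer-reference field shows up.

```scala
import javax.lang.model.SourceVersion
import scala.reflect.ClassTag

object ReservedFieldCheck {
  // Hypothetical helper, not part of frameless or Spark:
  // returns the declared field names of T that are Java keywords or literals.
  def reservedFieldNames[T](implicit ct: ClassTag[T]): List[String] =
    ct.runtimeClass.getDeclaredFields.toList
      .map(_.getName)
      .filter(name => SourceVersion.isKeyword(name))

  // The two problematic shapes from this thread, plus a harmless one.
  case class Foo(a: String, `if`: Int)
  case class SomeCaseClass(char: String)
  case class Fine(a: String, b: Int)

  def main(args: Array[String]): Unit = {
    println(reservedFieldNames[Foo])
    println(reservedFieldNames[SomeCaseClass])
    println(reservedFieldNames[Fine])
  }
}
```

A caller could run this before creating a dataset and fail fast with a readable message when the list is non-empty. A compile-time version would need the field names at the type level, for example via shapeless's `LabelledGeneric`, which is where the extra compilation cost would come from.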

imarios, Jun 03 '17 18:06

Totally doable, but I would put that very low on the priority list. I'm not even sure the added safety would be worth the extra compilation time...

OlivierBlanvillain, Jun 07 '17 06:06

I was writing up an issue when I saw this one. I also encountered this with a word that needs no back-ticks in Scala but is a Java keyword: char

case class SomeCaseClass(char: String)

charlescd, May 17 '19 16:05