OpenMLDB
Importing data causes BufferOverflowException
Bug Description
I am trying to import data into the online database from a Hive table. The import fails with the following exception when some string values are longer than 255 characters:
Caused by: java.io.IOException: write row to openmldb failed on: ...
at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:89)
at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:39)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:419)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:457)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:358)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:194)
at java.nio.ByteBuffer.put(ByteBuffer.java:867)
at com._4paradigm.openmldb.common.codec.FlexibleRowBuilder.build(FlexibleRowBuilder.java:385)
at com._4paradigm.openmldb.sdk.impl.InsertPreparedStatementImpl.buildRow(InsertPreparedStatementImpl.java:302)
at com._4paradigm.openmldb.sdk.impl.InsertPreparedStatementImpl.execute(InsertPreparedStatementImpl.java:317)
at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:77)
... 13 more
Expected Behavior
The data import succeeds.
Relation Case
No
Steps to Reproduce
- Prepare data in which some string values are longer than 255 characters and others are shorter.
- Import this data into the online database.
- The import fails with java.nio.BufferOverflowException (a minimal sketch follows this list).
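For reference, the same code path can also be reached through the Java SDK directly, without the Hive import job. The following is only a minimal sketch: the cluster address, database name, table schema, and the SdkOption / SqlClusterExecutor / getInsertPreparedStmt entry points are assumptions based on typical SDK usage, not taken from the original job.

import com._4paradigm.openmldb.sdk.SdkOption;
import com._4paradigm.openmldb.sdk.SqlExecutor;
import com._4paradigm.openmldb.sdk.impl.SqlClusterExecutor;

import java.sql.PreparedStatement;

public class ReproduceBufferOverflow {
    public static void main(String[] args) throws Exception {
        SdkOption option = new SdkOption();
        option.setZkCluster("127.0.0.1:2181"); // placeholder cluster address
        option.setZkPath("/openmldb");         // placeholder ZooKeeper path
        SqlExecutor executor = new SqlClusterExecutor(option);

        // Assumes an existing online table: CREATE TABLE t1 (id int, c1 string);
        String db = "demo_db";
        String insertSql = "insert into t1 values (?, ?);";

        StringBuilder longStr = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            longStr.append('a'); // string value longer than 255 characters
        }

        // Reuse one prepared statement for both rows, as the Spark writer does,
        // so both rows go through the same FlexibleRowBuilder instance.
        try (PreparedStatement ps = executor.getInsertPreparedStmt(db, insertSql)) {
            ps.setInt(1, 1);
            ps.setString(2, longStr.toString()); // first row: long string
            ps.execute();

            ps.setInt(1, 2);
            ps.setString(2, "short");            // second row: short string
            ps.execute();                        // BufferOverflowException here
        }
    }
}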
After digging into the code:
// FlexibleRowBuilder.java
int totalSize = strFieldStartOffset + strAddrLen + strTotalLen;
// check whether totalSize is bigger than UINT8_MAX or UINT16_MAX ...
int curStrAddrSize = CodecUtil.getAddrLength(totalSize);
if (curStrAddrSize > strAddrSize) {
    // strAddrBuf will be expanded if totalSize is bigger than UINT8_MAX (255)
    strAddrBuf = expandStrLenBuf(curStrAddrSize, settedStrCnt);
    strAddrSize = curStrAddrSize;
    totalSize = strFieldStartOffset + strAddrLen + strTotalLen;
}
The private field strAddrBuf is expanded when totalSize exceeds UINT8_MAX (255) and is then reused for the following records, but its backing array is never shrunk again; that is what causes the java.nio.BufferOverflowException.
As a workaround, I currently reduce the strAddrBuf size manually at the end of the result buffer allocation to avoid the problem.
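To make the failure mode concrete, below is a self-contained toy model of that pattern (not the real FlexibleRowBuilder). It assumes, per the stack trace, that build() copies the whole backing array of strAddrBuf into a row buffer sized from the current row's address width.

import java.nio.ByteBuffer;

public class StrAddrReuseDemo {
    // 1 byte per string address while the row total stays <= UINT8_MAX (255),
    // 2 bytes per address above that; the real builder supports wider sizes too.
    private ByteBuffer strAddrBuf = ByteBuffer.allocate(1);

    byte[] buildRow(int totalSize, boolean shrinkAfterBuild) {
        int addrWidth = totalSize <= 255 ? 1 : 2;
        if (addrWidth > strAddrBuf.capacity()) {
            strAddrBuf = ByteBuffer.allocate(addrWidth); // expanded, never shrunk
        }

        // The row buffer only reserves addrWidth bytes for string addresses,
        // but the whole (possibly still enlarged) backing array is copied in.
        ByteBuffer row = ByteBuffer.allocate(addrWidth);
        row.put(strAddrBuf.array());

        if (shrinkAfterBuild && strAddrBuf.capacity() > 1) {
            strAddrBuf = ByteBuffer.allocate(1); // reset to the baseline width
        }
        return row.array();
    }

    public static void main(String[] args) {
        StrAddrReuseDemo broken = new StrAddrReuseDemo();
        broken.buildRow(300, false); // long row expands strAddrBuf to 2 bytes
        try {
            broken.buildRow(10, false); // short row: 1 byte of room, 2-byte array
        } catch (java.nio.BufferOverflowException e) {
            System.out.println("reproduced: " + e);
        }

        StrAddrReuseDemo fixed = new StrAddrReuseDemo();
        fixed.buildRow(300, true);
        fixed.buildRow(10, true); // no exception once the buffer is shrunk back
        System.out.println("shrinking strAddrBuf after build avoids the overflow");
    }
}

The shrinkAfterBuild branch corresponds to the manual workaround above; presumably a proper fix would shrink or reset strAddrBuf inside FlexibleRowBuilder itself once the required address width drops back.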