
Import data cause BufferOverflowException

Status: Open · ljwh opened this issue 1 year ago · 0 comments

Bug Description: I am trying to import data into the online DB from a Hive table. An exception occurs when some of the string values are longer than 255 bytes:

Caused by: java.io.IOException: write row to openmldb failed on:  ... 
	at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:89)
	at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:39)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:419)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:457)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:358)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.nio.BufferOverflowException
	at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:194)
	at java.nio.ByteBuffer.put(ByteBuffer.java:867)
	at com._4paradigm.openmldb.common.codec.FlexibleRowBuilder.build(FlexibleRowBuilder.java:385)
	at com._4paradigm.openmldb.sdk.impl.InsertPreparedStatementImpl.buildRow(InsertPreparedStatementImpl.java:302)
	at com._4paradigm.openmldb.sdk.impl.InsertPreparedStatementImpl.execute(InsertPreparedStatementImpl.java:317)
	at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:77)
	... 13 more

Expected Behavior: the data import succeeds.

Related Case: none

Steps to Reproduce

  1. Prepare data in which some string values are longer than 255 bytes and some are shorter.
  2. Import the data into the online DB.
  3. The import fails with java.nio.BufferOverflowException.
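For context on why the 255-byte boundary matters, here is a sketch of the address-width logic the codec presumably applies (the thresholds and the name `getAddrLength` mirror the snippet below, but the exact semantics are an assumption, not the real `CodecUtil`):

```java
// Sketch (assumed semantics): how a row codec like OpenMLDB's CodecUtil
// might pick the byte width of each string-offset entry from the total
// row size. Rows up to 255 bytes need only 1-byte offsets; larger rows
// need wider offsets, which is what triggers the buffer expansion.
public class AddrLengthSketch {
    // Returns how many bytes are needed to store an offset up to totalSize.
    public static int getAddrLength(int totalSize) {
        if (totalSize <= 0xFF) return 1;       // fits in UINT8_MAX
        if (totalSize <= 0xFFFF) return 2;     // fits in UINT16_MAX
        return 4;                              // larger rows: 4-byte offsets
    }

    public static void main(String[] args) {
        System.out.println(getAddrLength(200)); // prints "1"
        System.out.println(getAddrLength(300)); // prints "2"
    }
}
```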

After digging into the code, I found the following in FlexibleRowBuilder:

    // FlexibleRowBuilder.java
    int totalSize = strFieldStartOffset + strAddrLen + strTotalLen;
    // check whether totalSize is bigger than UINT8_MAX or UINT16_MAX ...
    int curStrAddrSize = CodecUtil.getAddrLength(totalSize);
    if (curStrAddrSize > strAddrSize) {
        // strAddrBuf is expanded if totalSize is bigger than UINT8_MAX (255)
        strAddrBuf = expandStrLenBuf(curStrAddrSize, settedStrCnt);
        strAddrSize = curStrAddrSize;
        totalSize = strFieldStartOffset + strAddrLen + strTotalLen;
    }

The private field strAddrBuf is expanded once totalSize exceeds UINT8_MAX (255), and the expanded buffer is reused for the following records, but its size is never reduced. When a later, smaller row is built, the stale oversized buffer no longer fits the row's result buffer, which causes the java.nio.BufferOverflowException.
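The failure mode can be reproduced in isolation. The sketch below uses illustrative names, not the real OpenMLDB classes: a builder caches its string-address buffer across rows; after a long row forces 2-byte addresses, the cached buffer keeps that size, and the next short row allocates a result ByteBuffer sized for 1-byte addresses, so copying the stale wider buffer overflows it.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Minimal reproduction of the described bug (illustrative names only).
public class StaleAddrBufDemo {
    // Cached across rows, like the private strAddrBuf field in the report.
    static ByteBuffer strAddrBuf = ByteBuffer.allocate(1);

    static void buildRow(int addrSize) {
        if (addrSize > strAddrBuf.capacity()) {
            strAddrBuf = ByteBuffer.allocate(addrSize); // expanded, never shrunk
        }
        strAddrBuf.clear();
        // Fill the whole cached buffer, as the builder would with addresses.
        for (int i = 0; i < strAddrBuf.capacity(); i++) {
            strAddrBuf.put((byte) 0);
        }
        // The result buffer is sized from the *current* row's address width...
        ByteBuffer result = ByteBuffer.allocate(addrSize);
        strAddrBuf.flip();
        result.put(strAddrBuf); // ...but strAddrBuf may still be wider -> overflow
    }

    // Returns true if a short row after a long row overflows.
    public static boolean overflows() {
        buildRow(2); // long row: expands strAddrBuf to 2 bytes
        try {
            buildRow(1); // short row: result sized for 1 byte, strAddrBuf still 2
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows()); // prints "true"
    }
}
```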

As a workaround, I currently shrink strAddrBuf manually at the end of the result allocation, which solves the problem for me.
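The workaround amounts to something like the following sketch (illustrative, not the actual patch): re-size the cached address buffer on any width mismatch, so a short row after a long one does not inherit an oversized buffer.

```java
import java.nio.ByteBuffer;

// Sketch of the workaround: keep the cached string-address buffer sized
// exactly for the current row, shrinking it back down after a long row.
public class ShrinkAddrBufSketch {
    static ByteBuffer strAddrBuf = ByteBuffer.allocate(1);

    // Ensures strAddrBuf has exactly addrSize capacity; returns that capacity.
    public static int ensureAddrBuf(int addrSize) {
        if (strAddrBuf.capacity() != addrSize) {
            // Reallocate on any mismatch: grow for long rows, shrink back after.
            strAddrBuf = ByteBuffer.allocate(addrSize);
        }
        return strAddrBuf.capacity();
    }
}
```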

— ljwh, Jan 26 '24 03:01