
Cannot use append mode when writing spark dataframe on Watson Studio

[Open] charles2588 opened this issue · 1 comment

Write the DataFrame once in append mode:

df_data_1.write.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
              .option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
              .mode("append")\
              .save(cos.url('TESTAPPEND/CARS', 'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a'))

The first write in append mode succeeds. Now write again in append mode, and it fails:

df_data_1.write.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
              .option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
              .mode("append")\
              .save(cos.url('TESTAPPEND/CARS', 'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a'))

Py4JJavaError: An error occurred while calling o161.save. :
org.apache.hadoop.fs.FileAlreadyExistsException: mkdir on existing directory cos://catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a.os_a9bbfb9f99684afe9ec11076b75f1831_configs/TESTAPPEND/CARS
    at com.ibm.stocator.fs.ObjectStoreFileSystem.mkdirs(ObjectStoreFileSystem.java:453)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:313)
    at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:118)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp

Full notebook: https://dataplatform.ibm.com/analytics/notebooks/v2/ec6f5fd0-6141-493c-b2cc-979a9b312393/view?access_token=3b6130f2206249bd03795f932ca3ad30f110321a3086dac07ee4d3eb4d4cbe56
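Reading the stack trace, the failure happens before any data is written: `FileOutputCommitter.setupJob` unconditionally calls `mkdirs` on the destination, and Stocator's `mkdirs` raises because the pseudo-directory already exists from the first write. A minimal local sketch of that failure mode (plain Python, no Spark; the paths are illustrative only):

```python
import os
import tempfile

def setup_job(output_dir):
    # Mimics FileOutputCommitter.setupJob: unconditionally create the
    # job's output directory. Stocator's mkdirs raises
    # FileAlreadyExistsException when the object-store "directory" already
    # exists; os.makedirs without exist_ok=True fails the same way locally.
    os.makedirs(output_dir)

base = tempfile.mkdtemp()
target = os.path.join(base, "TESTAPPEND", "CARS")

setup_job(target)          # first write: directory created, job succeeds

try:
    setup_job(target)      # second "append": mkdir on existing directory
except FileExistsError as e:
    print("second write failed:", e)
```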

Looking at the append method in the connector code, I see that append is not supported: https://github.com/CODAIT/stocator/blob/0866ef099c838efbfe46e7ad6a036ecfbed2012d/src/main/java/com/ibm/stocator/fs/ObjectStoreFileSystem.java

 public FSDataOutputStream append(Path f, int bufferSize,
      Progressable progress) throws IOException {
    throw new IOException("Append is not supported in the object storage");
  }

If append is not supported, is there a workaround? Or perhaps the connector should report clearly that append is not supported, rather than surfacing the FileAlreadyExistsException above.
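(One common workaround pattern, sketched here as an assumption rather than a Stocator feature: since objects in object storage are immutable, emulate append by writing each batch to a new, uniquely named path under a shared prefix, and have readers load the whole prefix. The helper below shows only the unique-path generation in plain Python; the `TESTAPPEND/CARS` prefix and `batch-` naming are illustrative.)

```python
import uuid
from datetime import datetime, timezone

def unique_batch_path(prefix):
    """Build a fresh object path under `prefix` for every write, so no
    existing object (or pseudo-directory) is ever touched. Readers can
    then load everything under `prefix` in a single pass."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{prefix}/batch-{stamp}-{uuid.uuid4().hex[:8]}"

# Each "append" becomes an independent write to a fresh path:
p1 = unique_batch_path("TESTAPPEND/CARS")
p2 = unique_batch_path("TESTAPPEND/CARS")
print(p1)
print(p2)  # different path, so no mkdirs collision
```

In Spark this would mean saving each batch with `.save(cos.url(unique_batch_path('TESTAPPEND/CARS'), bucket))` and reading everything back with one load over the `TESTAPPEND/CARS` prefix.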

charles2588 avatar Jun 20 '18 21:06 charles2588

@charles2588 thanks for reporting this. In general, append + object storage is usually a bad idea, no matter which connector you use. I will review the issue you observed to better understand the root cause and to propose the best solution to resolve it.

gilv avatar Jun 21 '18 05:06 gilv
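(Editorial note on why append + object storage is usually a bad idea: objects are immutable, so a true append must GET the whole existing object, concatenate, and PUT the result back, rewriting every previously written byte on each append. A sketch of that cost, with an in-memory dict standing in for the object store; names and keys are illustrative only.)

```python
# Simulated object store: each "append" rewrites the entire object.
store = {}
bytes_rewritten = 0

def put(key, data):
    store[key] = data

def append(key, data):
    global bytes_rewritten
    existing = store.get(key, b"")   # GET the full existing object
    put(key, existing + data)        # PUT the full new object
    bytes_rewritten += len(existing) + len(data)

put("TESTAPPEND/CARS", b"row1\n")
for _ in range(3):
    append("TESTAPPEND/CARS", b"rowN\n")

print(store["TESTAPPEND/CARS"])
print("total bytes written for 3 appends:", bytes_rewritten)
```

The cost per append grows with the object's size, which is why connectors either reject append outright (as Stocator does) or steer users toward writing new objects instead.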