Cannot use append mode when writing spark dataframe on Watson Studio
First, write the file once in append mode:
df_data_1.write.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
.option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
.mode("append")\
.save(cos.url('TESTAPPEND/CARS', 'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a'))
The first append-mode write succeeds. Writing again in append mode then fails:
df_data_1.write.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
.option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
.mode("append")\
.save(cos.url('TESTAPPEND/CARS', 'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a'))
Py4JJavaError: An error occurred while calling o161.save.
: org.apache.hadoop.fs.FileAlreadyExistsException: mkdir on existing directory cos://catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a.os_a9bbfb9f99684afe9ec11076b75f1831_configs/TESTAPPEND/CARS
  at com.ibm.stocator.fs.ObjectStoreFileSystem.mkdirs(ObjectStoreFileSystem.java:453)
  at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:313)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:118)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp
Full notebook: https://dataplatform.ibm.com/analytics/notebooks/v2/ec6f5fd0-6141-493c-b2cc-979a9b312393/view?access_token=3b6130f2206249bd03795f932ca3ad30f110321a3086dac07ee4d3eb4d4cbe56
Looking at the append method in the connector code, I see that append is not supported: https://github.com/CODAIT/stocator/blob/0866ef099c838efbfe46e7ad6a036ecfbed2012d/src/main/java/com/ibm/stocator/fs/ObjectStoreFileSystem.java
public FSDataOutputStream append(Path f, int bufferSize,
    Progressable progress) throws IOException {
  throw new IOException("Append is not supported in the object storage");
}
If append is not supported, is there a workaround? Alternatively, perhaps the connector should report clearly that append is not supported rather than failing with the error above.
@charles2588 Thanks for reporting this. In general, append + object storage is usually a bad idea, no matter which connector you use. I will review the issue you observed to better understand the root cause and propose the best way to resolve it.
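As an aside on avoiding append with object storage: a common pattern is to write each batch to its own unique sub-prefix and later read all batches back together with a glob. The sketch below is illustrative only and is not from this thread; it reuses the cos.url helper, bucket name, spark session, and df_data_1 DataFrame from the notebook above purely as placeholders.

import uuid

# Base COS path, reusing the placeholder names from the example above.
base_url = cos.url('TESTAPPEND/CARS',
                   'catalogdsxreproduce4a77ab6a4f2f47b3b6bedc7174a64c4a')

# Write this batch into its own sub-directory, so the committer never has to
# re-create an already existing directory.
df_data_1.write.format('csv')\
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")\
    .mode("overwrite")\
    .save(base_url + '/batch-' + uuid.uuid4().hex)

# Later, read every batch back as a single DataFrame.
df_all = spark.read.csv(base_url + '/batch-*')

Each write targets a fresh directory, which sidesteps the mkdir-on-existing-directory failure entirely, and Spark's glob support lets downstream jobs treat all batches as one dataset.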