hadoop-connectors

Temp files created by GoogleHadoopSyncableOutputStream are not deleted after output stream is closed

Open · selimelawwa opened this issue 3 years ago · 3 comments

I am using the Hadoop GCS connector to read and write files through the Hadoop FileSystem API, and there seems to be an issue with GoogleHadoopSyncableOutputStream: its temp files are not deleted. Is a configuration needed for the temp files to be deleted, or is this a bug?
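To make the expected behavior concrete, here is a simplified in-memory model of a syncable, compose-based output stream. It is a sketch under stated assumptions, not the connector's actual code: each `sync()` uploads the buffered bytes as a temporary object and appends ("composes") it onto the destination, and `close()` deletes every temporary object the stream created. The class names, the `_TEMP_` naming scheme, and the map standing in for a bucket are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a compose-based syncable stream (NOT the connector's implementation). */
class SyncableStreamModel {
  // Hypothetical stand-in for a GCS bucket: object name -> contents.
  static final Map<String, byte[]> bucket = new TreeMap<>();

  static class ModelSyncableStream implements AutoCloseable {
    private final String dest;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<String> tempObjects = new ArrayList<>();
    private int syncCount = 0;

    ModelSyncableStream(String dest) {
      this.dest = dest;
      bucket.put(dest, new byte[0]);
    }

    void write(byte[] b) {
      buffer.write(b, 0, b.length);
    }

    /** Uploads the buffer as a temp object, then appends it onto the destination. */
    void sync() {
      // Hypothetical temp naming scheme, for illustration only.
      String temp = "_TEMP_" + dest + "." + (syncCount++);
      bucket.put(temp, buffer.toByteArray());
      tempObjects.add(temp);
      buffer.reset();
      compose(dest, temp);
    }

    /** Concatenates dest + temp into dest, mimicking a two-object compose. */
    private static void compose(String dest, String temp) {
      byte[] a = bucket.get(dest);
      byte[] b = bucket.get(temp);
      byte[] out = new byte[a.length + b.length];
      System.arraycopy(a, 0, out, 0, a.length);
      System.arraycopy(b, 0, out, a.length, b.length);
      bucket.put(dest, out);
    }

    /** The behavior the issue expects: no temp objects survive close(). */
    @Override
    public void close() {
      if (buffer.size() > 0) {
        sync();
      }
      for (String temp : tempObjects) {
        bucket.remove(temp);
      }
      tempObjects.clear();
    }
  }

  public static void main(String[] args) {
    try (ModelSyncableStream out = new ModelSyncableStream("data.txt")) {
      out.write("hello ".getBytes(StandardCharsets.UTF_8));
      out.sync(); // creates a temp object that must be cleaned up later
      out.write("world".getBytes(StandardCharsets.UTF_8));
    }
    System.out.println(bucket.keySet()); // [data.txt]
    System.out.println(new String(bucket.get("data.txt"), StandardCharsets.UTF_8)); // hello world
  }
}
```

In this model, the bug reported here would correspond to `close()` returning while entries from `tempObjects` are still present in the bucket.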

Please see this Stack Overflow question for more details.

Please let me know whether some configuration is needed, whether code changes are required, or whether a newer jar is needed.

selimelawwa · Sep 06 '21 13:09

@medb @cyxxy please check this out.

selimelawwa · Sep 06 '21 13:09

@medb @mprashanthsagar Are there any findings on this? Should we expect a fix in a new release? Is there any configuration that works around this with the current connector?

selimelawwa · Sep 15 '21 12:09

Hi @selimelawwa

I was unable to reproduce the issue with the following test:

```java
@Test
public void testTempFile() throws Exception {
  // gcsFsIHelper, getTestConfig() and GoogleHadoopFileSystemIntegrationHelper come from
  // the connector's integration-test suite.
  URI path = gcsFsIHelper.getUniqueObjectUri("hflush_syncsEverything");
  Path hadoopPath = new Path(path);
  System.out.println(path);

  Configuration config = getTestConfig();
  // config.setEnum(GCS_OUTPUT_STREAM_TYPE.getKey(), OutputStreamType.FLUSHABLE_COMPOSITE);
  // config.setLong(GCS_OUTPUT_STREAM_SYNC_MIN_INTERVAL_MS.getKey(),
  //     Duration.ofDays(1).toMillis());
  FileSystem ghfs = GoogleHadoopFileSystemIntegrationHelper.createGhfs(path, config);

  // try-with-resources closes the stream, which finalizes the object in GCS.
  // Exceptions propagate and fail the test instead of being printed and ignored.
  try (FSDataOutputStream out = ghfs.create(hadoopPath)) {
    for (int i = 0; i < 4096; i++) {
      out.write(String.format("some arbitrary string %d\n", i).getBytes(StandardCharsets.UTF_8));
    }
  }
  System.out.println("File written successfully");
}
```

The file is created, and I do not see any residual temp files. In the test above, I explicitly create an FSDataOutputStream and have removed all additional configuration.
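One way to check for residual temp files after the stream is closed is to list the destination's parent directory (for example with Hadoop's `FileSystem.listStatus`, taking `getPath().getName()` of each entry) and filter the names. The sketch below does only the filtering, on a plain list of names. The prefix `_GCS_SYNCABLE_TEMPFILE_` and the class and method names are assumptions for illustration; verify the prefix against the temp object names that actually appear in your bucket.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: flags object names that look like leftover syncable-stream temp files. */
class TempFileCheck {
  // Assumed prefix; check it against the temp object names actually present in the bucket.
  static final String ASSUMED_TEMP_PREFIX = "_GCS_SYNCABLE_TEMPFILE_";

  /** Returns the names that start with the assumed temp prefix followed by the target file name. */
  static List<String> residualTempFiles(List<String> names, String targetName) {
    String prefix = ASSUMED_TEMP_PREFIX + targetName;
    return names.stream().filter(n -> n.startsWith(prefix)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical directory listing after the output stream has been closed.
    List<String> listing =
        List.of("out.txt", "_GCS_SYNCABLE_TEMPFILE_out.txt.1.abc", "other.txt");
    System.out.println(residualTempFiles(listing, "out.txt"));
  }
}
```

An empty result after `close()` matches the behavior observed in the test above; a non-empty one would match the behavior reported in the issue.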

Could you share the code for the File used in your example class? That is the only difference I notice.

mprashanthsagar · Sep 27 '21 11:09