kafka-connect-hdfs

Question about temp files

Open • dinegri opened this issue 4 years ago • 1 comment

When the HDFS Sink connector starts buffering records, it writes a temp file at the +/tmp//_tmp.json HDFS path.

Does the HDFS Sink connector write to this temp file every time it receives a message (i.e., append to the file)? Or does it keep the temp file in JVM memory and write it out, with all the accumulated records, only once flush.size is reached?

Thank you in advance

dinegri • May 13 '21 13:05

Hi @dinegri, records are appended to the temp files as they arrive; they are not kept in memory.
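As a rough illustration of this append-as-they-arrive behavior, here is a minimal Python sketch that uses the local file system as a stand-in for HDFS. The TempFileWriter class, the one-JSON-object-per-line format, and the file names are hypothetical. This is not the connector's code (which is Java), and it omits details such as the connector's write-ahead log. Each write is flushed immediately only so the appended records are visible on disk before any commit; a real file-system output stream may buffer some bytes internally.

```python
import json
import os
import tempfile


class TempFileWriter:
    """Hypothetical stand-in for a sink writer: appends each record to a temp
    file as it arrives and renames the file to its final path on commit."""

    def __init__(self, directory: str, name: str) -> None:
        self.tmp_path = os.path.join(directory, name + "_tmp.json")
        self.final_path = os.path.join(directory, name + ".json")
        # Append mode: records go straight to the file, not to an in-memory list.
        self._fh = open(self.tmp_path, "a", encoding="utf-8")
        self.record_count = 0

    def write(self, record: dict) -> None:
        # One JSON object per line, appended as soon as the record arrives.
        self._fh.write(json.dumps(record) + "\n")
        # Flush so the append is visible on disk right away (illustration only).
        self._fh.flush()
        self.record_count += 1

    def commit(self) -> str:
        # Close the temp file and move it to its final name.
        self._fh.close()
        os.replace(self.tmp_path, self.final_path)
        return self.final_path


with tempfile.TemporaryDirectory() as d:
    w = TempFileWriter(d, "topic-0")
    w.write({"id": 1})
    w.write({"id": 2})
    # Both records are already in the temp file before any commit happens.
    with open(w.tmp_path, encoding="utf-8") as f:
        lines_before_commit = f.read().splitlines()
    final = w.commit()
    committed_exists = os.path.exists(final)
    tmp_exists = os.path.exists(w.tmp_path)

print(lines_before_commit)  # ['{"id": 1}', '{"id": 2}']
```

The point of the sketch is that memory use does not grow with flush.size: only the open file handle is held, and the records themselves live in the temp file until it is committed.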

The temp files are then moved to the final path if one or more of these are true:

  • the number of records in the temp file has reached flush.size
  • rotate.interval.ms has been reached
  • rotate.schedule.interval.ms has been reached
  • the record schema has changed
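The commit conditions listed above can be expressed as a single check. The following is a minimal Python sketch, not the connector's implementation: the RotationConfig and TempFileState types and the should_commit function are hypothetical, the connector property names are mirrored as snake_case fields, values <= 0 are treated as "disabled", and one clock is used for both time-based checks (the connector may measure these intervals differently, for example from record timestamps versus wall-clock time).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RotationConfig:
    # Hypothetical mirror of the connector properties; values <= 0 mean "disabled" here.
    flush_size: int
    rotate_interval_ms: int = -1
    rotate_schedule_interval_ms: int = -1


@dataclass
class TempFileState:
    record_count: int                 # records already written to the temp file
    opened_at_ms: Optional[int]       # when the current temp file was started
    next_scheduled_ms: Optional[int]  # next scheduled rotation time, if any
    schema_version: Optional[int]     # schema of the records in the temp file


def should_commit(cfg: RotationConfig, state: TempFileState,
                  now_ms: int, incoming_schema_version: Optional[int]) -> List[str]:
    """Return every reason (possibly several) why the temp file would be committed."""
    reasons = []
    if state.record_count >= cfg.flush_size:
        reasons.append("flush.size")
    if (cfg.rotate_interval_ms > 0 and state.opened_at_ms is not None
            and now_ms - state.opened_at_ms >= cfg.rotate_interval_ms):
        reasons.append("rotate.interval.ms")
    if (cfg.rotate_schedule_interval_ms > 0 and state.next_scheduled_ms is not None
            and now_ms >= state.next_scheduled_ms):
        reasons.append("rotate.schedule.interval.ms")
    if (state.schema_version is not None and incoming_schema_version is not None
            and incoming_schema_version != state.schema_version):
        reasons.append("schema change")
    return reasons


cfg = RotationConfig(flush_size=3, rotate_interval_ms=60_000)
full = TempFileState(record_count=3, opened_at_ms=0, next_scheduled_ms=None, schema_version=1)
aged = TempFileState(record_count=1, opened_at_ms=0, next_scheduled_ms=None, schema_version=1)
fresh = TempFileState(record_count=1, opened_at_ms=0, next_scheduled_ms=None, schema_version=1)

print(should_commit(cfg, full, now_ms=1_000, incoming_schema_version=1))
# ['flush.size']
print(should_commit(cfg, aged, now_ms=60_000, incoming_schema_version=2))
# ['rotate.interval.ms', 'schema change']
print(should_commit(cfg, fresh, now_ms=500, incoming_schema_version=1))
# []
```

Because the conditions are checked independently, more than one can fire at once; in this sketch any non-empty result means the temp file would be committed.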

You can find more information on these in the Confluent documentation.

dosvath • May 17 '21 17:05