Flink: Refactoring StreamingReaderOperator to read data nonblocking

Open hililiwei opened this issue 3 years ago • 2 comments

Co-authored-by: xuwei [email protected]

Some of our tables currently hold hundreds of billions of records. The old implementation had to read all of the data at once within a single checkpoint, which caused the job to get stuck for a long time. Eventually, a series of errors such as checkpoint timeouts occurred.

image

The new implementation solves this problem.

image
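
As a rough illustration of the idea only (not the actual patch), the sketch below models a single-threaded task queue in plain Java. All names here (`NonBlockingReaderSketch`, `scheduleBlockingRead`, `scheduleIncrementalRead`, the split names) are hypothetical and do not come from the Iceberg or Flink code base. It assumes that reads and checkpoints share one queue, so a read task that drains every split delays the checkpoint, while a read task that handles one split and then re-enqueues itself lets the checkpoint run in between.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical model: one thread runs queued tasks strictly in order.
final class NonBlockingReaderSketch {
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    private final Queue<String> pendingSplits = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();

    NonBlockingReaderSketch(List<String> splits) {
        pendingSplits.addAll(splits);
    }

    // Blocking style: a single task reads every pending split before returning.
    void scheduleBlockingRead() {
        mailbox.add(() -> {
            while (!pendingSplits.isEmpty()) {
                log.add("read " + pendingSplits.poll());
            }
        });
    }

    // Non-blocking style: read one split, then re-enqueue the remaining work
    // at the back of the queue so other tasks can run first.
    void scheduleIncrementalRead() {
        mailbox.add(() -> {
            String split = pendingSplits.poll();
            if (split != null) {
                log.add("read " + split);
                if (!pendingSplits.isEmpty()) {
                    scheduleIncrementalRead();
                }
            }
        });
    }

    // A checkpoint is just another task waiting in the same queue.
    void triggerCheckpoint(long id) {
        mailbox.add(() -> log.add("checkpoint " + id));
    }

    // Run tasks until the queue is empty and return the order of events.
    List<String> run() {
        Runnable task;
        while ((task = mailbox.poll()) != null) {
            task.run();
        }
        return log;
    }

    // Schedule a read, then a checkpoint, and report what ran in which order.
    static List<String> simulate(boolean incremental) {
        NonBlockingReaderSketch reader =
            new NonBlockingReaderSketch(Arrays.asList("s1", "s2", "s3"));
        if (incremental) {
            reader.scheduleIncrementalRead();
        } else {
            reader.scheduleBlockingRead();
        }
        reader.triggerCheckpoint(1);
        return reader.run();
    }
}
```

In this model, `simulate(false)` records the checkpoint only after all three splits have been read, while `simulate(true)` records it right after the first split. With very large tables, that difference is what decides whether a checkpoint can complete before its timeout.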

hililiwei avatar Aug 10 '22 18:08 hililiwei

@stevenzwu and @openinx, can you review this?

rdblue avatar Aug 10 '22 23:08 rdblue

@hililiwei I have a couple of high-level questions:

  • can you elaborate on what exactly is blocked in the readers?
  • can you try the new FLIP-27 source?

stevenzwu avatar Aug 11 '22 16:08 stevenzwu

  • can you elaborate on what exactly is blocked in the readers?

I'll add some screenshots to illustrate.

  • can you try the new FLIP-27 source?

We don't have Flink 1.14 in our production environment yet, but we plan to upgrade at the end of the month, and I'll try the FLIP-27 source as soon as I can.

Could you please review it? Thank you. @stevenzwu

hililiwei avatar Aug 18 '22 12:08 hililiwei

@hililiwei can you explain the problem and the solution in more detail? It is not easy to understand what exactly the issue is based on the screenshots.

Some of our tables currently hold hundreds of billions of records. The old implementation had to read all of the data at once within a single checkpoint, which caused the job to get stuck for a long time.

I am also not quite getting the problem from the above description.

stevenzwu avatar Aug 20 '22 21:08 stevenzwu