flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-31192][connectors/dataGen] Fix dataGen takes too long to initi…

Open xuzhiwen1255 opened this issue 2 years ago • 26 comments

What is the purpose of the change

In the sequence mode, the dataGen field will preload all the sequence values when it is initialized, and it will take a long time during this period

Brief change log

  • Adapt the way of preloading the value to generate a sequence value every time the next method is called
  • The checkpoint saves not all the sequence values, but the position where the current task sends the sequence value

Verifying this change

DataGeneratorSourceTest and DataGenTableSourceFactoryTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no

xuzhiwen1255 avatar Feb 24 '23 07:02 xuzhiwen1255

@reswqa Can you review it for me, Thank you.

xuzhiwen1255 avatar Feb 24 '23 07:02 xuzhiwen1255

CI report:

  • ec135c137777197b23ffa20dc488679adbe50020 UNKNOWN
  • c911719877f16d656a473c68d0d03748e45dfc1d Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Feb 24 '23 07:02 flinkbot

Thanks @xuzhiwen1255 for creating this, It seems that the CI is failure, PTAL first.

reswqa avatar Feb 27 '23 15:02 reswqa

@flinkbot run azure

xuzhiwen1255 avatar Mar 10 '23 06:03 xuzhiwen1255

@reswqa Re-running the cli is no problem.

xuzhiwen1255 avatar Mar 10 '23 08:03 xuzhiwen1255

@reswqa CI is fine now, can you come over and have a look?

xuzhiwen1255 avatar Mar 27 '23 03:03 xuzhiwen1255

@RyanSkraba @XComp Please check again, thanks.

xuzhiwen1255 avatar May 18 '23 03:05 xuzhiwen1255

@flinkbot run azure

xuzhiwen1255 avatar May 21 '23 08:05 xuzhiwen1255

On another note: We should rebase the branch to include FLINK-31962 which would make CI succeed again.

XComp avatar May 24 '23 16:05 XComp

In addition to the comments, I feel like we're lacking some tests here. Is there a way to unit test the generator (maybe even in the context of a parallelism change)?

I think about it, the change of parallelism already exists in the existing unit test, I think we don't need to add the test, because it already includes the complete generated test case of the sequence.

In this DataGeneratorSourceTest#innerTestDataGenCheckpointRestore test method, the degree of parallelism is modified .

xuzhiwen1255 avatar May 26 '23 02:05 xuzhiwen1255

I think about it, the change of parallelism already exists in the existing unit test, I think we don't need to add the test, because it already includes the complete generated test case of the sequence.

Thanks for pointing that out. I missed DataGeneratorSourceTest. Do you still see the value in having a SequenceGeneratorTest testing the basic functionality of the SequenceGenerator, though? It might help to understand the contract of the SequenceGenerator class.

XComp avatar May 26 '23 13:05 XComp

I think about it, the change of parallelism already exists in the existing unit test, I think we don't need to add the test, because it already includes the complete generated test case of the sequence.

Thanks for pointing that out. I missed DataGeneratorSourceTest. Do you still see the value in having a SequenceGeneratorTest testing the basic functionality of the SequenceGenerator, though? It might help to understand the contract of the SequenceGenerator class.

I don't think it is necessary to use SequenceGeneratorTest for testing, because the existing tests have covered its basic functions. Later, we may need to test how to avoid data skew and test sequence generation integrity when adjusting operator parallelism.

@XComp WDYT?

xuzhiwen1255 avatar May 29 '23 02:05 xuzhiwen1255

@flinkbot run azure

xuzhiwen1255 avatar May 29 '23 02:05 xuzhiwen1255

@XComp @RyanSkraba Any more questions?

xzw0223 avatar Jun 27 '23 05:06 xzw0223

Hi @xuzhiwen1255 , sorry for making you wait for so long. I was on vacation and have to focus on some other work right now. I try to get back to this PR soon'ish

XComp avatar Jun 27 '23 06:06 XComp

Hi @xuzhiwen1255 , sorry for making you wait for so long. I was on vacation and have to focus on some other work right now. I try to get back to this PR soon'ish

@XComp It's all right, have a nice holiday.

xuzhiwen1255 avatar Jun 27 '23 06:06 xuzhiwen1255

I still think that it might make sense to introduce a SequenceGeneratorTest. This would allow us to test the edge cases easily (e.g. reaching Long.MAX_VALUE) more specifically.

In #21971 we added SequenceGeneratorTest, I think we can add the corresponding test method in #21971

@XComp WDYT?

xuzhiwen1255 avatar Jul 13 '23 02:07 xuzhiwen1255

In https://github.com/apache/flink/pull/21971 we added SequenceGeneratorTest, I think we can add the corresponding test method in https://github.com/apache/flink/pull/21971

I'm not a big fan of adding tests that belong in a PR into a separate PR. You could just create the test class in this PR with the set of tests that fit to this PR. PR #21971 should then add additional tests related to its changes. The conflict resolution shouldn't be too complex because your adding independent test methods. :thinking:

XComp avatar Jul 13 '23 14:07 XComp

I'm not a big fan of adding tests that belong in a PR into a separate PR. You could just create the test class in this PR with the set of tests that fit to this PR. PR #21971 should then add additional tests related to its changes. The conflict resolution shouldn't be too complex because your adding independent test methods.

OK let me add a test.

xuzhiwen1255 avatar Jul 17 '23 05:07 xuzhiwen1255

Any updates on that one @xuzhiwen1255 ? :slightly_smiling_face:

XComp avatar Aug 31 '23 16:08 XComp

Any updates on that one @xuzhiwen1255 ? 🙂

@XComp Sorry, the company has been busy iterating recently. I will update it in the next few days.

xuzhiwen1255 avatar Aug 31 '23 16:08 xuzhiwen1255

No worries. Especially considering that I won't be that responsive in September in general. I just wanted to let you know to make prioritization of work easier for you.

XComp avatar Sep 01 '23 06:09 XComp

@XComp I'm sorry that I haven't had time to deal with this problem recently due to some things. Now I plan to continue to deal with this issue. Do you have time to take a look?

xuzhiwen1255 avatar Dec 26 '23 02:12 xuzhiwen1255

I think so. I will modify it as soon as possible.

xuzhiwen1255 avatar Feb 22 '24 11:02 xuzhiwen1255

@flinkbot run azure

xuzhiwen1255 avatar Feb 23 '24 11:02 xuzhiwen1255

Hi @XComp I have solved the problem, please check again, thank you.

xuzhiwen1255 avatar Feb 25 '24 11:02 xuzhiwen1255