[FLINK-35177] Fix DataGen Connector documentation

Open morozov opened this issue 1 year ago • 1 comments

What is the purpose of the change

The code examples used in the documentation are incorrect. Currently, they look like this:

GeneratorFunction<Long, Long> generatorFunction = index -> index;
double recordsPerSecond = 100;

DataGeneratorSource<String> source =
        new DataGeneratorSource<>(
             generatorFunction,
             Long.MAX_VALUE,
             RateLimiterStrategy.perSecond(recordsPerSecond),
             Types.STRING);

The generator function returns Long, but the DataGeneratorSource is declared with String and Types.STRING. The types do not match, so the example does not compile.

Brief change log

The types used by the DataGeneratorSource are updated to match the return type of the generator function.
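
To illustrate why the mismatch is a compile-time error, here is a minimal, self-contained sketch. It does not use the Flink classes: GeneratorFunction, TypeInfo, and Source below are simplified, hypothetical stand-ins that only mirror the generic shape described above, where a single type parameter ties the generator's output type to the type descriptor. In the real documentation example, the corresponding fix would presumably be declaring the source as DataGeneratorSource<Long> and passing Types.LONG.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeMatchSketch {

    // Hypothetical stand-in for a generator function: maps an index to an output value.
    interface GeneratorFunction<T, O> {
        O map(T value);
    }

    // Hypothetical stand-in for a type descriptor such as Types.LONG or Types.STRING.
    static final class TypeInfo<O> {
        final Class<O> type;

        TypeInfo(Class<O> type) {
            this.type = type;
        }
    }

    static final TypeInfo<Long> LONG = new TypeInfo<>(Long.class);
    static final TypeInfo<String> STRING = new TypeInfo<>(String.class);

    // Hypothetical stand-in for the source. The single type parameter O must
    // match both the generator's output type and the type descriptor.
    static final class Source<O> {
        private final GeneratorFunction<Long, O> generator;
        private final long count;
        private final TypeInfo<O> typeInfo;

        Source(GeneratorFunction<Long, O> generator, long count, TypeInfo<O> typeInfo) {
            this.generator = generator;
            this.count = count;
            this.typeInfo = typeInfo;
        }

        // Applies the generator to the indices 0..count-1 and collects the results.
        List<O> generate() {
            List<O> out = new ArrayList<>();
            for (long i = 0; i < count; i++) {
                out.add(generator.map(i));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        GeneratorFunction<Long, Long> generatorFunction = index -> index;

        // Mirrors the broken example: O cannot be both Long and String,
        // so uncommenting the next line makes compilation fail.
        // Source<String> broken = new Source<>(generatorFunction, 3, STRING);

        // Mirrors the fix: O is Long in the declaration, the generator, and the type descriptor.
        Source<Long> source = new Source<>(generatorFunction, 3, LONG);
        System.out.println(source.generate()); // prints [0, 1, 2]
    }
}
```

The alternative fix is to keep String on the source side and change the generator function to return a String, which also makes the type parameter consistent.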

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

morozov, Apr 20 '24 01:04

CI report:

  • 082bf2bbd1171fedab03541383f651d5958433ca Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot, Apr 20 '24 01:04

@GOODBOY008 done. FWIW, you can reference code blocks right on GitHub. This way, they are more readable and can be navigated to:

https://github.com/apache/flink/blob/4faf0966766e3734792f80ed66e512aa3033cacd/flink-connectors/flink-connector-datagen/src/main/java/org/apache/flink/connector/datagen/source/DataGeneratorSource.java#L79-L86

morozov, Aug 25 '24 15:08

If we are talking about examples, there is already an existing one (ready to run and play with) in the examples module that is quite close to the one from the description: https://github.com/apache/flink/blob/4faf0966766e3734792f80ed66e512aa3033cacd/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/datagen/DataGenerator.java#L36-L43

How about having the same approach both in the docs and in this example?

snuyanzin, Aug 25 '24 20:08

> how about having the same approach both in docs and in this example?

@snuyanzin what exactly do you propose?

morozov, Aug 26 '24 01:08

The idea is to have the same example in both the docs and the examples module, where compilation is checked during the CI process.

snuyanzin, Aug 26 '24 06:08

I copied the rate-limiting example from DataGenerator.java to every relevant place in the documentation. I also updated DataGenerator.java and renamed generatorSource to source for consistency with the rest of the documentation examples. For instance: https://github.com/apache/flink/blob/5217d50e58e6210e60c3e8a46ac948ccc7f0c901/docs/content/docs/connectors/datastream/datagen.md#L48-L49

morozov, Aug 26 '24 17:08

@morozov LGTM

GOODBOY008, Aug 28 '24 02:08

@morozov Can you open another PR against the release-1.20 branch?

GOODBOY008, Aug 28 '24 02:08

@GOODBOY008, please see https://github.com/apache/flink/pull/25259.

morozov, Aug 28 '24 03:08