kafka-tutorials [DO NOT MERGE] Parallel Consumer tutorial

Open davetroiano opened this issue 1 year ago • 4 comments

Description

Staging Docs

New tutorial checklist

Aug 19 '22 20:08 davetroiano

The build is failing because this line is missing from settings.gradle

include 'includes:tutorials:kafka-parallel-consumer-application:kafka:code'

Aug 23 '22 21:08 bbejeck

Great tutorial, @davetroiano! I've made a pass over it - overall looks good.

Aug 23 '22 22:08 bbejeck

Great tutorial, @davetroiano! I've made a pass over it - overall looks good.

thanks for taking a pass @bbejeck! i'll incorporate your feedback and make a few more usability improvements before considering this ready to merge

where do you think this should live on the tutorials home page? I didn't put too much thought into what I went with here. I could see Learn the basics > Build applications > Confluent Parallel Consumer, or a new Confluent Parallel Consumer section under Master advanced concepts that would initially contain just this Parallel Consumer hello world. If we add other PC tutorials (for vert.x / reactor, more realistic applications / use cases), they'd go in this section. Maybe to start we put it under Learn the basics, and consider separating it out into a dedicated tile only if / when we hit 2 tutorials. Let me know what you think

Aug 24 '22 15:08 davetroiano

General comments

I actually like building up a local directory from scratch rather than git cloning at the beginning. Maybe though there should be a link to the completed solution in github for later reference.
I'm really loving the line number comments. Seriously helpful to read exactly what each line is doing.
I generally think you should have learners paste the java files before analyzing their anatomy. It's nice to have the local file up alongside the tutorial.
You bury the lede a bit! You should mention the 40s -> 3s (better than 10x) latency improvement in the introduction! Also, you should talk about how the Parallel Consumer gives you multithreading for free in just 4 lines, rather than having to code it yourself.

Specific comments

Write the cluster information into a local file

[x] "From the Clients view, get the connection information customized to your cluster" -- I think the full path is Data Integration -> Clients -> New client -> Java.
[x] "Create new credentials for your Kafka cluster and Schema Registry" -- I might suggest writing a description like "parallel-consumer-tutorial" so the keys are easy to find and delete later

Create a topic

[x] I think the topic name should be less generic. Maybe prefixed by "parallel-consumer" to avoid name collisions. "parallel-consumer.input" or something

Configure the project

[ ] Can we at least use Java 11? Java 8 is getting long in the tooth. It doesn't really matter functionally in this case, but it might be a signal to devs that we're not cool. Maybe. I don't know.
[x] This is the first time I'm seeing the word Gradle. Dependencies should probably be listed at the very start of the tutorial. I would suggest a prerequisites section, including links to help folks install Gradle, the Confluent CLI, and anything else required. It turns out to be a pain in the butt to install the latest version of Gradle. I tried to install with apt and it gave me v4.4 (suuuper old!!!!). I tried to install with snap, and it turns out snap doesn't work under WSL. I don't have brew, but that would probably be easiest. This turned out to be a lot of friction, so we should make it easier for folks. Maybe include a little script like

wget -c https://services.gradle.org/distributions/gradle-7.5.1-bin.zip -P /tmp
sudo unzip /tmp/gradle-7.5.1-bin.zip -d /opt/gradle
sudo chmod +x /opt/gradle/gradle-7.5.1/bin/gradle 
sudo ln -s /opt/gradle/gradle-7.5.1/bin/gradle /usr/bin/gradle

Ideally this would all be preconfigured in a 1-click lab environment, but that's out of scope of this review.

[x] "And be sure to run the following command to obtain the Gradle wrapper" -- Maybe just a quick heads up what the wrapper does and why it's necessary.

Create the Confluent Parallel Consumer Application

[x] I'm getting a little lost knowing whether I'm supposed to be copying these snippets and what file to copy them into. Oh I see, after I read about the different parts, I copy the application into a file. I would suggest going the other way. Start with a touch src/main/java/io/confluent/developer/ParallelConsumerApplication.java and paste in the code. THEN break down each of the pieces.
[ ] Does the shutdown method need a countdown latch for graceful shutdown like kafka streams (example)?
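To make the latch question concrete, here is a minimal sketch of one generic latch-based shutdown pattern in plain Java. It is not the tutorial's code and does not use the Parallel Consumer API: the class name GracefulShutdownSketch, the simulated work loop, and the timeouts are all hypothetical. The idea is that the shutdown hook asks the loop to stop and then waits on a CountDownLatch until cleanup has finished, so the JVM does not exit mid-cleanup.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the tutorial's application class (assumption, not the real code).
final class GracefulShutdownSketch {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final CountDownLatch stopped = new CountDownLatch(1);

    // Simulated poll/process loop; counts the latch down once it has exited.
    void run() {
        try {
            while (running.get()) {
                Thread.sleep(5); // stands in for polling and handling records
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Real code would close its consumer here, before releasing the latch.
            stopped.countDown();
        }
    }

    // Requests a stop, then waits up to timeoutMs for the loop to finish.
    // Returns true if the loop stopped within the timeout.
    boolean shutdown(long timeoutMs) throws InterruptedException {
        running.set(false);
        return stopped.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        GracefulShutdownSketch app = new GracefulShutdownSketch();
        // On Ctrl-C or SIGTERM, block the hook until the loop has cleaned up.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                app.shutdown(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        Thread worker = new Thread(app::run);
        worker.start();
        Thread.sleep(50);
        System.out.println("clean stop: " + app.shutdown(1_000));
    }
}
```

Whether the tutorial's shutdown method needs this depends on whether its close call already blocks until in-flight work is done, which is worth checking before adding a latch.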

Produce the sample data

[ ] To get the sense of data in motion, I would suggest touching the text file and running tail -f output-topic.txt in a terminal window side by side with the producer so you see the records showing up in real time
[ ] Maybe there's a way to emphasize the parallel nature more. I might suggest writing to multiple files at the same time, one file per key. This kind of captures what's happening with the processing -- the parallel consumer is automatically load balancing this work across threads for higher parallelism. That is reflected in the parallel text files. Let me know what you think about that.
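The one-file-per-key idea could look roughly like the sketch below. It is only an illustration: PerKeyFileWriter is a hypothetical helper, not part of the tutorial, and it assumes record keys are safe to use as file names. A record handler would call append(key, value) for each record, and tail -f on a few of the resulting files would show the keys progressing side by side.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (assumption): writes each record to a file named after its key.
final class PerKeyFileWriter {
    private final Path dir;

    PerKeyFileWriter(Path dir) {
        this.dir = dir;
    }

    // Appends one line to <dir>/<key>.txt, creating the file on first use.
    // Synchronized so handlers running on different threads cannot interleave a line.
    synchronized void append(String key, String value) throws IOException {
        Path file = dir.resolve(key + ".txt"); // assumes the key is a valid file name
        byte[] line = (value + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("per-key-output");
        PerKeyFileWriter writer = new PerKeyFileWriter(dir);
        writer.append("key-1", "first");
        writer.append("key-2", "second");
        writer.append("key-1", "third");
        System.out.println("wrote files under " + dir);
    }
}
```

A single synchronized method keeps the sketch short. With many keys, a lock per file would avoid serializing all writes through one monitor.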

Update properties file with Confluent Cloud information

[x] The cat command is a little borked because the properties files we're appending to don't end in a newline, so the first appended line gets glued onto the last existing property. Both properties files end up looking like this:

# Application-specific properties
input.topic.name=perftest-input-topic
records.to.consume=10000
record.handler.sleep.ms=20# Required connection configs for Kafka producer, consumer, and admin
bootstrap.servers=<redacted>

Compile and run the KafkaConsumer-based performance test

[ ] I think a good point to make here is that the time, 40s, is roughly the total processing time divided among the 6 consumer instances (20 ms/record * 10000 records / 1000 ms/s / 6 consumers ~= 33 seconds/consumer). That shows that parallelism is capped by the number of partitions.
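The back-of-the-envelope estimate above can be written as a tiny calculation. This is a sketch only: ThroughputEstimate and idealSeconds are hypothetical names, and the model ignores fetch time, rebalancing, and uneven key distribution. The inputs (10000 records, 20 ms per record, 6 consumers) come from the properties and numbers quoted in this review.

```java
// Hypothetical helper (assumption): idealized run time for a sleep-per-record handler.
final class ThroughputEstimate {
    // Seconds needed when `workers` handlers each spend `sleepMs` on every record
    // and the records are spread evenly across them.
    static double idealSeconds(int records, int sleepMs, int workers) {
        return (double) records * sleepMs / 1000.0 / workers;
    }

    public static void main(String[] args) {
        // 6 consumers, one per partition: roughly 33 seconds.
        System.out.println("6 consumers: " + idealSeconds(10_000, 20, 6) + " s");
        // A single consumer thread would need the full 200 seconds.
        System.out.println("1 consumer: " + idealSeconds(10_000, 20, 1) + " s");
    }
}
```

The observed 40s sits a bit above the ideal 33s, which is plausible given the overhead the model leaves out.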

Compile and run the Confluent Parallel Consumer performance test

[x] Oops! This command is invoking the kafka consumer, not the parallel consumer:

java -cp build/libs/confluent-parallel-consumer-application-standalone-0.0.1-all.jar io.confluent.developer.MultithreadedKafkaConsumerPerfTest configuration/perftest-kafka-consumer.properties

It should be instead:

java -cp build/libs/confluent-parallel-consumer-application-standalone-0.0.1-all.jar io.confluent.developer.ParallelConsumerPerfTest configuration/perftest-parallel-consumer.properties

[ ] It's so fast (!!!) it's actually hard to see the results. The logs after the progress bar bury the result. Maybe it would be best to output the results of each test to a file?

Sep 02 '22 19:09 chuck-confluent

While the performance test runs, take a few sips of the beverage… actually never mind. It’s done:

I loved this line in the tutorial!

Oct 13 '22 18:10 bbejeck

I validated the tutorial by running the steps for Confluent Cloud

Oct 13 '22 18:10 bbejeck
