opentelemetry-demo icon indicating copy to clipboard operation
opentelemetry-demo copied to clipboard

Implement message queue

Open secustor opened this issue 3 years ago • 1 comments

Fixes #429

Changes

  • [x] add Kafka to docker-compose
  • [x] add Kafka producer to checkoutservice
  • [x] implement Accountingservice
  • [x] implement Frauddetectionservice

For significant contributions please make sure you have completed the following items:

  • [x] Appropriate CHANGELOG.md updated for non-trivial changes
  • [x] Design discussion issue #429

secustor avatar Oct 17 '22 11:10 secustor

Hey just an fyi we're on a feature freeze for the next week or 2 due to our upcoming v1 release and kubecon.

Feel free to keep this open and working on it tho!

cartersocha avatar Oct 19 '22 00:10 cartersocha

I wonder that why add two new services? Since the checkout service is the only kafka message producer, the two new services are both consumers. What's the point of this?

fatsheep9146 avatar Oct 31 '22 13:10 fatsheep9146

Async workflows usually have multiple consumers with different use cases.
The intention is to show async with OTEL in a common scenario and that both spans are added to the parent span of the checkoutservice

secustor avatar Oct 31 '22 15:10 secustor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Nov 10 '22 03:11 github-actions[bot]

As mentioned in the CNCF slack channel. We are going to pause adding any new services/components until after we clean up build times and fix multi-arch build. There is plenty of interest to add a message queue system to the demo, but this is also accompanied by several concerns about resource constraints and user experience for those learning or experimenting with the demo in a dev environment.

Given the concerns, can we look into using Kafka Raft (Kafka without zookeeper) to help minimize the footprint of this addition?

puckpuck avatar Nov 11 '22 03:11 puckpuck

Given the concerns, can we look into using Kafka Raft (Kafka without zookeeper) to help minimize the footprint of this addition?

I have tried initially to use Raft, but ran into problems and therefore I have fallen back to the tried and tested methods.
If resource usage is a concern here, then I will give it another try.

secustor avatar Nov 11 '22 17:11 secustor

@puckpuck I have changed the setup to use Kafka RAFT instead of Kafka

secustor avatar Nov 11 '22 22:11 secustor

@secustor - Wanted to try out this Kafka support so I checked out your fork's branch and ran docker compose up -d on my Mac M1 Pro, i am getting below error likely related to proto file import

[+] Running 0/2
 ⠿ frauddetection Warning                                                                                                                                                                                                        1.2s
 ⠿ accountingservice Warning                                                                                                                                                                                                     1.2s
[+] Building 1.9s (31/36)
 => [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice internal] load build definition from Dockerfile                                                                                                                        0.0s
 => => transferring dockerfile: 1.28kB                                                                                                                                                                                           0.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice internal] load build definition from Dockerfile                                                                                                                    0.0s
 => => transferring dockerfile: 1.31kB                                                                                                                                                                                           0.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice internal] load .dockerignore                                                                                                                                           0.0s
 => => transferring context: 35B                                                                                                                                                                                                 0.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice internal] load .dockerignore                                                                                                                                       0.0s
 => => transferring context: 35B                                                                                                                                                                                                 0.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice internal] load metadata for docker.io/library/alpine:latest                                                                                                            1.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice internal] load metadata for docker.io/library/golang:1.19.2-alpine                                                                                                     1.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice internal] load metadata for docker.io/library/openjdk:18-slim                                                                                                      1.7s
 => [auth] library/openjdk:pull token for registry-1.docker.io                                                                                                                                                                   0.0s
 => [auth] library/alpine:pull token for registry-1.docker.io                                                                                                                                                                    0.0s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                                                                                                    0.0s
 => ERROR [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice] importing cache manifest from ghcr.io/open-telemetry/demo:v1.0.0-accountingservice                                                                              0.3s
 => [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  1/10] FROM docker.io/library/golang:1.19.2-alpine@sha256:e4dcdac3ed37d8c2b3b8bcef2909573b2ad9c2ab53ba53c608909e8b89ccee36                                     0.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice stage-1 1/3] FROM docker.io/library/alpine@sha256:b95359c2505145f16c6aa384f9cc74eeff78eb36d308ca4fd902eeeb0a0b161b                                                     0.0s
 => [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice internal] load build context                                                                                                                                           0.0s
 => => transferring context: 476B                                                                                                                                                                                                0.0s
 => ERROR [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice] importing cache manifest from ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice                                                                      0.3s
 => [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice internal] load build context                                                                                                                                       0.0s
 => => transferring context: 78.87kB                                                                                                                                                                                             0.0s
 => CANCELED [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice] https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.16.0/opentelemetry-javaagent.jar                               0.3s
 => [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice stage-1 1/4] FROM docker.io/library/openjdk:18-slim@sha256:7d2fb53a15fc3574a4e60444526fbc083572a21fec31b68a0c4e5ec0a6689920                                        0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice stage-1 2/3] WORKDIR /usr/src/app/                                                                                                                              0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice stage-1 2/4] WORKDIR /usr/src/app/                                                                                                                          0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  2/10] RUN apk add build-base protoc                                                                                                                    0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  3/10] WORKDIR /usr/src/app/                                                                                                                            0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  4/10] COPY ./src/accountingservice/ ./                                                                                                                 0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  5/10] COPY ./pb/ ./proto/                                                                                                                              0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  6/10] RUN go mod download                                                                                                                              0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  7/10] RUN go install google.golang.org/protobuf/cmd/[email protected]                                                                                0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  8/10] RUN go install google.golang.org/grpc/cmd/[email protected]                                                                                0.0s
 => ERROR [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  9/10] RUN protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./                                                                            0.3s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice builder 3/6] COPY ./src/frauddetectionservice/ ./                                                                                                           0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice builder 4/6] COPY ./pb/ ./src/main/proto/                                                                                                                   0.0s
 => CACHED [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice builder 5/6] RUN chmod +x ./gradlew                                                                                                                         0.0s
 => CANCELED [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice builder 6/6] RUN ./gradlew shadowJar                                                                                                                      0.3s
------
 > [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice] importing cache manifest from ghcr.io/open-telemetry/demo:v1.0.0-accountingservice:
------
------
 > [ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice] importing cache manifest from ghcr.io/open-telemetry/demo:v1.0.0-frauddetectionservice:
------
------
 > [ghcr.io/open-telemetry/demo:v1.0.0-accountingservice builder  9/10] RUN protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./:
#0 0.278 google/protobuf/timestamp.proto: File not found.
#0 0.278 demo.proto:17:1: Import "google/protobuf/timestamp.proto" was not found or had errors.
#0 0.278 demo.proto:280:3: "google.protobuf.Timestamp" is not defined.
#0 0.279 demo.proto:281:3: "google.protobuf.Timestamp" is not defined.
------
failed to solve: executor failed running [/bin/sh -c protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./]: exit code: 1

saurabhdes avatar Nov 13 '22 06:11 saurabhdes

@saurabhdes My latest commit should fix your problem.

secustor avatar Nov 13 '22 14:11 secustor

@saurabhdes My latest commit should fix your problem.

thanks @secustor , yes confirmed all the containers are now running successfully. Needed clarity on couple of items:

  1. In Jaeger UI - i don't see accountingservice or frauddetection service. So i am wondering if these two services are getting called or if they are then is the instrumentation for them setup correctly. Screenshot 2022-11-13 at 10 29 08 AM

  2. Can you please elaborate why accountingservice or frauddetection were created vs. updating communication between existing services to use Kafka (so we need to add just 1 more broker container vs. 3 now). I am assuming it's to minimize complexity in existing services but wanted to get your thoughts.

saurabhdes avatar Nov 13 '22 18:11 saurabhdes

  1. In Jaeger UI - i don't see accountingservice or frauddetection service. So i am wondering if these two services are getting called or if they are then is the instrumentation for them setup correctly.

These are only called when checking out, I have not digged into the load generators yet.
If manually going through the process they will show up. image

  1. Can you please elaborate why accountingservice or frauddetection were created vs. updating communication between existing services to use Kafka (so we need to add just 1 more broker container vs. 3 now). I am assuming it's to minimize complexity in existing services but wanted to get your thoughts.

My thoughts have been:

  • minimising complexity as these examples are also viewed as implementation examples and not only as testing platform. At least I have used it also for this purpose.
  • implementation of a Kotlin example
  • There have been no Java/Go microservices which receive have no communication with the frontend. I'm only aware of instrumentations of Kafka clients for Java and Go.

secustor avatar Nov 14 '22 13:11 secustor

  1. In Jaeger UI - i don't see accountingservice or frauddetection service. So i am wondering if these two services are getting called or if they are then is the instrumentation for them setup correctly.

These are only called when checking out, I have not digged into the load generators yet. If manually going through the process they will show up. image

  1. Can you please elaborate why accountingservice or frauddetection were created vs. updating communication between existing services to use Kafka (so we need to add just 1 more broker container vs. 3 now). I am assuming it's to minimize complexity in existing services but wanted to get your thoughts.

My thoughts have been:

  • minimising complexity as these examples are also viewed as implementation examples and not only as testing platform. At least I have used it also for this purpose.
  • implementation of a Kotlin example
  • There have been no Java/Go microservices which receive have no communication with the frontend. I'm only aware of instrumentations of Kafka clients for Java and Go.

Thanks @secustor for the detailed reply ! Sounds good.

  1. I found out i was using older image of checkout service. I deleted it and recreated the image from your branch , after that without needing to do any manual checkout, i started to see accountingservice and frauddetectionservice in Jarger flowmap as expected. I think this is because checkout service is called by loadgen https://github.com/open-telemetry/opentelemetry-demo/blob/main/src/loadgenerator/locustfile.py#L250 Screenshot 2022-11-14 at 6 21 29 PM

  2. As FYI - In addition to above 2 languages i found Kafka instrumentation for one more language Nodejs https://www.npmjs.com/package/opentelemetry-instrumentation-kafkajs , https://www.aspecto.io/blog/opentelemetry-kafkajs-instrumentation-for-node-js/

saurabhdes avatar Nov 15 '22 03:11 saurabhdes

From the SIG call --

We're removing the performance freeze tag from this and would like to get it in before EOY.

austinlparker avatar Nov 21 '22 16:11 austinlparker

we should test the build time increase from 3 new containers. One thing to keep in mind is that go doesn’t have an operator instrumentation option. We’d need to implement that with the Java service or replace Go

cartersocha avatar Nov 24 '22 00:11 cartersocha

For the architecture docs and services table my bias would be to label the language Java not Kotlin for the fraud detection service

cartersocha avatar Nov 24 '22 00:11 cartersocha

I have tested builds again this branch and main.

Setup:

  • Mac M1
  • ContainerRuntime:
    • DockerDaemon: Colima
    • 8 GB RAM
    • 4 Cores
    • 100 GB Disk

Methodology Run docker system prune --force && docker-compose build --no-cache

implemented changes:

  • 12m 11s
  • 13m 22s

main ( 5a2c0efa923f18bc1f5ae12bbd1843619b351ece ):

  • 12m 30s
  • 12m 58s

So there are no relevant changes. The limiting factor is the build time of the currency service.

For the architecture docs and services table my bias would be to label the language Java not Kotlin for the fraud detection service

I prefer to use Kotlin as this signals that multiple JVM languages are supported. An alternative would be to rename JAVA to JVM

secustor avatar Nov 26 '22 11:11 secustor

From the Google docs:

Merge PR then re-write new service?

@julianocosta89 @cartersocha @puckpuck I have not been able to join the SIG Meeting, is there a decision to rewrite accountingservice in Python before or after the merge of this PR?

secustor avatar Dec 05 '22 17:12 secustor

The goal is to get this PR merged this week with Go for the accountingservice

After the merge, we will rewrite the service in another language that can be auto-instrumented with the OpenTelemetry Operator. We may want to document that as part of this PR so expectations are set with the broader community.

puckpuck avatar Dec 05 '22 17:12 puckpuck

🎉

puckpuck avatar Dec 07 '22 04:12 puckpuck