New Blog Post: Build your own OpenTelemetry backends for fun, analytics... and maybe some profit :)

Open arusevm opened this issue 1 year ago • 12 comments

Hi,

We've been using OpenTelemetry in some less obvious ways and have accumulated code and content that we recently started sharing. We believe that, with the help of additional open source projects, it's quite easy for software developers to build their own OTEL backends for analytics, audit logging and even usage-based billing. I thought that might be interesting to the broader community, but I'm not a hundred percent sure what fits and what doesn't, so I'll quickly outline it here and see how it goes.

A short intro I have in mind:

So, you've instrumented your code, launched your collector pipelines and OTEL signals are now flowing to your ops dashboards.

As a software engineer - you may be thinking 'What else can I do with OTEL data?' and you're not alone. Here is how you can build your own telemetry backends for analytics, security audit and even billing.

And a very, very quick outline (as I'm not sure these topics actually fit):

With a little bit of code (and a lot of other open source projects) you can:

  • Illustrate the inner workings of your apps to help other team members learn
  • Build better software - collect telemetry from your tests and analyze it (yes, as part of your release pipelines)
  • Implement reliable (and yes, structured) audit logs
  • Run a usage-based billing platform for your SaaS

Depending on the topics above, a post might refer to open source from Apache (like Arrow, BookKeeper, Druid, Parquet, Superset and others), Pandas, Jupyter notebooks and more. It may also refer to popular and not-so-open-source platforms like GitHub (and its Actions), for example.

Also, might refer to open source created by us - (note: it's a work in progress, haven't shared everything yet).


Happy to provide more details here or on CNCF Slack.

arusevm, Dec 17 '24 15:12

@open-telemetry/docs-approvers adding this as context: @arusevm and I talked about this on Slack as a follow-up to https://github.com/open-telemetry/opentelemetry.io/pull/5419, and I suggested that a blog post may be an option here.

@arusevm on the blog post, a few points:

  • If we go ahead with this blog post, I think what is important to state in the intro is that "building your own observability backend" is a fun exercise, and sometimes it can be useful to learn more about your data, but that in general you should leave it to subject matter experts to build a solution for production and that if you like that topic there are existing OSS projects that might be interested in your contributions.
  • let's pick 1 or 2 open source components and then go with them. I personally think that Superset could be super interesting to see from a visualization perspective. If that works well we can talk about another post with some other technology.

svrnm, Dec 18 '24 09:12

> If we go ahead with this blog post, I think what is important to state in the intro is that "building your own observability backend" is a fun exercise, and sometimes it can be useful to learn more about your data

Sounds good to me - fun is always good, aaand also, my feeling is many people prefer to learn about something (like OTEL) by looking at the data. On top of that - by showing some data we can also add a few suggestions on how to add 'own' telemetry. That is - if a developer is looking to instrument their own code - a few practical suggestions on how to approach it... sort of follow from seeing how fields in the data are populated... We do have a few of these suggestions on our repo - I can get a link to that if an example will help?

> let's pick 1 or 2 open source components and then go with them. I personally think that Superset could be super interesting to see from a visualization perspective.

Yeah, I think something more focused is better. I shared a number of bullet points that I knew are too much for a single post.

Superset comes with Druid (needs a database to query), we also have a fork of the otel demo app that also deploys Druid and Superset, so, we have the extra option that people can quickly try it. Don't know if that fits, but it's there anyway...

My only worry is that the OpenTelemetry guides on writing blog posts do mention something like 'prefer CNCF open source over non-CNCF... like Prometheus/Jaeger...' and Superset/Druid might go against that?

arusevm, Dec 18 '24 10:12

> Sounds good to me - fun is always good, aaand also, my feeling is many people prefer to learn about something (like OTEL) by looking at the data. On top of that - by showing some data we can also add a few suggestions on how to add 'own' telemetry. That is - if a developer is looking to instrument their own code - a few practical suggestions on how to approach it... sort of follow from seeing how fields in the data are populated... We do have a few of these suggestions on our repo - I can get a link to that if an example will help?

That's the spin I am looking for in such a post: it's less about "oh, observability backends are so expensive, let me build my own thing, how hard could that be", and more about exploring the data, playing with it, etc. So yes, suggestions are welcome.

> > let's pick 1 or 2 open source components and then go with them. I personally think that Superset could be super interesting to see from a visualization perspective.
>
> Yeah, I think something more focused is better. I shared a number of bullet points that I knew are too much for a single post.
>
> Superset comes with Druid (needs a database to query), we also have a fork of the otel demo app that also deploys Druid and Superset, so, we have the extra option that people can quickly try it. Don't know if that fits, but it's there anyway...

That would work for me. You might want to call out that Superset also supports ClickHouse, but Druid is also good.

> My only worry is that the OpenTelemetry guides on writing blog posts do mention something like 'prefer CNCF open source over non-CNCF... like Prometheus/Jaeger...' and Superset/Druid might go against that?

We prefer CNCF, but in this particular case it's about demonstrating otel + these projects, and most importantly you are not affiliated with them or trying to sell some Apache Superset-like product. E.g. it would be a different matter if someone from Preset wrote that article, as this would go against our social media guidelines (of course I would love to see something like that on their blog or any other blog as well, it's just not applicable here). I hope that makes sense.

svrnm, Dec 20 '24 09:12

> That's the spin I am looking for in such a post: it's less about "oh, observability backends are so expensive, let me build my own thing, how hard could that be", and more about exploring the data, playing with it, etc. So yes, suggestions are welcome.

Yeah, have no intention to go in the direction of 'critical use' (as done by DevOps monitoring production environments, for example) - we don't have the experience and can hardly contribute anything. Over here we're mostly into 'actually telemetry data has other uses too (especially for developers).' So, whatever I do will be 'out of critical use' and into (if possible) - 'get creative about what else you can do with this data'. For example - with Superset might do examples with different charts (different than the ones you typically get on Ops monitoring platforms) to illustrate key points in OpenTelemetry - like signal correlation, but also in a non-critical context... I don't know, more like illustrating how things work rather than what happens right now on prod...

By the way - the above is one of the reasons why I do like the 'tests telemetry' part - it's not 'critical use', code tests are common to all software developers, and the needs (in terms of visualizations on the data) are different (than monitoring production environments). What do you think about this? I kinda would like to have some sort of context or background for the post, like - 'let's have some fun with telemetry from the tests that we run anyway'... I have some other candidates too, but not necessarily as broad, might be specific to a programming language, etc... What do you think?

One more question by the way, kind of follows from all of the above: I'm not completely sure about how OpenTelemetry looks at the 'Dev or Ops' thing. On one hand, on the home page there's a 'Get started by your role', sort of suggests to me that there is a distinction between the two. But on the other - for the blog - can I 'afford' to write only for the devs? Or anything else I might be missing here?

arusevm, Dec 20 '24 13:12

> 'let's have some fun with telemetry from the tests that we run anyway'... I have some other candidates too, but not necessarily as broad, might be specific to a programming language, etc... What do you think?

works for me!

> On one hand, on the home page there's a 'Get started by your role', sort of suggests to me that there is a distinction between the two. But on the other - for the blog - can I 'afford' to write only for the devs?

That eventually goes away. Don't worry too much about it.

Note that most folks (including myself) will be out of office during the next week (or two), so let's follow up afterwards!

Thanks & happy holidays

svrnm, Dec 20 '24 14:12

Hi! It's been a while, but here I go again :) Hope you're doing alright and had nice holidays!

A bit of an update:

Currently, I'm working on a post that goes a bit like this:

OTEL can help you beyond production monitoring. It produces enough data that, if you, as a software developer, do some analytics on it - it can help you understand how your code will scale, where load hits, etc. Generally revolving around getting insights on aspects (of code behavior) that usually remain hidden.

Then (next section), as we'll be doing analytics - let's get some data. For non-production (non-critical) use, it's easy, as there are a number of open source tools that you can reuse - there's Druid, Superset (and others). You might have to write a bit of code to glue them together, but OTEL is based on pretty popular tech (protobuf, gRPC...), so you're probably already familiar with the basics anyway. (What we published is really simple.)
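
To give a feel for how small such glue code can be, here is a minimal sketch (not the code published by the author). It assumes the collector writes spans in the OTLP JSON encoding, for example through a file exporter, and it flattens them into one row per span, which is a shape a tabular store can ingest. The function names and the `resource.`/`attr.` column prefixes are made up for illustration.

```python
import json


def attrs_to_dict(attributes):
    """Turn OTLP's [{"key": ..., "value": {"stringValue": ...}}] list into a dict."""
    out = {}
    for item in attributes or []:
        value = item.get("value", {})
        # An AnyValue carries one typed field (stringValue, intValue, ...).
        # Note: OTLP JSON encodes 64-bit integers as strings, and nested
        # arrayValue/kvlistValue fields are passed through unflattened.
        out[item["key"]] = next(iter(value.values()), None)
    return out


def flatten_spans(otlp_json):
    """Return one flat dict per span found in an OTLP JSON trace payload."""
    payload = json.loads(otlp_json)
    rows = []
    for resource_spans in payload.get("resourceSpans", []):
        resource = attrs_to_dict(resource_spans.get("resource", {}).get("attributes"))
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                start = int(span["startTimeUnixNano"])
                end = int(span["endTimeUnixNano"])
                row = {
                    "trace_id": span["traceId"],
                    "span_id": span["spanId"],
                    "name": span["name"],
                    "duration_ms": (end - start) / 1e6,
                }
                # Hypothetical column naming: prefix to avoid key collisions.
                row.update({f"resource.{k}": v for k, v in resource.items()})
                row.update(
                    {f"attr.{k}": v for k, v in attrs_to_dict(span.get("attributes")).items()}
                )
                rows.append(row)
    return rows


# A hand-written sample payload, not real telemetry.
SAMPLE = json.dumps({
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "checkout"}},
        ]},
        "scopeSpans": [{"spans": [{
            "traceId": "5b8efff798038103d269b633813fc60c",
            "spanId": "eee19b7ec3c1b174",
            "name": "GET /cart",
            "startTimeUnixNano": "1700000000000000000",
            "endTimeUnixNano": "1700000000002500000",
            "attributes": [
                {"key": "http.response.status_code", "value": {"intValue": "200"}},
            ],
        }]}],
    }],
})

rows = flatten_spans(SAMPLE)
print(rows[0]["name"], rows[0]["duration_ms"])
```

Rows like these can be written out as newline-delimited JSON and loaded into whichever database sits behind the charts.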

Next - as software developers - we're more interested in getting some preliminary data - before we release. So, let's instrument our tests and collect telemetry from our build pipelines. Say, on a pull request. (With some examples). An extra - we can also label our telemetry with details like the PR number/git commit id that will help us later.
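
As a rough illustration of the labelling step, the sketch below builds resource attributes from the environment of a GitHub Actions run. `GITHUB_SHA` and `GITHUB_REF` are default GitHub Actions variables, and `OTEL_RESOURCE_ATTRIBUTES` is the standard OpenTelemetry environment variable for resource attributes. The attribute keys `ci.git.commit` and `ci.pull_request` are hypothetical names chosen for this example, not something the thread or a convention prescribes.

```python
import os
import re


def ci_resource_attributes(env=None):
    """Collect commit and pull request labels from a CI environment mapping."""
    env = os.environ if env is None else env
    attrs = {}
    sha = env.get("GITHUB_SHA")
    if sha:
        attrs["ci.git.commit"] = sha  # hypothetical attribute key
    # On pull_request events GITHUB_REF looks like refs/pull/<number>/merge.
    match = re.fullmatch(r"refs/pull/(\d+)/merge", env.get("GITHUB_REF", ""))
    if match:
        attrs["ci.pull_request"] = match.group(1)  # hypothetical attribute key
    return attrs


def to_otel_resource_attributes(attrs):
    """Format attributes as the comma-separated key=value list the SDKs read.

    Assumes values need no escaping, which holds for hex SHAs and PR numbers.
    """
    return ",".join(f"{key}={value}" for key, value in sorted(attrs.items()))


example_env = {"GITHUB_SHA": "abc123", "GITHUB_REF": "refs/pull/42/merge"}
print(to_otel_resource_attributes(ci_resource_attributes(example_env)))
```

Exporting the resulting string as `OTEL_RESOURCE_ATTRIBUTES` before the test step means every span from that run carries the labels, without touching the instrumentation code.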

Next section - data is already in - let's do some charts in Superset. As developers (looking at what is about to be released) we're interested in the hidden aspects of our code (kinda like a reminder of what was said in the first section), so we're not going to do the typical time-based charts (you know the kind - in production monitoring usually one of the dimensions is time). We'll get into analyzing how our span durations change as a metric changes - do my spans slow down as the number of concurrent visitors increases? And so on.
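
The same question can be asked of the raw rows before any chart exists. The sketch below is only an illustration: it assumes each span row already has the metric value joined onto it as a `concurrent_users` field (a hypothetical column name), and the numbers are invented.

```python
from collections import defaultdict
from statistics import median


def median_duration_by_load(rows):
    """Group span durations by load level and return the median per level."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["concurrent_users"]].append(row["duration_ms"])
    return {load: median(values) for load, values in sorted(buckets.items())}


def slope(points):
    """Least-squares slope of y over x for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    den = sum((x - mean_x) ** 2 for x, _ in points)
    if den == 0:
        raise ValueError("need at least two distinct load levels")
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return num / den


# Invented sample data: span duration at three load levels.
sample_rows = [
    {"concurrent_users": 10, "duration_ms": 20},
    {"concurrent_users": 10, "duration_ms": 22},
    {"concurrent_users": 10, "duration_ms": 24},
    {"concurrent_users": 50, "duration_ms": 30},
    {"concurrent_users": 50, "duration_ms": 34},
    {"concurrent_users": 50, "duration_ms": 32},
    {"concurrent_users": 100, "duration_ms": 45},
    {"concurrent_users": 100, "duration_ms": 47},
]

medians = median_duration_by_load(sample_rows)
print(medians)
print("ms per extra user:", slope(list(medians.items())))
```

A positive slope suggests the span slows down as load grows. A scatter chart with load on the x-axis shows the same thing visually.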

An extra - as we also labelled the telemetry with details from git - we can also compare our new (potential) release with previous releases.
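
A comparison like this could look roughly as follows. It is a sketch under assumptions: rows are flat dicts with a `commit` field carrying the git label (a hypothetical column name), and the relative change in median duration is just one possible measure.

```python
from statistics import median


def compare_releases(rows, baseline, candidate):
    """Relative change in median span duration, candidate vs baseline, per span name."""
    durations = {}
    for row in rows:
        durations.setdefault((row["name"], row["commit"]), []).append(row["duration_ms"])
    report = {}
    for name in sorted({name for name, _ in durations}):
        old = durations.get((name, baseline))
        new = durations.get((name, candidate))
        if not old or not new:
            continue  # span exists in only one of the two releases
        old_median = median(old)
        if old_median == 0:
            continue  # avoid dividing by zero for zero-length spans
        report[name] = (median(new) - old_median) / old_median
    return report


# Invented sample data for two commits.
release_rows = [
    {"name": "checkout", "commit": "aaa111", "duration_ms": 100},
    {"name": "checkout", "commit": "aaa111", "duration_ms": 100},
    {"name": "checkout", "commit": "aaa111", "duration_ms": 100},
    {"name": "checkout", "commit": "bbb222", "duration_ms": 120},
    {"name": "checkout", "commit": "bbb222", "duration_ms": 130},
    {"name": "checkout", "commit": "bbb222", "duration_ms": 125},
    {"name": "search", "commit": "bbb222", "duration_ms": 40},
]

print(compare_releases(release_rows, baseline="aaa111", candidate="bbb222"))
```

Here `checkout` comes out 25% slower in the candidate, while `search` is skipped because it has no baseline to compare against.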

Finally - a few more suggestions (otherwise the post gets way too long) - labelling the maturity of spans so that we can look for further improvements, some suggestions for even deeper analytics, etc (kinda like - ideas for next steps).


How does that sound? If it is okay - should I dump a very raw draft of it somewhere?

arusevm, Feb 05 '25 15:02

overall sounds good to me, a draft would be helpful to provide further feedback

svrnm, Feb 06 '25 08:02

@svrnm Okay, cool, will put something on github somewhere. Couple of things though:

  • will take some time - currently trying to get some nice illustrations (charts) and examples (like code or diagrams)
  • it is still way too long in my opinion, pretty sure it's okay to drop a lot of it (as I'm guessing readers on the otel blog are already familiar with it), but would be nice if I get some feedback before I drop anything...

arusevm, Feb 06 '25 13:02

@arusevm never hesitate to raise a draft PR early, or to share a draft document with us.

svrnm, Feb 10 '25 07:02

Hi @arusevm, just checking in 😄

Is this still active? I've added the stale label for now, let me know if it's still planned so I can remove it and we can move forward.

Please don't hesitate to reach out if you need any assistance as well.

vitorvasc, Nov 03 '25 11:11

> Hi @arusevm, just checking in 😄
>
> Is this still active? I've added the stale label for now, let me know if it's still planned so I can remove it and we can move forward.
>
> Please don't hesitate to reach out if you need any assistance as well.

Hi @vitorvasc,

Yeah, it's still active on my side ... not at a terribly high priority, but still around :)

arusevm, Nov 04 '25 13:11

Thanks @arusevm, I moved this back to triage:deciding as this has stalled for a long while. We have also reworked our blog submission guidelines, so if you get to it, please review them first: https://opentelemetry.io/docs/contributing/blog/#before-submitting-a-blog-post

svrnm, Nov 28 '25 06:11