skywalking
[Feature] Support traces and logs exporting.
Search before asking
- [X] I had searched in the issues and found no similar feature requirement.
Description
This feature has been requested repeatedly, but we have held it back for a long time because of concerns about the performance of the upstream (the export target). Now that the whole OAP backend is stable from an architectural perspective, I would like to open the discussion on this feature and talk about possible solutions.
Use case
Exporting SkyWalking's traces and logs to third-party systems.
We have had the metrics exporter, https://skywalking.apache.org/docs/main/next/en/setup/backend/metrics-exporter/, for years. Users can easily build a forwarder on top of it when they need most of the metrics and on-demand queries through GraphQL don't fit their requirements.
The same applies to traces and logs. Users may have a platform that needs to analyze logs and traces for business performance analysis or AIOps, like @Superskyyy 's new project (https://github.com/SkyAPM/aiops-engine-for-skywalking).
Challenges
The major challenges of this exporting are building:
- A scalable channel to export data.
- A good limiter and breaker to make sure a low-performance upstream would not break or slow down the OAP server.
- A way to measure the exporting performance and speed in the self-observability.
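To make the limiter and breaker idea concrete, here is a minimal sketch in Java. It is only an illustration under my own assumptions, not an existing OAP API: the class name `GuardedExporter`, its parameters, and the `sender` callback are all hypothetical. It shows a bounded buffer that drops instead of blocking the ingest path, a simple consecutive-failure breaker that pauses exporting, and counters that could feed self-observability.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Hypothetical sketch of a guarded export channel; not SkyWalking's actual exporter. */
final class GuardedExporter<T> {
    private final BlockingQueue<T> buffer;   // bounded, so a slow upstream cannot exhaust memory
    private final Predicate<T> sender;       // assumed callback: true if the upstream accepted the record
    private final int failureThreshold;      // consecutive failures before the breaker opens
    private final long openMillis;           // how long the breaker stays open

    // Breaker state, touched only by the single export worker thread.
    private int consecutiveFailures = 0;
    private long openUntil = 0L;

    // Counters that a self-observability layer could read.
    final AtomicLong exported = new AtomicLong();
    final AtomicLong dropped = new AtomicLong();
    final AtomicLong failed = new AtomicLong();

    GuardedExporter(int capacity, int failureThreshold, long openMillis, Predicate<T> sender) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.sender = sender;
    }

    /** Ingest path: never blocks. A full buffer means the record is dropped and counted. */
    boolean offer(T record) {
        boolean accepted = buffer.offer(record);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /**
     * Export worker: sends up to maxBatch records unless the breaker is open.
     * Failed records are counted and discarded (no retry in this sketch).
     * Returns the number of records sent successfully.
     */
    int drain(int maxBatch, long nowMillis) {
        if (nowMillis < openUntil) {
            return 0; // breaker is open, leave the upstream alone
        }
        int sent = 0;
        T record;
        while (sent < maxBatch && (record = buffer.poll()) != null) {
            boolean ok;
            try {
                ok = sender.test(record);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (ok) {
                exported.incrementAndGet();
                consecutiveFailures = 0;
                sent++;
            } else {
                failed.incrementAndGet();
                if (++consecutiveFailures >= failureThreshold) {
                    openUntil = nowMillis + openMillis; // open the breaker
                    consecutiveFailures = 0;
                    break;
                }
            }
        }
        return sent;
    }
}
```

Dropping on a full buffer is a deliberate choice in this sketch: it trades completeness of the exported data for protection of the OAP server. A real implementation would also need to decide whether failed records are retried and how the counters are exposed.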
Milestone
I am not sure whether we should set milestones in 9.3.0. I will discuss it with @wankai123.
Related issues
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
One specific challenge learnt from the AIOps engine (or similar analytics projects) is that multiple consumers are needed to ingest data in a streaming fashion, but streaming RPCs are quite difficult to scale. I propose we choose Kafka or something similar in this direction (avoiding direct coupling with the consuming project) to export large data like traces and logs, so the channel is less likely to congest and is easier to scale.
Kafka should be fine, although in theory it is no different from cluster-supported gRPC streaming. Of course, we may consider Kafka as the default exporter channel, as it is widely used.
@Team this looks like an interesting feature. I have some knowledge on the Kafka side and would love to contribute in any way possible.
This is done in https://github.com/apache/skywalking/pull/9817