nydus performance tracing

Open imeoer opened this issue 2 years ago • 12 comments

We need a mechanism to collect the time and resource consumption of nydus at each stage of build time and run time to give us insight into the performance impact of development iterations, identify performance bottlenecks, etc. The main requirements are as follows.

  • The ability to count the time consumed by nydus-image/nydusd at each stage in span format.
  • Statistics on memory usage, CPU, and network requests.
  • The ability to present performance data in report form and compare it with the previous result.
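The first and third requirements could be sketched roughly as follows, using only the Rust standard library. The `Span` and `Report` types and the `compare` function are hypothetical names made up for illustration; they are not part of nydus or of any of the libraries listed below, and a real implementation would more likely build on an existing tracing crate.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// One finished span: a named stage and how long it took.
#[derive(Debug, Clone)]
pub struct Span {
    pub name: String,
    pub elapsed: Duration,
}

/// Collects spans for one run (hypothetical type, not part of nydus).
#[derive(Default)]
pub struct Report {
    pub spans: Vec<Span>,
}

impl Report {
    /// Runs `f`, records its wall-clock time under `name`, and returns its result.
    pub fn stage<T>(&mut self, name: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.spans.push(Span {
            name: name.to_string(),
            elapsed: start.elapsed(),
        });
        out
    }

    /// Total time per stage name, summed over repeated spans.
    pub fn totals(&self) -> BTreeMap<String, Duration> {
        let mut totals = BTreeMap::new();
        for span in &self.spans {
            *totals.entry(span.name.clone()).or_insert(Duration::ZERO) += span.elapsed;
        }
        totals
    }
}

/// Per-stage change in milliseconds versus a previous run (positive = slower).
/// Stages missing from one side are treated as zero on that side.
pub fn compare(
    prev: &BTreeMap<String, Duration>,
    curr: &BTreeMap<String, Duration>,
) -> BTreeMap<String, f64> {
    let mut diff = BTreeMap::new();
    for name in prev.keys().chain(curr.keys()) {
        let p = prev.get(name).copied().unwrap_or(Duration::ZERO).as_secs_f64();
        let c = curr.get(name).copied().unwrap_or(Duration::ZERO).as_secs_f64();
        diff.insert(name.clone(), (c - p) * 1000.0);
    }
    diff
}
```

A caller would wrap each build or runtime stage in `report.stage("stage-name", || ...)`, persist `report.totals()` after the run, and pass the saved totals of the previous run to `compare` to see which stages regressed.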

Note that tracepoints should be as non-invasive to the code as possible and should not cost much performance; failing that, there should be a switch to turn tracing on and off.
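One way such a switch might look is a global atomic flag that each tracepoint checks before taking any timestamps. This is only a sketch under assumptions: `TRACE_ENABLED`, `set_tracing`, and `traced` are hypothetical names, and nydus may well end up using a compile-time feature flag or a library-provided mechanism instead.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Global on/off switch for tracing (hypothetical; not an existing nydus setting).
static TRACE_ENABLED: AtomicBool = AtomicBool::new(false);

/// Turns tracing on or off at run time.
pub fn set_tracing(enabled: bool) {
    TRACE_ENABLED.store(enabled, Ordering::Relaxed);
}

/// Runs `f` and returns its result, plus the elapsed time if tracing is on.
/// When tracing is off, the only added work is one relaxed atomic load.
pub fn traced<T>(f: impl FnOnce() -> T) -> (T, Option<Duration>) {
    if !TRACE_ENABLED.load(Ordering::Relaxed) {
        return (f(), None);
    }
    let start = Instant::now();
    let out = f();
    (out, Some(start.elapsed()))
}
```

With the switch off, `traced(|| work())` returns `None` for the duration and never reads the clock, which keeps the disabled path cheap.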

Referenceable tools or libraries:

  • https://github.com/tikv/minitrace-rust
  • https://opentelemetry.io/
  • https://www.jaegertracing.io/
  • https://developer.aliyun.com/article/799040

imeoer avatar Oct 26 '22 02:10 imeoer

Is there anyone working on this? Maybe I can help with it. @imeoer

yawqi avatar Nov 07 '22 02:11 yawqi

Hi Qi, I'm working on it. But we can collaborate.

wraymo avatar Nov 07 '22 03:11 wraymo

@yawqi @wraymo Thanks! We can split it into two sub-tasks, "trace collection" and "cpu/memory/network requests statistics". Which one do you prefer?

imeoer avatar Nov 07 '22 03:11 imeoer

I'd prefer "trace collection" @imeoer

wraymo avatar Nov 07 '22 03:11 wraymo

OK, I will look into the "cpu/memory/network requests statistics".

yawqi avatar Nov 07 '22 03:11 yawqi

@wraymo @yawqi Thanks! Any specific design doc can be committed here first and discussed together.

imeoer avatar Nov 07 '22 04:11 imeoer

Hello 👋 I haven't seen any of the tracing bits land yet. Is there a plan to use something like OpenTelemetry for this? Do you need anything? I am happy to help out on this, as we could really use better insight into these bits, especially:

The ability to count the time consumed by nydus-image/nydusd at each stage in span format.

lilic avatar Aug 10 '23 09:08 lilic

@lilic Are you talking about network performance tracing? You can try the nydusctl metrics backend --sock /nydusd.sock command to inspect the number of HTTP requests made by nydusd, as well as metric info such as the read time distribution.

imeoer avatar Aug 10 '23 10:08 imeoer

@imeoer thank you! That is good to know! But we need a way to send tracing data so we can see performance over time, so open-tel tracing for example or more metrics would be useful for us.

lilic avatar Aug 10 '23 11:08 lilic

@lilic nydusd exports some metrics through an API, which is also the data source for nydusctl: https://github.com/dragonflyoss/image-service/blob/f3cdd071b01ea5d2086e376a7b2bfee3ee233360/api/openapi/nydus-api-v1.yaml#L241

But indeed, we'd better integrate it with open tracing. Unfortunately, no one is working on that yet. :(

imeoer avatar Aug 10 '23 11:08 imeoer

@lilic Here is an example to get the backend metric:

https://github.com/dragonflyoss/image-service/blob/f3cdd071b01ea5d2086e376a7b2bfee3ee233360/smoke/tests/tool/nydusd.go#L384C18-L384C18

imeoer avatar Aug 11 '23 01:08 imeoer

@imeoer I see, I thought these were Prometheus metrics, so I was confused about why I couldn't find them when looking at the /metrics endpoint. Thanks for that. Sadly that doesn't work for me; I'm still looking into it. I get the following error when trying that:

~ # nydusctl --sock=/run/containerd-nydus/system.sock --raw metrics backend
Error: deserialize: trailing characters at line 1 column 5

We can move this to slack so I don't spam this issue.

I did want to look into adding tracing to at least our fork and if it works for us, then contributing it to here.

lilic avatar Aug 14 '23 09:08 lilic