nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

arch/risc-v/litex: provide architecture specific perfmon bindings.

Open g2gps opened this issue 1 year ago • 5 comments

Summary

provide architecture specific perfmon bindings.

Using the up_perf_xx bindings directory with the core clock is more efficient than performing a nanosecond conversion on every gettime event.

Impact

A nanosecond conversion is no longer required each time a perf_gettime is used.

Testing

Performance monitoring subsystem is used on custom hardware for testing. The arty_a7:knsh-tickless configuration builds, however the subsystem isn't used.

g2gps avatar Jun 06 '24 04:06 g2gps

ping @g2gps

acassis avatar Jun 22 '24 22:06 acassis

ping @g2gps

I was hoping for some feedback from @xiaoxiang781216 in regards to my comment above, before continuing with the PR.

g2gps avatar Jun 24 '24 00:06 g2gps

ping @g2gps

I was hoping for some feedback from @xiaoxiang781216 in regards to my comment above, before continuing with the PR.

Yes, it's better to move to the common place.

xiaoxiang781216 avatar Jun 24 '24 03:06 xiaoxiang781216

ping @g2gps

I was hoping for some feedback from @xiaoxiang781216 in regards to my comment above, before continuing with the PR.

Yes, it's better to move to the common place.

Sorry, I was referring to my other comment, where you mentioned using CSR_CYCLE

g2gps avatar Jun 24 '24 03:06 g2gps

ping @g2gps

I was hoping for some feedback from @xiaoxiang781216 in regards to my comment above, before continuing with the PR.

Yes, it's better to move to the common place.

Sorry, I was referring to my other comment, where you mentioned using CSR_CYCLE

Done.

xiaoxiang781216 avatar Aug 06 '24 18:08 xiaoxiang781216

A couple of questions have come up in the details whilst revisiting this:

  • The vexriscv_smp core doesn't implement the performance counter by default. Opensbi handles this case on startup by attempting a read and trapping the exception.
  • The currently implementation could be moved to riscv_perf.c instead, however some risc-v ports, e.g esp32c3_perf already have there own architecture specific implementation, so there would need to be some conditional logic around this.
  • Assuming the Litex architecture specific perf_xx implementation is kept, I'll need to update this to handle the standard vexriscv core, which only has the m-mode timer.

g2gps avatar Sep 11 '24 01:09 g2gps

A couple of questions have come up in the details whilst revisiting this:

  • The vexriscv_smp core doesn't implement the performance counter by default. Opensbi handles this case on startup by attempting a read and trapping the exception.
  • The currently implementation could be moved to riscv_perf.c instead, however some risc-v ports, e.g esp32c3_perf already have there own architecture specific implementation, so there would need to be some conditional logic around this.
  • Assuming the Litex architecture specific perf_xx implementation is kept, I'll need to update this to handle the standard vexriscv core, which only has the m-mode timer.

Let chip developer add riscv_perf.c to their Make.defs if they want to use the common implementation.

xiaoxiang781216 avatar Sep 11 '24 01:09 xiaoxiang781216

Let chip developer add riscv_perf.c to their Make.defs if they want to use the common implementation.

Yes, that would work for the general implementation.

However, this won't work for the vexriscv core. I'm not sure what the better approach is here. We could either:

  • Have a kconfig option which specifies if the performance monitor CSRs are implemented. Both the time based and counter based implementations are in the common location.
  • Just have the counter based implementation in the generic location. Architectures can provide their own implementation if needed.

The second option seems a bit cleaner, but I'd appreciate any suggestions.

g2gps avatar Sep 11 '24 02:09 g2gps

Let chip developer add riscv_perf.c to their Make.defs if they want to use the common implementation.

Yes, that would work for the general implementation.

However, this won't work for the vexriscv core. I'm not sure what the better approach is here. We could either:

  • Have a kconfig option which specifies if the performance monitor CSRs are implemented. Both the time based and counter based implementations are in the common location.
  • Just have the counter based implementation in the generic location. Architectures can provide their own implementation if needed.

The second option seems a bit cleaner, but I'd appreciate any suggestions.

How about we provide riscv_perf_cycle.c and riscv_perf_time.c? Let's chip developer add one of these as needed.

xiaoxiang781216 avatar Sep 11 '24 03:09 xiaoxiang781216

How about we provide riscv_perf_cycle.c and riscv_perf_time.c? Let's chip developer add one of these as needed.

This makes sense. I've updated as suggested. I've testing the CSR_TIME implementation on Litex. I believe I can test the cycle based implementation on QEMU, if I do a quick integration.

g2gps avatar Sep 11 '24 22:09 g2gps

I did a quick test with rv-virt:nsh, by adding riscv_perf_cycle.c and enabling CONFIG_SCHED_IRQMONITOR, CONFIG_ARCH_PERF_EVENTS

nsh> cat /proc/irqs
IRQ HANDLER  ARGUMENT    COUNT    RATE    TIME
 11 80005f16 00000000          2    4.385 10927
 23 80005856 8002ca28        405  888.157 41596
 37 80002d14 80028f88          2    4.385 58285

The calculated time is not correct, as I've used the MTIMER_FREQ. I don't know how to find the cycle frequency in qemu-rv

g2gps avatar Sep 12 '24 01:09 g2gps

need fix this ci error:

  Normalize arty_a7/knsh-tickless
67d66
< CONFIG_PERF_OVERFLOW_CORRECTION=y

xiaoxiang781216 avatar Sep 12 '24 01:09 xiaoxiang781216

need fix this ci error:

  Normalize arty_a7/knsh-tickless
67d66
< CONFIG_PERF_OVERFLOW_CORRECTION=y

Thanks, SYSTEM_TIME64 should have been enabled here as well. Since this is the width of the underlying system timer.

g2gps avatar Sep 12 '24 02:09 g2gps