[PATCH v2] test: performance: ipsec: test app for multi-core ipsec performance
Add a new application, odp_ipsec_ordered, to test IPsec outbound processing with multiple worker cores. The current IPsec performance test application is not suitable for testing the overhead of ordering operations, hence a new app is written with the following capabilities. The main core creates packets and supplies them, with a certain level of flow control, to the worker cores via the configured scheduler. The scheduler mode and the worker count are configurable. Performance data is collected only for the IPsec calls, and a summary is displayed per thread, per algorithm, for each packet size.
Based on feedback, this app can be evolved further to add capabilities.
Signed-off-by: Vijay Ram Inavolu [email protected]
This seems to be heavily based on the existing odp_ipsec performance test application, which is (c) Linaro.
To avoid duplication of code, maybe there should be just one test application that can do different kinds of tests, or some way of sharing code between the different applications.
This application performs two system calls (getrusage) for every processed packet. This is not good and can skew the results badly. If you really want to measure just the IPsec call and not the whole loop, gettimeofday() (or clock_gettime() for better resolution), or another mechanism implemented through the vDSO without actual kernel entry, should work much better. But maybe the code should just use the ODP APIs (odp_time_local() etc.) for it.
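To illustrate the clock_gettime() suggestion, here is a minimal sketch of per-call timing with a monotonic clock. The helper names (now_ns, time_calls_ns, dummy_work) are hypothetical and not taken from the patch, and dummy_work only stands in for the IPsec call. It assumes a Linux-like system where clock_gettime(CLOCK_MONOTONIC) is typically served through the vDSO.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(ret == 0);
    (void)ret; /* keeps the build warning-free when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Call fn(arg) 'count' times and return the summed duration of the calls
 * only, excluding the loop overhead around them. */
static uint64_t time_calls_ns(void (*fn)(void *), void *arg,
                              unsigned int count)
{
    uint64_t total = 0;

    for (unsigned int i = 0; i < count; i++) {
        uint64_t t0 = now_ns();

        fn(arg);
        total += now_ns() - t0;
    }
    return total;
}

/* Hypothetical stand-in for the measured call (e.g. the IPsec operation). */
static void dummy_work(void *arg)
{
    volatile uint32_t *sink = arg;

    for (uint32_t i = 0; i < 1000; i++)
        *sink += i;
}
```

A worker would call time_calls_ns(dummy_work, &counter, n) or, more realistically, take the two timestamps directly around the call in its own loop. Inside an ODP application the same pattern can be written with the ODP time API mentioned above instead of the POSIX clock.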
Since ODP worker threads are supposed to run on dedicated cores, I am not sure there is even a point in trying to read the kernel-maintained run time statistics. They may not even work that well if the worker threads are well isolated from the kernel (using full tickless mode etc.).
There is also some false cache line sharing going on between the worker threads through gbl_args->result, as the elements for different threads are not in different cache lines.
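One common way to avoid this kind of false sharing is to align each per-thread element to a cache line, so that no two threads write to the same line. The sketch below uses C11 alignas with an assumed 64-byte line size; the type and field names are hypothetical and do not reflect the actual layout of gbl_args->result. In ODP code, the ODP_ALIGNED_CACHE attribute serves the same purpose.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; real code should take it from
 * the platform (ODP exposes it as ODP_CACHE_LINE_SIZE). */
#define CACHE_LINE_SIZE 64
#define MAX_WORKERS 32

/* Hypothetical per-thread result record. Aligning the first member to the
 * cache line size aligns the whole struct and pads its size up to a
 * multiple of the line size. */
typedef struct {
    alignas(CACHE_LINE_SIZE) uint64_t packets;
    uint64_t nsec;
} thread_result_t;

static_assert(sizeof(thread_result_t) % CACHE_LINE_SIZE == 0,
              "each element must occupy whole cache lines");

/* One element per worker; neighbouring elements never share a line. */
static thread_result_t results[MAX_WORKERS];

/* Each worker updates only its own element. */
static void record_result(unsigned int thr, uint64_t nsec)
{
    results[thr].packets++;
    results[thr].nsec += nsec;
}
```

The cost is some wasted memory per element, which is negligible for a handful of worker threads.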
I am not sure measuring the duration of just odp_ipsec_out() instead of the whole loop tells the whole story. Some of the ordering overhead (like releasing the ordered context as part of the next schedule call) happens outside it.
This app is derived from the original single-core app, and it seems better to keep it separate for a few reasons:
The multi-core IPsec app only supports sync mode, so combining it with the single-core app, which does both sync and async, might make it tricky to get meaningful measurements in a common format.
The original app uses getrusage-based measurements, and changing that to something else might make past and future comparisons using the app difficult.
Changing the measurement from per IPsec call to the entire run, as in the original app, may not scale well in this multi-core app as the number of workers is increased.
The existing examples already duplicate functions between them, slightly modified for each test application; this new app does the same.