envoy-perf
envoy-perf copied to clipboard
[Salvo] Nighthawk is built every time we test an Envoy image
With scavenging mode and binary mode, we build Nighthawk every time we test an Envoy image, which wasted much time for rebuilding if we have many images to be tested. https://github.com/envoyproxy/envoy-perf/blob/main/salvo/src/lib/benchmark/scavenging_benchmark.py#L75
/assign @gyohuangxin
gyohuangxin is not allowed to assign users.
@mum4k I'm looking at this issue, can you assign it to me? Thanks!
Thank you @gyohuangxin, assigned.
Hi @gyohuangxin , I think salvo clones and builds binary under different dirs every time make it unable to reuse the code and cache. I have a idea that we can generate dir name base on hash of salvo job configuration file to make sure same job manages to clone and build at same dir. What's your opinion on this?
@xu1zhou Yes, you're correct. Nighthawk binary files will be built multiple times in different dirs. I think it's a good idea but it may be a huge change (just my personal guess, haven't verified it yet). Here is another idea which may change less files, we can name the nighthawk build dir based on the salvo job, then check if the nighthawk build dir exists before building Nighthawk.
I haven't much cycle on this issue now, please feel free to take this issue and raise PR. These two methods are both feasible, you can chose either.
@gyohuangxin thanks for your suggestion!
Some points I found after going through the code:
- nighthawk calls
file_ops.get_random_dir(home_dir)(https://github.com/envoyproxy/envoy-perf/blob/2d45a91915b8076da227371be4d6d1ee706a8670/salvo/src/lib/source_tree.py#L62-L65) to create ramdom build directory within /tmp/salvo. it's easy to create a fixed directory like/tmp/salvo/nighthawkto contain the source. But in current logic,_build_dirneed to be a tempdir object withnameattribute to be invoked in__repr__(self)(https://github.com/envoyproxy/envoy-perf/blob/2d45a91915b8076da227371be4d6d1ee706a8670/salvo/src/lib/source_tree.py#L69-L76) Problem is, function likeos.mkdir()unenable to return an object like that. - Another workaround is to use an existing path contains nighthawk source code to replace an url. This would spare all clone operation from network, but for each bulid job, it works as before.
Thanks for looking into this @xu1zhou. It is true that the current code expects a tempdir object in multiple locations, but that doesn't have to stay that way.
We could define our own object that would selectively wrap tempdir or a selected directory under the same API and pass this object around instead of tempdir. If you choose to do this, please send the definition of our own object and the following refactor from tempdir in separate PRs, so their scope remains fairly small. This will allow us to discuss before we over-invest in an implementation.
@xu1zhou Thanks for report your findings. I can understand your confusion, envoy-perf/salvo/src/lib/source_tree.py is used for both Nighthawk and Envoy building process, the object changes will also have affects on Envoy.
Looked at TemporaryDirectory function, can we use suffix or prefix arguments to identify whether it is the duplicate nighthawk building? If it doesn't work, as @mum4k mentioned, you can create own objects.
@gyohuangxin tempobject with suffix/prefix arguments only joins specified suffix/prefix with random string as name implies. These objects are different with whole path like /tmp/salvo/prefix-8avnbwxt-suffix
@xu1zhou Thanks for report your findings. I can understand your confusion, envoy-perf/salvo/src/lib/source_tree.py is used for both Nighthawk and Envoy building process, the object changes will also have affects on Envoy.
Looked at TemporaryDirectory function, can we use suffix or prefix arguments to identify whether it is the duplicate nighthawk building? If it doesn't work, as @mum4k mentioned, you can create own objects.
This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 14 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 104 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.