perf: enable shallow replication

Open zachary-bailey opened this issue 1 year ago • 7 comments

What type of PR is this?

/kind perf

What this PR does / why we need it:

This PR enables shallow replication in all VHD build pipelines to shorten feedback time and improve productivity. With shallow replication enabled, pipeline duration drops by 10-13 minutes, depending on the run. These figures are averages over 3 runs from master and 3 runs from this branch.

This is accomplished by installing the azure-arm plugin for packer, which enables the use of shallow replication.

Which issue(s) this PR fixes:

Unsatisfactory VHD Build Times.

Requirements:

zachary-bailey avatar Jul 03 '24 19:07 zachary-bailey

Pull Request Test Coverage Report for Build 9783868032

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 70.486%

Totals Coverage Status
Change from base Build 9783805229: 0.0%
Covered Lines: 2627
Relevant Lines: 3727

💛 - Coveralls

coveralls avatar Jul 03 '24 19:07 coveralls

Pull Request Test Coverage Report for Build 10566228187

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 71.409%

Totals Coverage Status
Change from base Build 10564587216: 0.0%
Covered Lines: 2620
Relevant Lines: 3669

💛 - Coveralls

coveralls avatar Jul 09 '24 16:07 coveralls

Sorry, I might've missed something, but didn't we find out that shallow replication doesn't support replicating images to more than one target region? That would be problematic if we ever try to run VHD builds and abe2e's in different regions, though we should probably try to avoid that anyway.

cameronmeissner avatar Aug 14 '24 18:08 cameronmeissner

It depends on how we're making image versions. If shallow replication is used to publish an image version to the gallery directly from a pipeline, that image version cannot be replicated to other regions, and additional replicas cannot be made in the source region.

But if shallow replication is used in the pipeline, and we take the resulting image version, convert it to a VHD blob, and then create a new image version from that VHD blob, it works fine. The resulting image version's replication property is marked as "Full" and it can be replicated successfully. That's why I changed it back to test env only: I wasn't sure whether that would be okay in prod, but it didn't seem like it would cause issues in test.

@cameronmeissner

zachary-bailey avatar Aug 14 '24 18:08 zachary-bailey

I see. Yeah, since we don't currently run agentbaker e2e's in prod, this wouldn't have any impact there, though we do run them in test.

Since we do run the e2e's in test, the issue would arise if we built the VHDs (e.g. ran the build pipeline) in one region and then tried to run the e2e's against that VHD build in another region, at least with the current implementation.

As I said above, I think in general we should try to avoid doing that anyway, but in certain cases it might be problematic, for example if we have to use a different region for GPU quota.
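
To make the constraint discussed in this thread concrete, here is a minimal Python sketch of the rule as described above: a shallow image version is only usable in the region where it was built, while one re-created from a VHD blob is marked "Full" and can be replicated. The `ImageVersion` type, the function names, and the region names are hypothetical illustrations, not part of AgentBaker, Packer, or any Azure SDK.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationMode(Enum):
    FULL = "Full"
    SHALLOW = "Shallow"


@dataclass(frozen=True)
class ImageVersion:
    """Hypothetical stand-in for a gallery image version (not an Azure SDK type)."""
    name: str
    source_region: str
    replication_mode: ReplicationMode


def usable_in_region(image: ImageVersion, target_region: str) -> bool:
    """Return True if the image can be made available in target_region."""
    if image.source_region == target_region:
        return True
    # Per the thread: a shallow image version cannot be replicated elsewhere.
    return image.replication_mode is ReplicationMode.FULL


def republish_from_vhd(image: ImageVersion) -> ImageVersion:
    """Model the VHD-blob round trip from the thread: the result is marked Full."""
    return ImageVersion(image.name, image.source_region, ReplicationMode.FULL)


if __name__ == "__main__":
    # Assumed example: build in one region, run e2e's in another (e.g. for GPU quota).
    built = ImageVersion("test-vhd", "westus2", ReplicationMode.SHALLOW)
    for region in ("westus2", "eastus"):
        print(region, usable_in_region(built, region))
    print("eastus after VHD round trip",
          usable_in_region(republish_from_vhd(built), "eastus"))
```

A check along these lines could run before the e2e stage to fail fast when the build region and e2e region differ; whether that is worth adding depends on how often the regions actually diverge.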

cameronmeissner avatar Aug 15 '24 17:08 cameronmeissner