setup-python icon indicating copy to clipboard operation
setup-python copied to clipboard

pip cache always uses node process architecture, ignoring architecture key

Open alex opened this issue 1 year ago • 6 comments

Description:

The pip caching code uses a cache key that does not include the Python binary's architecture. This leads to cache thrashing.

https://github.com/actions/setup-python/blob/main/src/cache-distributions/pip-cache.ts#L68-L75

These keys should include the Python binary architecture, which is specified with the architecture input to the action.

Action version: 5.2.0

Platform:

  • [x] Ubuntu
  • [x] macOS
  • [x] Windows

Runner type:

  • [x] Hosted
  • [x] Self-hosted

Tools version: All

alex avatar Oct 26 '24 14:10 alex

Hello @alex 👋, Thank you for this feature request. We will investigate it and get back to you as soon as we have some feedback.

priya-kinthali avatar Oct 28 '24 05:10 priya-kinthali

Hello @alex , This change, introduced in PR #896, adds the architecture to the pip cache key. This approach enhances caching efficiency by preventing cache thrashing when switching between different Python architectures (e.g., x86_64 and arm64). As a result, builds become faster, and multi-architecture workflows are better supported.

Please validate this from your end and share any further inputs to confirm the exact requirement.

aparnajyothi-y avatar Jan 09 '25 06:01 aparnajyothi-y

No, that PR does not address this issue. If you look, you will also see that that PR was merged before I filed this.

That PR makes use of the node processes' architecture. But what I requested is the Python binary architecture.

On Windows, these can differ. On Windows, both x86 (32-bit) and x86-64 (64-bit) Python binaries can be installed, and the node process architecture will always be 64-bit.

We need to include the Python binary's architecture, which is specified with the architecture key to the setup-python action.

alex avatar Jan 09 '25 12:01 alex

Hello @alex, Thank you for the clarification. We will review the proposed change and get back to you with feedback soon. We appreciate your input and patience.

aparnajyothi-y avatar Jan 22 '25 06:01 aparnajyothi-y

Hello, Thank you for submitting this feature request and providing such a detailed explanation. After careful consideration, we’ve decided not to proceed with this change as it could potentially disrupt existing workflows, especially those relying on shared caches across different architectures. While we understand the benefits of more accurate caching, maintaining the stability of current workflows is our priority at this time.

We appreciate your suggestion and will continue to assess any future opportunities for improvement. Thank you for your understanding, and we look forward to your continued input!

aparnajyothi-y avatar Feb 10 '25 08:02 aparnajyothi-y

I'm not sure I follow how this could disrupt existing workflows. Can you explain more?

alex avatar Feb 10 '25 13:02 alex

Hello @alex, thank you for your follow-up and for raising this important point.

Our primary concern with introducing the Python binary architecture into the cache key is the potential disruption it could cause to existing workflows—particularly those that rely on shared or pre-warmed caches across jobs or matrix configurations where Python architecture may vary. Currently, the cache key is based on the Node process architecture, allowing caches to be reused across 32-bit and 64-bit Python versions. This flexibility helps avoid redundant downloads and improves performance in workflows that benefit from cache sharing.

With the proposed change, caches would be segmented by the Python binary architecture (e.g., separate caches for x86 and x86-64), improving accuracy and preventing mismatches. However, this added granularity may lead to cache fragmentation, reduced cache reuse, and increased cold starts, which could impact performance and consistency in certain scenarios.

We appreciate this feedback as we evaluate future enhancements. Please feel free to reach out with any further questions or concerns.

Thanks again for your continued engagement!

aparnajyothi-y avatar Apr 15 '25 05:04 aparnajyothi-y

Hello @alex, Please let us know in case of any concerns/clarifications on the above

aparnajyothi-y avatar Apr 28 '25 13:04 aparnajyothi-y

I disagree with the analysis of the tradeoffs here, but it sounds like you're committed.

On Mon, Apr 28, 2025, 9:47 AM aparnajyothi-y @.***> wrote:

aparnajyothi-y left a comment (actions/setup-python#971) https://github.com/actions/setup-python/issues/971#issuecomment-2835308421

Hello @alex https://github.com/alex, Please let us know in case of any concerns/clarifications on the above

— Reply to this email directly, view it on GitHub https://github.com/actions/setup-python/issues/971#issuecomment-2835308421, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBDTWX26HSOYTA7PIWD23YWQPAVCNFSM6AAAAABQU3JFC2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMZVGMYDQNBSGE . You are receiving this because you were mentioned.Message ID: @.***>

alex avatar Apr 28 '25 14:04 alex

Hello @alex, thank you for your response and for continuing the conversation.

We appreciate your perspective and understand that you may view the tradeoffs differently. While we’re currently proceeding with caution to avoid regressions in cache performance and reuse, particularly for matrix-based or shared workflows, your points around accuracy and architecture-specific behavior are well taken.

This is certainly an area we’ll continue to evaluate. We're actively monitoring how caching strategies evolve with broader usage patterns, and we'll revisit this discussion as we gather more feedback and data. Your input is valuable to that ongoing process.

Thanks again for your engagement, and please feel free to share any further thoughts or suggestions.

aparnajyothi-y avatar May 21 '25 12:05 aparnajyothi-y