zig icon indicating copy to clipboard operation
zig copied to clipboard

zig build: add env_map entries to hash for Step.Run

Open dweiller opened this issue 1 year ago • 1 comments

This change fixes false-positive cache hits for run steps that get run with different sets of environment variables due the the environment map being excluded from the cache hash.

I assume the exclusion of the env_map field from the cache hash of a Step.Run was an oversight as it is important for use cases where environment variables are used to control program behaviour (I encountered this issue when using AFLPlusPlus where environment variables are used to enable different instrumentation options). If it was intentional I can open an issue to discuss changing this behaviour.

The implementation in this PR sorts the environment variables by the variable name to ensure that any reordering does not cause a cache miss. I have not implemented special-case logic for windows to normalise environment variable name case before hashing, so you can get cache misses if different codes paths in build.zig add the same environment variable (and value) but use different casing for the variable name.

dweiller avatar Jul 30 '24 09:07 dweiller

still wrong... please use std.mem.sortUnstable

Done.

dweiller avatar Aug 02 '24 03:08 dweiller

For Windows, I believe you'd want to do something like:

Before the kv_pairs.appendAssumeCapacity:

  • Use std.unicode.Wtf8Iterator to iterate over the key's code points (or maybe code point slices?)
  • If the code point is <= std.math.maxInt(u16), then:
    • Call std.os.windows.ntdll.RtlUpcaseUnicodeChar on it
    • Call std.unicode.wtf8Encode with a buf of 3 bytes (RtlUpcaseUnicodeChar returns a u16 which means the code point can always be encoded as 3 WTF-8 bytes)
    • Add the uppercased WTF-8 bytes to the normalized key buffer
  • Else just add the code point's bytes to the normalized key buffer, Windows doesn't have uppercase data for code points > std.math.maxInt(u16)
  • Append the normalized key and the (verbatim) value

(but it seems like a quite rare situation where this normalization would matter if I'm thinking about it right)

squeek502 avatar Mar 26 '25 12:03 squeek502

Yeah, I don't think this necessarily needs to block the PR considering how much of an edge case it is. IIUC, it seems like the worst case scenario is that two otherwise identical commands using e.g. FOO=bar and foo=bar would be run and cached separately?

I think it'd be fine to merge this as-is and leave the Windows fix as follow-up work, but I'll let @dweiller decide if he feels like doing it in this PR.

alexrp avatar Mar 26 '25 12:03 alexrp

IIUC, it seems like the worst case scenario is that two otherwise identical commands using e.g. FOO=bar and foo=bar would be run and cached separately?

Yes, that would be what happens without normalisation.

I think it'd be fine to merge this as-is and leave the Windows fix as follow-up work, but I'll let @dweiller decide if he feels like doing it in this PR.

I'd rather leave normalisation on Windows to follow-up work if possible as I'm not familiar with Windows path name details (and wouldn't be able to test it locally). While I don't think it's ideal to not have normalisation I expect it be be very rare someone would hit it, and it is at least possible to work around in build.zig by making sure they always use the same case (assuming they notice the cache misses and figure out why it's happening). Note that if anyone hits this edge case, their build.zig is also probably broken on other operating systems unless their inconsistent case logic only gets run on Windows.

dweiller avatar Mar 26 '25 13:03 dweiller

Follow-up filed at #23366.

alexrp avatar Mar 26 '25 13:03 alexrp