zig
zig copied to clipboard
zig build: add env_map entries to hash for Step.Run
This change fixes false-positive cache hits for run steps that get run with different sets of environment variables due the the environment map being excluded from the cache hash.
I assume the exclusion of the env_map field from the cache hash of a Step.Run was an oversight as it is important for use cases where environment variables are used to control program behaviour (I encountered this issue when using AFLPlusPlus where environment variables are used to enable different instrumentation options). If it was intentional I can open an issue to discuss changing this behaviour.
The implementation in this PR sorts the environment variables by the variable name to ensure that any reordering does not cause a cache miss. I have not implemented special-case logic for windows to normalise environment variable name case before hashing, so you can get cache misses if different codes paths in build.zig add the same environment variable (and value) but use different casing for the variable name.
still wrong... please use
std.mem.sortUnstable
Done.
For Windows, I believe you'd want to do something like:
Before the kv_pairs.appendAssumeCapacity:
- Use
std.unicode.Wtf8Iteratorto iterate over the key's code points (or maybe code point slices?) - If the code point is <=
std.math.maxInt(u16), then:- Call
std.os.windows.ntdll.RtlUpcaseUnicodeCharon it - Call
std.unicode.wtf8Encodewith a buf of 3 bytes (RtlUpcaseUnicodeCharreturns a u16 which means the code point can always be encoded as 3 WTF-8 bytes) - Add the uppercased WTF-8 bytes to the normalized key buffer
- Call
- Else just add the code point's bytes to the normalized key buffer, Windows doesn't have uppercase data for code points >
std.math.maxInt(u16) - Append the normalized key and the (verbatim) value
(but it seems like a quite rare situation where this normalization would matter if I'm thinking about it right)
Yeah, I don't think this necessarily needs to block the PR considering how much of an edge case it is. IIUC, it seems like the worst case scenario is that two otherwise identical commands using e.g. FOO=bar and foo=bar would be run and cached separately?
I think it'd be fine to merge this as-is and leave the Windows fix as follow-up work, but I'll let @dweiller decide if he feels like doing it in this PR.
IIUC, it seems like the worst case scenario is that two otherwise identical commands using e.g.
FOO=barandfoo=barwould be run and cached separately?
Yes, that would be what happens without normalisation.
I think it'd be fine to merge this as-is and leave the Windows fix as follow-up work, but I'll let @dweiller decide if he feels like doing it in this PR.
I'd rather leave normalisation on Windows to follow-up work if possible as I'm not familiar with Windows path name details (and wouldn't be able to test it locally). While I don't think it's ideal to not have normalisation I expect it be be very rare someone would hit it, and it is at least possible to work around in build.zig by making sure they always use the same case (assuming they notice the cache misses and figure out why it's happening). Note that if anyone hits this edge case, their build.zig is also probably broken on other operating systems unless their inconsistent case logic only gets run on Windows.
Follow-up filed at #23366.