nix
nix copied to clipboard
Add context to better locate runtime coercions
Add tons of helpful error messages in locations with poor error reporting. This includes mostly type errors. As these are runtime errors, the precise location of the faulty value is usually lost at that time. This translates textually the type expectations (requirements ?) hard-coded in the sources.
TODOs:
- [X] Also annotate coerceToString
- [X] Find and annotate remaining functions bearing expectations on the values. (I guess this is good enough)
- [X] Wrap some function calls with debug info
Fixes #6200 #6191
Have fun by starting with ./outputs/out/bin/nix-instantiate --eval --expr 'builtins.genericClosure false'
and ending with ./outputs/out/bin/nix-instantiate --eval --expr 'builtins.sort (a: b: a.key < b.key) (builtins.genericClosure {startSet = [{key = 1;}]; operator = x: if x.key < 5 then [{key = x.key + 1;} {key = x.key * 2;}] else [];})'
by trying to trigger all the possible type errors in between :-)
:tada:
More examples: https://maudoux.be/changes.html
Well, it has some impact on performance, but that remains under control:
This PR:
nixos.smallContainer.time 0.762153 0.975931 0.736349 0.775949
nixos.kde.time 1.44574 1.6254 1.48982 1.59174
nixos.lapp.time 1.1987 1.3634 1.19679 1.30787
nix-env.qa.time 6.18346 6.71418 6.32466 6.43977
nix-env.qaDrv.time 44.1625 50.7246 53.6632 46.6902
nix-env.qaAggressive.time 6.31606 6.67477 9.45704 6.60786
nix-env.qaDrvAggressive.time 44.3691 48.075 60.2891 45.648
Baseline:
nixos.smallContainer.time 0.71367 0.705955 0.71822 0.715239
nixos.kde.time 1.37252 1.40253 1.47962 1.38685
nixos.lapp.time 1.12598 1.14188 1.22673 1.12473
nix-env.qa.time 6.12947 6.16216 6.56867 6.22066
nix-env.qaDrv.time 42.1374 42.1028 54.5509 42.0571
nix-env.qaAggressive.time 6.79885 6.00412 9.02374 6.36788
nix-env.qaDrvAggressive.time 45.5595 42.1364 55.0709 42.3918
Do you want to update TODOs, mark this draft, or save things for future PRs?
Did you consider using the existing call stack stuff?
Relatedly, it would be nice to use the positions of the subterm that is trying to be casted. But I see that was discussed at https://github.com/NixOS/nix/issues/6200
I would not be opposed to merging this as-is, and circling back to these issues, but others may disagree.
Do you want to update TODOs, mark this draft, or save things for future PRs?
I guess this is as good as I can achieve in reasonable time.
Did you consider using the existing call stack stuff?
Well, first I thought it would not work. Now I am not too sure anymore. It should be feasible, but it requires shifting around several pos
and rework how we handle them. It is a bit more involved, and the try/catch syntax adds lots of clutter. Maybe recent C++ lambdas may alleviate this issue. Apparently the try{}catch may have zero cost when not used. That would definitely be the right way forward if it is the case.
So, yes, I think it could be superior if well used. It would certainly avoid mismatch between error messages and positions, as we have now.
That being said, the current version has the advantage that it requires any user of these functions to provide an explanation string, which ensures that we do not inadvertently omit them.
Relatedly, it would be nice to use the positions of the subterm that is trying to be casted. But I see that was discussed at #6200
It is extremely complex in nix, as often there is no good position for values. We do keep some positions for the keys of attribute sets, and sometimes for functions, when feasible.
For example
let
list = [ 1 "a" {attr="value";} ];
fun = builtins.map (x: x+1);
afterList = builtins.seq list;
getThird = l: builtins.elemAt l 2;
in
afterList (getThird (fun list ++ [3]))
fails with
1 error: cannot coerce a set to a string
2
3 at /home/layus/projects/nix/complex.nix:4:26:
4
5 3| list = [ 1 "a" {attr="value";} ];
6 4| fun = builtins.map (x: x+1);
7 | ^
8 5| afterList = builtins.seq list;
9
10 … while evaluating anonymous lambda
11
12 at /home/layus/projects/nix/complex.nix:4:23:
13
14 3| list = [ 1 "a" {attr="value";} ];
15 4| fun = builtins.map (x: x+1);
16 | ^
17 5| afterList = builtins.seq list;
18
19 … from call site
20
21 … while evaluating 'getThird'
22
23 at /home/layus/projects/nix/complex.nix:6:14:
24
25 5| afterList = builtins.seq list;
26 6| getThird = l: builtins.elemAt l 2;
27 | ^
28 7| in
29
30 … from call site
31
32 at /home/layus/projects/nix/complex.nix:8:14:
33
34 7| in
35 8| afterList (getThird (fun list ++ [3]))
36 | ^
37 9|
And nowhere is there any mention of the list itself, nor the value that failed to be a proper int. And that would be difficult. Would you point list ++ [3]
, list = [ 1 "a" {attr="value";} ]
, [ 1 "a" {attr="value";} ]
, {attr="value";}
, or something else ?
:bulb: One thing we could do is print the fautly values to help users get a ffeling of what happens.
I would not be opposed to merging this as-is, and circling back to these issues, but others may disagree.
I think it is ready to merge. Better merge early and improve often :-).
So, yes, I think it could be superior if well used. It would certainly avoid mismatch between error messages and positions, as we have now.
That being said, the current version has the advantage that it requires any user of these functions to provide an explanation string, which ensures that we do not inadvertently omit them.
Yeah, I think the cast functions should take the argument, and do the try-catch themselves on behalf of the caller. That still ensures one doesn't forget to provide a reason for the cast!
Very nice looking!
Except for painful conflicts... I guess this will have to wait the hackathon tomorrow.
@Ericson2314 Not sure I will be able to join the hacking session, but if you could find some time to review the changes, that would be much appreciated. I am quite happy with the result, but I lack real tests, and there are so many changes that typos can always slip in.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-26/18252/1
@Ericson2314 https://github.com/NixOS/nix/pull/6204/commits/963b8aa39b169bf5c054449ddce39d60faacf298 just for you :-). I hope you like it.
would definitely be interesting to have another look at the performance impact now. not having gone over it again completely this doesn't look like it should cause performance regressions, so if it does there may still be room for optimizations.
Suggestions from NixUX:
-
'--show-trace
can become overwhelming, even more than today. Perhaps respect
-v` verbosity flags? -
New messages can be a less severe level in order to limit the amount displayed by default.
-
consider a new PR or doing it in the current PR
-
It is hard to keep error message testing up to date.
-
performance review @pennae?
we've run this PR through our perf benchmarks, and while we do see an increase in instructions executed the real-world impact on walltime is in the noise. turning the string_view refs into plain string_views brings measurements even closer together.
performance-wise this looks pretty good. :)
From NixUX: Thanks @pennae for the perf review. This still requires a way to control and limit the severity of the trace.
Suggestions from NixUX:
* '--show-trace`can become overwhelming, even more than today. Perhaps respect`-v` verbosity flags?
I am not sure this is true. Do we have examples ? The full trace is even bigger, but now it should be readable, with errors pointing at precise locations. See https://gist.github.com/layus/0c4203d41a67326367caf2086f20a2ab#file-basic-diff for an example where the added information allows to understand the error location.
https://gist.github.com/layus/0c4203d41a67326367caf2086f20a2ab#file-poetry-diff shows an example where the trace is made way bigger, and potentially unnecessarily verbose.
I still do not see what this PR should do about it. We had a situation where plain misleading messages were printed. Not it gets easier for newcomers to understand the errors and the required types they need in simple cases. Complex cases become more verbose, but also more easy to understand and to follow, as the increment between the trace steps is smaller. Less wild jumps in the code between two seemingly unrelated parts.
And by default you get the first parts of the trace without needing --show-trace. Meaning that if you really, really want to see the full trace, you should use more or less (pun intended).
I would argue that we should prefer correctness and precision over terseness. This PR cannot reasonably bundle a solution to the very complex issue of printing a meaningful trace in a lazy language.
* New messages can be a less severe level in order to limit the amount displayed by default.
I would argue that the old messages should be less chatty ;-). Just kidding, but as above, the code could benefit from a redesign of the traces to show another projection of the full trace. Maybe expressions trace until the first function scope, and then only the function calls stack ? That is way out of context for this PR I think.
* consider a new PR or doing it in the current PR
:+1: for new PR.
* It is hard to keep error message testing up to date.
It depends on what you want to test. This PR made it sure that more error cases are documented via a mandatory text explanation. Maybe that is the "testing" we need. Any other testing is going to be brittle with respect to format changes.
* performance review @pennae?
Thanks @pennae !
I will merge Eelco's comments, and rebase onto master. Wait for it :-).
That is it for today. I think it should work now @bburdette :tada: :yum:
See https://gist.github.com/layus/e10b4fab9938e9a1777530fe296f0623 for an example on how it behaves with a nixos module error.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/tweag-nix-dev-update-35/21701/1
Triaged in Nix team meeting on 2022-11-25:
- @thufschmitt: decided some time ago that we want to have it. @layus fixed merge conflicts, but a trivial one is still left
- @edolstra: would this cause a major merge conflict with lazy trees?
- @thufschmitt: have some some thoughts on that, but should discuss those in a different context
-> assigned to @edolstra
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2022-11-25-nix-team-meeting-minutes-11/23601/1
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2022-12-09-nix-team-meeting-minutes-15/23951/1
@layus Merged, thanks!
Unfortunately this PR has caused a regression in 2.13.0.
Symptoms include infinite recursions and some sort of regression in the module system. I've tried to find the root cause by
- bisecting, but the git history of this change isn't conducive to it
- reading through the change for two potential causes
- accidentally forcing more evaluation than necessary
- memory corruption
I haven't managed to find it, so for the time being, I've reverted this change. Any help would be much appreciated, as we'd like to have this change back, both because it is a very welcome change and because other PRs would not to have conflicts resolved if we don't manage to merge this soon.
Do we have any test cases for these issues? I think a PR with the "revert of the revert" + some failing tests would be a good starting point, and then hopefully @layus or another person can pick up the baton from there.
Hi, having a look right now. There are plenty of potential fixes. Finding failing tests would be tremendously useful indeed. Any help welcome. For now I am focusing on reproducing the nixpkgs-lib-tests failures.
@layus so far I have not seen a behavioral regression, only a change (and improvement in the error message) in the error message output. I added a reproducer to more quickly try out Nix versions: https://github.com/NixOS/nix/pull/7621#issuecomment-1387718654
So first there is indeed a change where I introduced "frame" traces (pardon the naming here). Think of them as more important trace points, like function calls and builtins.derivation calls. This explains why the addErrorTrace thing is invisible: it happens after a function call in the trace. I have confirmed that removing this notion of frame traces fixes the nixpkgs tests errors. There are two solutions
- disable frame traces
- make addErrorTrace a frame trace.
I think 2 is better because we keep the terseness of the output, and it even makes sense, as a message that a human took time to write for debugging purposes deserves to be printed in the end :-). Working on it.