pyret-lang
pyret-lang copied to clipboard
Internal error when misusing statistics.modes
I had a list of strings, and tried to use modes on it (before realizing that modes only works on List<Number>). I got
Internal errors prevented this error message from being shown. Please report this as a bug.
Code to reproduce:
import statistics as S
C = [list: "a", "b", "a", "c"]
S.modes(C)
That's annoying -- this is an unfortunate interaction between being annotated with List<Number> (which modes is) and the fact that our dynamic annotation checkers don't check element types (because it's O(n) expensive), and that builtins.raw-array-sort-nums assumes its arguments really are numbers, which is what causes the error.
@jpolitz this paper-cut issue has been on my todo list for a while. I'm not sure what the best fix is. One approach is to change the implementation of builtins.raw-array-sort-nums to check that its argument is a RawArrayOfNumbers (for some new primitive annotation that confirms it's been given an array and then examines all the items in that array). Another is to change the statistics library to do the checking. On the one hand: we ought to not have a function that can crash like this, which implies that raw-array-sort-nums needs to sanitize its input. On the other: we ought to provide a stack trace in the error message that blames the argument given to S.modes directly, rather than be hidden deeper in some implementation detail. On the third hand: we don't want to change how polymorphic annotations are compiled to generated code, because we don't want to do the greater-than-O(1) work in general. What's your preference here?
Compiling parametric annotations differently is a non-starter for a lot of reasons.
I think having raw-array-sort-nums catch this is the right thing to do first, with an up-front isNumber check across the elements.
Then if the error from modes is bad we can add more checking in statistics.
On Sun, Apr 25, 2021 at 8:53 AM, Ben Lerner < @.*** > wrote:
@ jpolitz ( https://github.com/jpolitz ) this paper-cut issue has been on my todo list for a while. I'm not sure what the best fix is. One approach is to change the implementation of builtins.raw-array-sort-nums to check that its argument is a RawArrayOfNumbers (for some new primitive annotation that confirms it's been given an array and then examines all the items in that array). Another is to change the statistics library to do the checking. On the one hand: we ought to not have a function that can crash like this, which implies that raw-array-sort-nums needs to sanitize its input. On the other: we ought to provide a stack trace in the error message that blames the argument given to S.modes directly, rather than be hidden deeper in some implementation detail. On the third hand: we don't want to change how polymorphic annotations are compiled to generated code, because we don't want to do the greater-than-O(1) work in general. What's your preference here?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub ( https://github.com/brownplt/pyret-lang/issues/1538#issuecomment-826346379 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/AAA5IU46C4V2UMXAZBEN2HDTKQ3IJANCNFSM4S3VMHXQ ).
We have some other places in libraries where we have crummy output due to lack of runtime type-testing of the contents of lists, e.g. color-list-to-bitmap([list: 1, 2, 3, 4], 2, 2) produces "Evaluating a field lookup in 1", rather than "You didn't give me a list of colors!" I suspect there are other uses of List<Number> in other libraries, though I can't think of them right this second.
I do wonder if e.g. we should write S.modes as
all-nums = L.all(is-number, _)
fun modes(ls :: List<Number>%(all-nums)): ... end
This would give us the error message
as opposed to (for some made-up body of modes that used +)
The first message is definitely better than the second, though it's still not ideal (since it reveals the refinement syntax itself, even when List<Number> ought to (appear to) be enough).
Mmph. For the current state of things, either showing the refinement or defining “type NumberList = List<Number>%(all-nums)" is probably best here. Frustrating but probably best, and lets libraries choose how much to enforce.
On Thu, Apr 29, 2021 at 2:25 PM, Ben Lerner < @.*** > wrote:
We have some other places in libraries where we have crummy output due to lack of runtime type-testing of the contents of lists, e.g. color-list-to-bitmap([list: 1, 2, 3, 4], 2, 2) produces "Evaluating a field lookup in errored. It was given a non-object value: 1 ", rather than "You didn't give me a list of colors!" I suspect there are other uses of List<Number> in other libraries, though I can't think of them right this second.
I do wonder if e.g. we should write S.modes as
all-nums = L.all(is-number, _) fun modes(ls :: List<Number>%(all-nums)): ... end
This would give us the error message image ( https://user-images.githubusercontent.com/918464/116620331-ac954780-a90f-11eb-842d-482b1624eac7.png ) as opposed to (for some made-up body of modes that used + ) image ( https://user-images.githubusercontent.com/918464/116620355-b8810980-a90f-11eb-8523-b682c95468e0.png ) The first message is definitely better than the second, though it's still not ideal (since it reveals the refinement syntax itself, even when List<Number> ought to (appear to) be enough).
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub ( https://github.com/brownplt/pyret-lang/issues/1538#issuecomment-829606739 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/AAA5IU4TVZBU7RONLTTGUDDTLHFFXANCNFSM4S3VMHXQ ).
Noting that according to our resident stats professor, modes actually should work with categorical data. It's strictly returning the frequency of values, so orderability is not a required property of list elements.
This is easy to do, IF you're willing to limit it to lists of numbers or lists of strings. The only place where we care about number-ness is in the grouping function, so that we can aggregate values quickly. If values are strings then we can use string-dicts instead for the same effort. If you want other kinds of data to have modes, then we have potential problems, and we have to design that very carefully -- at best, it'll be a much slower algorithm.
For argument's sake, I've drafted that implementation in a PR, so we can look at it.
(Also note that I did not generalize mode-largest or mode-smallest to work on strings; they're still numbers only.)
@blerner strings would get us 95% of the way there. And I promise not to even ask about the last 5% until 2025 at the earliest