code.pyret.org

modes should work for categorical data

Open schanzer opened this issue 1 year ago • 6 comments

In the Statistics package, modes throws an internal error when used with non-numeric data. At the very least, this should produce a better error! But more importantly, modes are not restricted to numbers. (This has some implications for our treatment of the topic in Bootstrap:DS -- right now we give a falsely narrow definition simply because Pyret doesn't support the full definition.)

Feb 15 '24 18:02 schanzer

Started looking at this one, some notes so I don't forget when I come back to this:

  • The resolution of this will need to be over in pyret-lang, which defines the statistics package.

  • The interesting bit here is that the definitions are currently typed for numbers and the implementations depend on that for efficiency, so we'll need to:

    • Convert all these methods to be generic, or just add a second set of parallel methods for strings. (Do we want to allow arbitrary types? I don't think Pyret has a notion of generic constraints, and we'd like to constrain this to things where equality is reasonable. I also don't think we want arbitrary mixed types.)
    • Convert the implementation of group-and-count, which currently depends on the runtime helper raw_array_sort_nums, to an alternative that will work for other types. (We probably still want to use the fast version for numbers, though?)
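
A rough sketch of the generic approach from the bullets above, in Python purely for illustration (the real change would be written in Pyret). The names `group_and_count` and `modes` here are hypothetical stand-ins, not the pyret-lang implementations. The sketch assumes values only need equality and hashing rather than numeric sorting, rejects mixed types by exact type, and assumes that ties for the highest count are all reported in first-seen order; the behavior Pyret actually wants for those cases may differ.

```python
from collections import Counter


def group_and_count(values):
    """Count occurrences using equality/hashing instead of a numeric sort."""
    return Counter(values)


def modes(values):
    """Return every value tied for the highest count, in first-seen order.

    Assumption: an empty input has no modes, and mixed types are rejected
    (exact type match, so [1, 2.0] is also rejected in this sketch).
    """
    if not values:
        return []
    kinds = {type(v) for v in values}
    if len(kinds) > 1:
        raise TypeError("modes expects values of a single type")
    counts = group_and_count(values)
    highest = max(counts.values())
    # Counter keeps insertion order, so ties come out in first-seen order.
    return [value for value, count in counts.items() if count == highest]
```

For example, `modes(["cat", "dog", "cat", "bird", "dog"])` returns `["cat", "dog"]`. A numeric fast path could be kept alongside this by dispatching on the element type before counting.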

Mar 12 '24 23:03 asolove

Related discussion over in pyret-lang, mostly around actually enforcing the type constraint and showing a nicer error, rather than extending these to other types: https://github.com/brownplt/pyret-lang/issues/1538

Mar 12 '24 23:03 asolove

@asolove oh wow - really interesting to see that thread. I didn't realize this came up 2 years ago! In that case, maybe the solution is just a function written in Pyret that lives in our Data Science library. Would you be willing to write one?

Mar 13 '24 00:03 schanzer

Yeah, we could definitely do that. Can you point me to the Data Science library?

Mar 13 '24 00:03 asolove

Here's the link - I'm sure I'm not doing the most elegant stuff, so any advice you have on coding quality is most welcome!

Mar 13 '24 00:03 schanzer

Gonna close this out, as the resolution won't be in the CPO codebase. But it's still on my backlog, so I'll write some suggested changes to that file and share them with you.

Mar 14 '24 01:03 asolove

Closing this as a dupe of https://github.com/brownplt/pyret-lang/issues/1538, since the discussion there is further along.

May 02 '24 13:05 blerner