Allow rsample to sample with replacement

Open talwrii opened this issue 8 years ago • 5 comments

talwrii avatar Sep 10 '16 05:09 talwrii

Prefer to avoid argparse dep. Possible to rewrite using the same argument parsing style as the other code?

bagrow avatar Sep 12 '16 13:09 bagrow

argparse is part of the standard library, so it isn't a dependency, but I agree there is conceptual overhead for non-Python programmers and non-programmers - is this what you are concerned about?

Umm, so I was planning to add support for arbitrary distributions here. This was mostly me making room / splitting work into pieces.

I want to do things like:

rsample --distribution normal --mean 100 --stddev 10

The use case being, "I have no idea how strange this graph for my data is, I should see what it looks like with some normal data".
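For illustration only, here is a minimal sketch of how an argparse-based interface like the invocation above might look. It is not the code from any branch: the flag names `--distribution`, `--mean` and `--stddev` come from the example, while `--count`, `--seed`, and the function names `build_parser`, `draw` and `main` are assumptions made up for this sketch.

```python
import argparse
import random


def build_parser():
    # --distribution, --mean and --stddev mirror the example invocation;
    # --count and --seed are hypothetical additions for this sketch.
    parser = argparse.ArgumentParser(
        description="Print random draws from a distribution, one per line."
    )
    parser.add_argument("--distribution", choices=["normal"], default="normal")
    parser.add_argument("--mean", type=float, default=0.0)
    parser.add_argument("--stddev", type=float, default=1.0)
    parser.add_argument("--count", type=int, default=10)
    parser.add_argument("--seed", type=int, default=None)
    return parser


def draw(args):
    # A fresh generator per call, so a fixed seed gives repeatable output.
    rng = random.Random(args.seed)
    if args.distribution == "normal":
        return [rng.gauss(args.mean, args.stddev) for _ in range(args.count)]
    raise ValueError("unsupported distribution: " + args.distribution)


def main(argv):
    # argv is the argument list without the program name, e.g. sys.argv[1:].
    args = build_parser().parse_args(argv)
    for value in draw(args):
        print(value)
```

Calling `main(["--distribution", "normal", "--mean", "100", "--stddev", "10"])` would print ten draws centred around 100. Supporting another distribution would mean one more entry in `choices` and one more branch in `draw`, which is roughly where the readability argument for argparse comes from.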

My experience suggests that this will become unreadable without argparse, and the documentation of argparse is valuable. However, we could split these things off into separate binaries like:

rbinom rnorm rpoisson

This has some impacts on documentation / discoverability, but does result in simpler programs that are more readable by non/semi-programmers.

Philosophically muttering

Opinions? I have a general misgiving that one might end up reimplementing R / numpy with pipes instead of broadcasting. There's a question about what this library represents in the shadow of tools like R and numpy. I mostly like the idea because I am loath to leave the shell, and am not terribly keen on all the state that comes along with using ipython notebook / babel.

I've hacked up a tool called RPipe before that works like so

seq 100 | RPipe 'diff(d)' | plot 

There's a tool called pyline that does a similar thing with Python.

talwrii avatar Sep 12 '16 19:09 talwrii

Anyway, here's a branch where rsample selects from a normal distribution. See what you think:

https://github.com/talwrii/datatools/tree/talwrii--normal-data--2016-09-20

  • Does this functionality deserve to exist at all? (I couldn't find any tools to produce it on the command line.)
  • Would you prefer this to exist in a separate file called rnorm?
  • In this context, what's your opinion of an argparse dependency?

talwrii avatar Sep 20 '16 14:09 talwrii

So for a while I had a package of scripts in parallel with datatools called randtools. These were about generating random numbers according to distributions, etc. After a while I found myself only using rsample, so I moved that into datatools and dropped randtools.

What I read from this is that you think randtools would be worthwhile. That's great! It turned out that I didn't need it, but you might, so go and build it (maybe I'll send some PRs!).

There's a question about what this library represents in the shadow of [...]

Yes, I agree. You like datatools for the same reason I do: staying inside the shell. However, R/Python are so good that baking too much into datatools isn't worth it; if what you're doing is complex enough, it's better to do it in that context. This is my overriding motivation for keeping datatools small and focused.

What's your opinion of an argparse dependency?

Not in datatools please.

bagrow avatar Sep 20 '16 23:09 bagrow

Cool cool. My motivation for the pull requests is "here's a library for command-line data analysis, it doesn't have the tools I want, I shall implement them, now I've implemented them I may as well give you a pull request".

Umm, so I'm going to implement a version of rsample, possibly with a different name, that generates data from different distributions. I'm assuming you don't want it in datatools, so will put it in a differently named repo / leave it in my ~/bin. Just say if you actually want it.

Do you want sampling with replacement in rsample? If so I'll strip out the argparse dependency for you.
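As a rough sketch of what sampling with replacement could look like without argparse, assuming a hypothetical interface of `rsample [-r] K` that reads lines from standard input: the `-r` flag, the usage string and the function names below are invented for illustration and are not taken from the actual rsample script.

```python
import random
import sys


def parse_args(argv):
    """Hand-rolled parsing: an optional -r flag plus one integer sample size."""
    replace = False
    positional = []
    for arg in argv:
        if arg == "-r":  # hypothetical flag: sample with replacement
            replace = True
        else:
            positional.append(arg)
    if len(positional) != 1:
        raise SystemExit("usage: rsample [-r] K")
    return int(positional[0]), replace


def sample_lines(lines, k, replace, rng=random):
    if replace:
        # With replacement: k independent draws, so a line may repeat
        # and k may exceed the number of input lines.
        return [rng.choice(lines) for _ in range(k)] if lines else []
    # Without replacement: each line appears at most once, so cap k.
    return rng.sample(lines, min(k, len(lines)))


def main(argv, stream):
    # argv is the argument list without the program name, e.g. sys.argv[1:].
    k, replace = parse_args(argv)
    lines = stream.read().splitlines()
    for line in sample_lines(lines, k, replace):
        print(line)
```

Wired up as a script, this would be called as `main(sys.argv[1:], sys.stdin)`. The sketch holds all input in memory, which keeps it short but would not suit very large streams.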

More generally, I'm probably going to carry on tweaking these tools here and making complementary tools as I go about my day-to-day activities. I don't know how you want to interact with them: your goal of minimality may be at odds with my goal of "create tools for all the things I do".

I could:

  • Carry on feeding you pull requests.
  • Shove stuff in my fork so you can go looting when you feel bored.
  • Try to put new tools in a different repo ("moredatatools"!), to avoid the problem of "buggy, more feature-complete fork." Again you could go looting when bored.

talwrii avatar Sep 21 '16 00:09 talwrii