datatools
Allow rsample to sample with replacement
Prefer to avoid the argparse dependency. Is it possible to rewrite this using the same argument parsing style as the other code?
argparse is part of the standard library, so it isn't a dependency, but I agree there is conceptual overhead for non-Python programmers and non-programmers. Is this what you are concerned about?
Umm, so I was planning to add support for arbitrary distributions here. This was mostly me making room / splitting work into pieces.
I want to do things like:
rsample --distribution normal --mean 100 --stddev 10
The use case being, "I have no idea how strange this graph for my data is, I should see what it looks like with some normal data".
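As an illustration of the argparse style under discussion, here is a minimal sketch of how the `rsample --distribution normal --mean 100 --stddev 10` interface might be declared. The flag names come from the example command; the `--count` flag, its default, and the helper names `build_parser` and `generate` are hypothetical and not part of rsample.

```python
import argparse
import random


def build_parser():
    # Hypothetical interface: flags mirror the example command above;
    # --count is an assumed extra flag, not an existing rsample option.
    parser = argparse.ArgumentParser(
        description="Print random samples drawn from a distribution.")
    parser.add_argument("--distribution", choices=["normal"], default="normal",
                        help="distribution to sample from")
    parser.add_argument("--mean", type=float, default=0.0,
                        help="mean of the normal distribution")
    parser.add_argument("--stddev", type=float, default=1.0,
                        help="standard deviation of the normal distribution")
    parser.add_argument("--count", type=int, default=100,
                        help="number of samples to print")
    return parser


def generate(args, rng):
    # Only the normal distribution is sketched here.
    for _ in range(args.count):
        yield rng.gauss(args.mean, args.stddev)


def main(argv=None):
    args = build_parser().parse_args(argv)
    for value in generate(args, random.Random()):
        print(value)


if __name__ == "__main__":
    main()
```

The main payoff of argparse here is that `--help` output is generated from the declarations, which is the documentation benefit mentioned in the thread.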
My experience suggests that this will become unreadable without argparse, and the documentation argparse provides is valuable. However, we could split these things off into separate binaries like:
rbinom
rnorm
rpoisson
This has some impacts on documentation / discoverability, but does result in simpler programs that are more readable by non/semi-programmers.
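For contrast, a separate single-purpose binary can get by with positional arguments and no argparse at all. The sketch below assumes a hypothetical `rnorm MEAN STDDEV [COUNT]` interface with an assumed default of 100 samples; neither the usage string nor the default comes from the thread.

```python
import random
import sys

# Hypothetical usage string; rnorm does not exist in datatools.
USAGE = "usage: rnorm MEAN STDDEV [COUNT]"


def parse_args(argv):
    # Positional arguments only: no argparse machinery needed.
    if len(argv) not in (2, 3):
        raise SystemExit(USAGE)
    mean, stddev = float(argv[0]), float(argv[1])
    count = int(argv[2]) if len(argv) == 3 else 100  # assumed default
    return mean, stddev, count


def rnorm(mean, stddev, count, rng=random):
    # Draw `count` samples from a normal distribution.
    return [rng.gauss(mean, stddev) for _ in range(count)]


if __name__ == "__main__":
    mean, stddev, count = parse_args(sys.argv[1:])
    for value in rnorm(mean, stddev, count):
        print(value)
```

Invoked as `rnorm 100 10 1000`, it prints 1000 values, one per line. The trade-off is the one described above: each program is trivial to read, but there is no generated `--help` and the family of tools is harder to discover.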
Philosophically muttering
Opinions? I have a general misgiving that one might end up reimplementing R / numpy with pipes instead of broadcasting. There's a question about what this library represents in the shadow of tools like R and numpy. I mostly like the idea because I am loath to leave the shell, and am not terribly keen on all the state that comes along with using IPython notebook / babel.
I've previously hacked up a tool called RPipe that works like so:
seq 100 | RPipe 'diff(d)' | plot
There's a similar tool called pyline that does the same kind of thing with Python.
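To make the RPipe example concrete: `diff(d)` in R returns successive differences of the input. A minimal standalone Python filter doing the same job might look like the sketch below. It is not pyline's or RPipe's interface; the script and its helper names are hypothetical.

```python
import sys


def diff(values):
    # Successive differences, like R's diff() with the default lag of 1.
    return [b - a for a, b in zip(values, values[1:])]


def main(lines):
    # Read one number per line from the pipe, skipping blank lines.
    values = [float(line) for line in lines if line.strip()]
    for d in diff(values):
        print(d)


if __name__ == "__main__":
    main(sys.stdin)
```

Saved as, say, `diff.py`, then `seq 100 | python diff.py` prints 99 lines of `1.0`.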
Anyway, here's a branch where rsample selects from a normal distribution. See what you think:
https://github.com/talwrii/datatools/tree/talwrii--normal-data--2016-09-20
- Does this functionality deserve to exist at all? (I couldn't find any tools to produce it on the command line.)
- Would you prefer this to exist in a separate file called rnorm?
- In this context, what's your opinion of an argparse dependency?
So for a while I had a package of scripts in parallel with datatools called randtools. These were about generating random numbers according to distributions, etc. After a while I found myself only using rsample, so I moved that into datatools and dropped randtools.
What this reads to me is that you think randtools would be worthwhile. That's great! It turned out that I didn't need it, but you might, so go and build it (maybe I'll send some PRs!).
There's a question about what this library represents in the shadow of [...]
Yes, I agree. You like datatools for the same reason I do: staying inside the shell. However, R and Python are so good that baking too much into datatools isn't worth it, because if what you're doing is complex enough it's better to do it in that context. This is my overriding motivation for keeping datatools small and focused.
What's your opinion of an argparse dependency?
Not in datatools please.
Cool cool. My motivation for the pull requests is: "here's a library for command line data analysis, it doesn't have the tools I want, I shall implement them, now I've implemented them I may as well give you a pull request."
Umm, so I'm going to implement a version of rsample, possibly with a different name, that generates data from different distributions. I'm assuming you don't want it in datatools, so I will put it in a differently named repo / leave it in my ~/bin. Just say if you actually want it.
Do you want sampling with replacement in rsample? If so I'll strip out the argparse dependency for you.
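A minimal sketch of what an argparse-free rsample with a replacement option could look like follows. It assumes a hypothetical `--replace` flag and a positional COUNT, and for simplicity it reads all input into memory; the real rsample's interface and sampling strategy may differ (for example, it may sample from a stream without buffering).

```python
import random
import sys

# Hypothetical usage; --replace is an assumed flag name, not rsample's.
USAGE = "usage: rsample [--replace] COUNT < input"


def parse_args(argv):
    # Hand-rolled parsing instead of argparse.
    replace = "--replace" in argv
    rest = [arg for arg in argv if arg != "--replace"]
    if len(rest) != 1:
        raise SystemExit(USAGE)
    return int(rest[0]), replace


def sample(lines, count, replace, rng=random):
    if replace:
        # With replacement: a line may be picked more than once,
        # so count may exceed the number of input lines.
        return rng.choices(lines, k=count)
    # Without replacement: each line is picked at most once.
    return rng.sample(lines, min(count, len(lines)))


if __name__ == "__main__":
    count, replace = parse_args(sys.argv[1:])
    for line in sample(sys.stdin.read().splitlines(), count, replace):
        print(line)
```

With replacement, `seq 3 | rsample --replace 10` would print ten lines drawn from 1, 2 and 3, whereas without the flag at most three distinct lines come out.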
More generally, I'm probably going to carry on tweaking these tools and making complementary tools as I go about my day-to-day activities. I don't know how you want to interact with them: your goal of minimality may be at odds with my goal of "create tools for all the things I do".
I could:
- Carry on feeding you pull requests
- Shove stuff in my fork so you can go looting when you feel bored.
- Try to put new tools in a different repo ("moredatatools"!), to avoid the problem of a "buggy, more feature-complete fork." Again you could go looting when bored.