Randy Olson
From a [Reddit comment](https://www.reddit.com/r/Python/comments/3hu8fj/im_creating_an_example_python_machine_learning/cuaqt9g):

> Advice: I think you are missing a few big things like preprocessing/scaling and pipelines.
>
> Before using the learners, inputs should be scaled so...
Add a section near the end trying to interpret the model:

- What features are being used to make the classification?
- Why are those features important?
- What does...
As discussed [here](https://www.reddit.com/r/dataisbeautiful/comments/8zl2ex/visualizing_street_orientations_anywhere_on_an/e2jznc3/?context=3), the rainbow colormap is not ideal for the data visualization in this project. It is likely better to use a uniform color for the radial histogram.
Currently, the map always defaults to Manhattan. It would be more interesting if the map randomly picked a city to start at (perhaps starting with a [list of the most...
We can support machine learning with text data in TPOT by adding the [CountVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html) and [TfidfVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer) to a separate built-in configuration dictionary. I don't think we would need to change...
We currently import several NumPy functions directly, e.g., [here](https://landscape.io/github/EpistasisLab/scikit-rebate/17/modules/skrebate/multisurf.py#L26). Normally this isn't an issue, but importing `min`, `max`, `mean`, etc. this way shadows the standard Python functions of the same name. We should...
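To illustrate why the shadowing matters, here is a minimal sketch, assuming NumPy is installed. The function names (`smallest_builtin`, `smallest_shadowed`, `smallest_numpy`) are hypothetical and do not come from scikit-rebate; they only contrast the built-in `min` with a directly imported NumPy `min`.

```python
import builtins

import numpy as np


def smallest_builtin(a, b):
    # The built-in min compares its positional arguments.
    return builtins.min(a, b)


def smallest_shadowed(a, b):
    # A direct import rebinds the name `min` in this scope.
    from numpy import min
    # NumPy reads the second positional argument as `axis`, not as a value,
    # so this call fails for two scalars instead of comparing them.
    return min(a, b)


def smallest_numpy(a, b):
    # Namespaced access keeps it obvious which implementation is called.
    return np.min([a, b])
```

Importing the package as `np` and calling `np.min`, `np.max`, and `np.mean` is one way to avoid the ambiguity without changing behavior.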
There are several [minor code quality issues](https://landscape.io/github/EpistasisLab/scikit-rebate/20/messages) in the latest version of the code. For example, several functions are missing docstrings and there are some whitespace issues throughout the code....
For example, the [getDistance](https://landscape.io/github/EpistasisLab/scikit-rebate/17/modules/skrebate/relieff.py#L376) function refers to an `inst` variable that is defined outside of the scope of the function. We should refactor these functions to pass the `inst` variable...
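The kind of refactor described here might look like the sketch below. It is not the actual scikit-rebate code: the function names, the `distances` argument, and the data layout are assumptions made purely for illustration.

```python
# Before: the function silently depends on a name defined outside its scope.
inst = 0  # hypothetical module-level index of the current instance


def distance_implicit(distances, i):
    # Reads `inst` from the enclosing module; callers cannot see this dependency.
    return distances[inst][i]


# After: the dependency is an explicit parameter.
def distance_explicit(distances, inst, i):
    # Everything the function needs is passed in, so it can be tested in isolation.
    return distances[inst][i]


# Hypothetical pairwise distance table for three instances.
table = [
    [0.0, 1.5, 2.0],
    [1.5, 0.0, 0.5],
    [2.0, 0.5, 0.0],
]
```

With the explicit version, `distance_explicit(table, 1, 2)` does not depend on whatever value `inst` happens to hold at call time.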
This may not be possible without a significant refactor of the existing code base. As-is, technically the Relief algorithms only store internal pointers to the X and y data arrays.
It looks like Python's `Counter` class from the `collections` module (http://docs.python.org/2/library/collections.html#counter-objects) would simplify some of the code (https://github.com/rhiever/reddit-analysis/blob/master/redditanalysis/__init__.py#L167) by doing the word counting for us. Low priority, but useful to reduce code size.
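A minimal sketch of word counting with `collections.Counter` follows. The `count_words` helper and its tokenizing regular expression are assumptions for illustration and are not taken from the reddit-analysis code.

```python
import re
from collections import Counter


def count_words(text):
    # Lowercase the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each word, replacing a manual dictionary update loop.
    return Counter(words)


counts = count_words("The cat and the hat")
```

`counts.most_common(n)` then returns the `n` most frequent words with their counts, and two `Counter` objects can be combined with `+` or `update()` when tallying across several texts.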