Preston Parry
Slide 18 says vw, libffm, and nn are order dependent, so shuffling the data will give you better results: http://www.slideshare.net/odsc/owen-zhangopen-sourcetoolsanddscompetitions1
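A minimal sketch of the shuffling idea (the `X`/`y` arrays here are made-up placeholder data, not from the project): permute rows of the features and labels with a single shared index so order-dependent learners don't pick up artifacts from the original row ordering.

```python
import numpy as np

# Placeholder data standing in for the real feature matrix and labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(6, 3))
y = np.arange(6)

# One permutation applied to both X and y keeps rows and labels aligned
# while destroying any meaningful ordering in the training set.
perm = rng.permutation(len(y))
X_shuffled, y_shuffled = X[perm], y[perm]
```

Shuffling once up front is cheap insurance; for vw specifically, the shuffled file would then be written back out before training.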
Have it make a rough guess, shooting for 8 hours of training time, but make that training-time variable super obvious so people can tweak it themselves. Obviously, this...
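A hypothetical sketch of what "super obvious" could look like (the constant and function names here are invented for illustration, not from the codebase): surface the budget as a loud top-level constant so nobody has to dig for it.

```python
# Invented config sketch: the training-time budget lives at the very top
# of the file, with a comment inviting users to change it.
TRAINING_TIME_HOURS = 8  # rough default; tweak this to train longer or shorter

def training_budget_seconds(hours=TRAINING_TIME_HOURS):
    # Convert the human-friendly hours knob into the seconds the
    # training loop would actually check against.
    return hours * 60 * 60

print(training_budget_seconds())  # 28800
```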
Without sparse matrices, KNN trained super quickly. With sparse matrices, each Python instance training a KNN (presumably each of the sub-processes spun out by GridSearchCV) takes up to 12...
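A small repro sketch of the setup described (the data here is random placeholder data, and the memory numbers are from the report above, not reproduced here): GridSearchCV with `n_jobs > 1` runs candidate fits in separate worker processes, each holding its own copy of the input, which is where per-process memory multiplies. Keeping `n_jobs=1` while debugging keeps everything in one process.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder sparse data; KNN falls back to brute-force search on CSR input.
X = sparse_random(200, 50, density=0.1, format="csr", random_state=0)
y = np.random.RandomState(0).randint(0, 2, size=200)

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5]},
    n_jobs=1,  # single process while investigating memory use
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```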
I'd love to educate people about what's actually going on. Make this available Easter-egg style, only if someone passes in "whisperGeekyNothings".
Let users quickly figure out whether their new feature is highly predictive or not. It would just give directional guidance: it would go off and quickly train a lasso (probably...
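A hedged sketch of that "directional guidance" idea, under the assumption that "train a lasso" means fitting sklearn's `Lasso` and inspecting the coefficient on the new column (the function name `check_new_feature` and the toy data are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def check_new_feature(X, y, new_col_index):
    # Standardize so coefficient magnitudes are comparable across features,
    # then fit a quick L1-penalized linear model.
    X_scaled = StandardScaler().fit_transform(X)
    model = Lasso(alpha=0.1).fit(X_scaled, y)
    # A coefficient shrunk to ~0 suggests the feature adds little signal.
    return model.coef_[new_col_index]

# Toy data where only column 0 actually drives y.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

print(check_new_feature(X, y, 0))  # clearly nonzero
print(check_new_feature(X, y, 3))  # near zero
```

This is deliberately rough: a lasso only sees linear signal, so a near-zero coefficient is a hint, not a verdict.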
Right now we have two different paramMakers for random forests, despite the fact that they return the same thing. Have both key names point to the same file, so...
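One way to sketch the de-duplication (all names here are illustrative, not the project's actual registry): define the parameter grid once and alias both keys to the same function, so there is a single source of truth.

```python
# Illustrative registry sketch: one paramMaker function, two keys.
def random_forest_params():
    # Shared hyperparameter grid for both forest variants.
    return {"n_estimators": [50, 100, 500], "max_depth": [None, 5, 10]}

PARAM_MAKERS = {
    "RandomForestClassifier": random_forest_params,
    "RandomForestRegressor": random_forest_params,  # same object, no duplicate file
}

# Both keys resolve to the identical callable.
print(PARAM_MAKERS["RandomForestClassifier"] is PARAM_MAKERS["RandomForestRegressor"])
```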
Stuff like --devKaggle, instructions on how to download and name the right Kaggle files, etc.
As soon as the docs are updated, make a 1.0 release. 2.0 will come once I've got more classifiers trained; this includes training the neural network for a considerably longer...
A duplicate of https://github.com/rsteca/sklearn-deap/issues/27, but hopefully easier to understand with a reproducible code block. To reproduce, simply set `generations_number=1` in the `test.ipynb` notebook. When you do, you'll see the following:...
Presumably all of the companies mentioned here have open data: http://fortune.com/2015/07/30/tech-companies-diveristy/ Thanks for pulling together such an awesome site!