style-analyzer
Improve feature selection process
Context: https://github.com/src-d/style-analyzer/issues/595#issuecomment-466353578
Things we want to try:
- [ ] preselect features by hand
- [ ] add feature selection to GridSearch
- [ ] feature agglomeration
- [ ] Give https://www.featuretools.com/ a try
- [ ] Train feature selection only once per repo (pin selection for the next runs)
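As a rough illustration of the GridSearch and feature agglomeration items, here is a minimal sketch assuming a scikit-learn pipeline. The synthetic data, the step names (`reduce`, `model`), the random forest, and the grid values are all placeholders chosen for the example, not the project's actual setup.

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real feature matrix (assumption for the example).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=8, random_state=0
)

# The "reduce" step is a placeholder that the grid below swaps out.
pipeline = Pipeline([
    ("reduce", "passthrough"),
    ("model", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# Each dict is searched separately, so univariate selection and feature
# agglomeration compete under the same cross-validation.
param_grid = [
    {"reduce": [SelectKBest(f_classif)], "reduce__k": [5, 10, 20]},
    {"reduce": [FeatureAgglomeration()], "reduce__n_clusters": [5, 10, 20]},
]

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the reduction step inside the pipeline means it is refit on each training fold, so the cross-validation score is not inflated by features selected using the held-out data.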
For the last point, I was thinking more of selecting over a set of training repos (for example, taking the 500 features selected most often across all the per-repo selections), but it might indeed also be good to try the per-repo version.
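The cross-repo variant could be sketched roughly as follows. The function name `pin_top_features`, the repo names, and the feature names are hypothetical; the only assumption is that each training repo yields a list of selected feature names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def pin_top_features(selections: Dict[str, Iterable[str]], top_n: int = 500) -> List[str]:
    """Return the top_n features selected by the largest number of repos."""
    counts: Counter = Counter()
    for features in selections.values():
        # Count each feature at most once per repo.
        counts.update(set(features))
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical per-repo selections.
selections = {
    "repo_a": ["feat_1", "feat_2", "feat_3"],
    "repo_b": ["feat_2", "feat_3", "feat_4"],
    "repo_c": ["feat_3", "feat_5"],
}

pinned = pin_top_features(selections, top_n=2)
print(pinned)  # ['feat_3', 'feat_2']
```

The pinned list can then be stored and reused for later runs instead of rerunning the selection each time.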