machine-learning
Research methods to normalize and smooth data
We need to research preprocessing methodologies to normalize and smooth data. Specifically, we need to determine whether this preprocessing can be automated. We also need to determine whether multiple approaches can be applied simultaneously, before too much data is thrown away.
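As a starting point for this research, here is a minimal sketch of applying more than one approach side by side without discarding anything. It assumes z-score normalization and a centered moving average as the two candidate methods. Neither has been chosen for this project, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_normalize(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is nothing to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def moving_average(values, window=3):
    """Smooth with a centered moving average; edges use a shorter window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def preprocess(values, window=3):
    """Run every candidate method on the same input and keep the raw data.

    Hypothetical helper: each method sees the untouched input, so the
    results can be compared before committing to any one approach.
    """
    return {
        "raw": list(values),
        "normalized": zscore_normalize(values),
        "smoothed": moving_average(values, window),
    }
```

Because each method reads the original values rather than the output of another, no datapoint is lost at this stage, and further methods can be added as extra keys.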
Ultimately, we need to determine which datapoints are outliers or otherwise bad. Then, instead of removing those datapoints from the SQL database, we would flag them. This means adding an SQL column that marks a specific datapoint as bad. Users could then decide whether to include bad datapoints when generating the corresponding model.
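The flag-instead-of-delete idea could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the project's SQL database. The table name `datapoint`, the column name `is_bad`, and the helper functions are all hypothetical. The outlier rule shown (modified z-score based on the median absolute deviation, with the commonly cited 3.5 cutoff) is only one possible choice, not a decision made in this issue.

```python
import sqlite3
from statistics import median


def flag_outliers(conn, threshold=3.5):
    """Mark rows whose modified z-score exceeds threshold; no rows are deleted."""
    rows = conn.execute("SELECT id, value FROM datapoint").fetchall()
    values = [v for _, v in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: this rule cannot rank outliers.
        return 0
    bad_ids = [
        (row_id,)
        for row_id, v in rows
        if abs(0.6745 * (v - med) / mad) > threshold
    ]
    conn.executemany("UPDATE datapoint SET is_bad = 1 WHERE id = ?", bad_ids)
    conn.commit()
    return len(bad_ids)


def fetch_values(conn, include_bad=False):
    """Return values for model generation, optionally including flagged rows."""
    query = "SELECT value FROM datapoint"
    if not include_bad:
        query += " WHERE is_bad = 0"
    return [v for (v,) in conn.execute(query + " ORDER BY id")]


# Assumed schema: the extra is_bad column defaults to 0 (not flagged).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datapoint ("
    "id INTEGER PRIMARY KEY, "
    "value REAL NOT NULL, "
    "is_bad INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO datapoint (value) VALUES (?)",
    [(v,) for v in [10.0, 11.0, 9.5, 10.5, 250.0]],
)
flagged = flag_outliers(conn)
```

Calling `fetch_values(conn)` skips flagged rows, while `fetch_values(conn, include_bad=True)` returns everything, which leaves the choice to the user at model-generation time.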
The following is a result from a quick Google search that suggests against using PCA for SVMs:
- https://www.quora.com/Is-it-worth-trying-PCA-on-your-data-before-feeding-to-SVM
We are removing this issue from milestone 0.4 for similar reasons as #2297.