SPORF
Remove redundant split directions before evaluating each of them
The current sampling scheme randomly places -1s and +1s in the projection matrix, so it is possible to get redundant columns. Evaluating the same split direction multiple times is wasted computation that we can avoid.
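To make the proposal concrete, here is a rough sketch of what such a filter could look like. It is not taken from the SPORF code base: the sparse column representation (a dict mapping feature index to weight) and the names canonical_key and drop_redundant are hypothetical. It also assumes that a column and its negation count as the same direction, since they order the samples in mirrored fashion and so give the same candidate partitions.

```python
def canonical_key(column):
    """Hashable key for a sparse column given as {feature_index: weight}.

    Assumption: a direction and its negation are treated as redundant,
    so the sign is normalized to make the first nonzero weight positive.
    """
    items = sorted(column.items())
    if items and items[0][1] < 0:
        items = [(index, -weight) for index, weight in items]
    return tuple(items)


def drop_redundant(columns):
    """Keep the first occurrence of each distinct direction, skipping empty columns."""
    seen = set()
    unique = []
    for column in columns:
        key = canonical_key(column)
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(column)
    return unique


if __name__ == "__main__":
    sampled = [{0: 1}, {0: -1}, {1: 1, 2: -1}, {1: -1, 2: 1}, {}, {0: 1}]
    print(drop_redundant(sampled))
```

With a hash set the check is linear in the total number of nonzero entries per node, so the cost should be small compared with sorting the projected values, though that would need to be measured.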
You are right about this, and I'll take another look at the feasibility of changing it. My concern is that this could be a frequently called, expensive check for something that has a very small probability of occurring and is of little consequence when it does. Before making a final decision on whether to include it, we should 1) look at creating the projection matrix in a way where this can't happen, 2) if we don't find a better way, implement the check and see how it affects training times, and 3) check how often duplications like this actually occur in real-world datasets.
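As a starting point for 1) and 3), something like the sketch below could be used. It is an illustration, not the package's sampler: it assumes each entry of a column is nonzero with probability density and gets a random sign, and the names sample_column and sample_unique_columns are made up. Columns are drawn one at a time and exact repeats (or empty columns) are rejected, and the rejection count gives a rough idea of how often duplicates would occur for a given number of features and density.

```python
import random


def sample_column(n_features, density, rng):
    """One sparse direction as a tuple of (feature_index, +1 or -1) pairs.

    Assumption: each entry is nonzero independently with probability `density`.
    """
    return tuple(
        (index, rng.choice((-1, 1)))
        for index in range(n_features)
        if rng.random() < density
    )


def sample_unique_columns(n_features, n_directions, density, rng, max_tries=10_000):
    """Draw columns until n_directions distinct, non-empty ones are found.

    Returns (columns, rejected), where `rejected` counts empty or repeated draws.
    Fewer than n_directions columns are returned if max_tries runs out.
    """
    seen = set()
    columns = []
    rejected = 0
    for _ in range(max_tries):
        if len(columns) == n_directions:
            break
        column = sample_column(n_features, density, rng)
        if not column or column in seen:
            rejected += 1
            continue
        seen.add(column)
        columns.append(column)
    return columns, rejected


if __name__ == "__main__":
    rng = random.Random(0)
    columns, rejected = sample_unique_columns(10, 5, 0.2, rng)
    print(len(columns), "unique columns,", rejected, "draws rejected")
```

Running this over the feature counts and densities of the benchmark datasets would show whether the rejection rate is high enough to matter. This version only catches exact repeats; sign-flipped copies would need an extra normalization step.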
This has been resolved for the feature importance calculation; however, that does not fix the issue upstream.