FR: getParamsFromMethods - Could this be annotations?
This seems perfect for annotations:
```java
@TunableParameter(
    name = "RBFKernel_Sigma",
    minValue = 0.001,
    maxValue = 2_000,
    startValue = 1832,
    tunePriority = TunableParameterPriority.HIGH
)
```
I'm not sure annotations make sense, as sometimes the min/max values will be dependent upon the data. The RBF Kernel is actually a good example of that if you look at the code (there is a guessSigma method that returns a distribution to search over for the value of Sigma).
I do like the idea of a tuning priority though. I think it will take some thought on how that should be integrated. Maybe just an extra parameter when auto-populating tunable values?
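To make the data-dependence point concrete: the guess is a method you call with the dataset in hand, roughly like this (a sketch against JSAT's API as I recall it; the exact `guessSigma` signature is an assumption):

```java
import jsat.DataSet;
import jsat.distributions.Distribution;
import jsat.distributions.kernels.RBFKernel;

public class GuessSigmaSketch
{
    // The search space for sigma comes from the data (e.g. observed pairwise
    // distances), so it cannot be a compile-time constant in an annotation.
    public static Distribution sigmaSearchSpace(DataSet data)
    {
        return RBFKernel.guessSigma(data); // static method, signature assumed from memory
    }
}
```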
I was over-eager with the min/max, but doesn't the RandomSearch need some sort of constraints?
My high-level FR 0: the code in `getParamsFromMethods(final Object obj, String prefix)` uses string parsing of the method names, which seems more fragile than expressly tagging tunable parameters with an annotation, and makes it harder to search the code for tunable algorithms. This is after I got all excited about auto-tune, and then ran into "This model doesn't seem to have any easy to tune parameters" over and over, and then tried to see which were tunable.
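For readers who haven't traced it: the name-based discovery I'm objecting to works roughly like this (a simplified sketch of the general technique, not JSAT's actual getParamsFromMethods code):

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class NameScanSketch
{
    // Discover "parameters" by pairing each setFoo(x) with a matching getFoo()
    public static List<String> discoverTunableNames(Object obj)
    {
        List<String> names = new ArrayList<>();
        for (Method m : obj.getClass().getMethods())
        {
            String n = m.getName();
            if (!n.startsWith("set") || m.getParameterCount() != 1)
                continue;
            try
            {
                obj.getClass().getMethod("get" + n.substring(3)); // matching getter?
                names.add(n.substring(3));
            }
            catch (NoSuchMethodException ignored)
            {
                // no getter, so not treated as a tunable parameter
            }
        }
        return names;
    }
}
```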
Add-on 1: If a class is declaring parameters as tunable, the docs often say "you should only bother adjusting A and B; don't worry about C unless you have a very odd case", which is the motivator for the priority ranking in the annotation. If it already has the params declared w/ annotations, it feels like changing a (default MEDIUM) priority to LOW or HIGH would be easy.
Add-on 2: Sane boundaries also communicate knowledge AND could enhance the effectiveness of the RandomSearch. Should the abc value be from 0 to 1? 1 to 1000? 0 to 1 but really almost always 0 to 0.001? heckifIknow.
You have to be doing something similar already; I was tracing getParam-getGuess-guessMethod-invoke but got a bit lost. Maybe one really can't guess without looking at the data?
But you do - `baseLearner = new RandomDecisionTree(1, Integer.MAX_VALUE, 3, TreePruner.PruningMethod.NONE, 1e-15);`
seems like a decent guess, and that "testProportion=1e-15" came from some valuable knowledge in your head.
> but doesn't the RandomSearch need some sort of constraints?
RandomSearch needs a distribution, which may or may not just be uniform. The current framework always returns a distribution object by default. GridSearch then uses quantiles from that distribution.
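In code, the idea is that one distribution object serves both search styles. A sketch using JSAT's Distribution class (the `LogUniform` constructor and `invCdf` are my best recollection of the API, so treat the exact names as assumptions):

```java
import java.util.Random;
import jsat.distributions.Distribution;
import jsat.distributions.LogUniform;

public class SearchSpaceSketch
{
    public static void main(String[] args)
    {
        // "0 to 1 but really almost always 0 to 0.001" is exactly what a
        // log-scale distribution encodes better than min/max bounds ever could
        Distribution sigma = new LogUniform(1e-3, 2e3);

        // GridSearch-style: evenly spaced quantiles of the distribution
        for (int i = 1; i <= 5; i++)
            System.out.println("grid point " + i + ": " + sigma.invCdf(i / 6.0));

        // RandomSearch-style: draw from the same distribution
        Random rand = new Random(42);
        System.out.println("random draw: " + sigma.invCdf(rand.nextDouble()));
    }
}
```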
> This is after I got all excited about auto-tune, and then ran into "This model doesn't seem to have any easy to tune parameters" over and over, and then tried to see which were tunable.
Any thoughts on how to make it easier to search? Some of it is compositional, though: any algorithm that takes a KernelTrick will have different parameters depending on the kernel given.
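A concrete illustration of the compositional part, sketched against JSAT's Parameterized API as I understand it (the specific constructors and `getName()` are assumptions from memory):

```java
import jsat.classifiers.svm.PlattSMO;
import jsat.distributions.kernels.PolynomialKernel;
import jsat.distributions.kernels.RBFKernel;
import jsat.parameters.Parameter;

public class CompositionalSketch
{
    public static void main(String[] args)
    {
        // Same SVM class, different kernels -> different tunable parameters,
        // so the full list cannot be known from the class alone
        for (Parameter p : new PlattSMO(new RBFKernel(0.5)).getParameters())
            System.out.println("RBF SVM param:  " + p.getName());
        for (Parameter p : new PlattSMO(new PolynomialKernel(2)).getParameters())
            System.out.println("Poly SVM param: " + p.getName());
    }
}
```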
> Sane boundaries also communicate knowledge AND could enhance the effectiveness of the RandomSearch. Should the abc value be from 0 to 1? 1 to 1000? 0 to 1 but really almost always 0 to 0.001? heckifIknow.
See above, RandomSearch doesn't need boundaries - it needs a distribution. And the values can change depending on the data. I try to put what the value range is in the documentation. I think if you are going to set them yourself, you should be reading up on the algorithm. Otherwise just trust the auto-fill defaults.
> You have to be doing something similar already; I was tracing getParam-getGuess-guessMethod-invoke but got a bit lost. Maybe one really can't guess without looking at the data?
Depends on the algorithm :-/
> seems like a decent guess, and that "testProportion=1e-15" came from some valuable knowledge in your head.
That was actually an implementation detail because earlier versions didn't allow 0. It has nothing to do with what parameters you should try. In fact, for RF you should never make that value larger.
Understood. Hummm... maybe it is as easy as two functions, or examples of how to emulate two functions, the facetiously named:
- `willThisAlgoEverBeOptimizableRegardlessOfCurrentData()`
- `isThisAlgoOptimizableGivenTheDataSetAndKernelAndOtherStuffItCurrentlyHas()`
Where #1 makes it easy to search the code, and #2 is the current "you don't know until you get there" reality.
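Sketching #2 with what already exists: since Parameterized exposes getParameters(), the runtime check could be as simple as this (assuming the returned list reflects the current kernel/configuration):

```java
import jsat.parameters.Parameterized;

public class RuntimeCheckSketch
{
    // #2: only answerable once the object is fully configured, because the
    // parameter list depends on kernels and other pieces set at runtime
    public static boolean isThisAlgoOptimizableGivenTheDataSetAndKernelAndOtherStuffItCurrentlyHas(Parameterized algo)
    {
        return !algo.getParameters().isEmpty();
    }
}
```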
Hmm. Are you more interested in finding the tunable parameters themselves, or just the algorithms that have some?
Could be just an annotation with no code meaning. "@Tunable". Just lets you know that the object has parameters to tune.
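A minimal sketch of that marker annotation and the #1-style check it would enable (all hypothetical, nothing like this in JSAT today):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marker only: no elements, no behavior, just searchable and reflectable
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Tunable { }

class TunableCheck
{
    // #1 from above: answerable from the class alone, no data needed
    static boolean willThisAlgoEverBeOptimizableRegardlessOfCurrentData(Class<?> algo)
    {
        return algo.isAnnotationPresent(Tunable.class);
    }
}
```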
That would be excellent. It is what I kinda assumed "implements Parameterized" indicated, but now I'm thinking I was reading too much into it.
Do you have a use case where you wanted/needed these annotations at runtime, or purely to make it easier to search through the docs?
I wanted them at runtime to see if it was worth passing the algo through a RandomSearch to try to improve it.
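i.e. something like this, hedged against JSAT's RandomSearch API from memory (the constructor and the `autoAddParameters` signature are assumptions):

```java
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.Classifier;
import jsat.parameters.RandomSearch;

public class WorthTuningSketch
{
    // Decide at runtime whether a search is even worth launching;
    // autoAddParameters is the call behind the "doesn't seem to have any
    // easy to tune parameters" situation mentioned above
    public static boolean worthTuning(Classifier base, ClassificationDataSet data)
    {
        RandomSearch search = new RandomSearch(base, 5); // 5 CV folds -- assumed constructor
        int found = search.autoAddParameters(data);      // assumed to return how many tunables it found
        return found > 0;
    }
}
```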