Ryan Curtin
> By different overload of Evaluate() or Gradient() or EvaluateWithGradient() do you mean the one which works on full dataset. Can you clarify this a bit?

Right, exactly.

> And...
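For reference, the two overloads in question look roughly like this (sketched from mlpack's usual function API requirements; the batch overload is the one SGD-type optimizers call):

```c++
// Full-dataset overload, used by full-batch optimizers like L-BFGS.
double Evaluate(const arma::mat& coordinates);

// Separable overload, used by SGD-type optimizers one batch at a time.
double Evaluate(const arma::mat& coordinates,
                const size_t begin,
                const size_t batchSize);
```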
You're right, I was just making up a code example there. Sorry for the incorrectness.
Also, sorry, I was mixing up linear scan and the full-batch L-BFGS-type optimizers! Hmm. That does make this more difficult. Let me think for a little bit about how...
If you can make those into a separate PR, I'm happy to look through and approve it.
All I can think of is the idea of holding a variable `linearScan` in `LMNNFunction`, and using that in `Shuffle()` to decide whether to actually do anything (and computing impostors...
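To make that concrete, here is a minimal sketch of the idea; the member names (`linearScan`, `ShufflePoints()`, `RecomputeImpostors()`) are hypothetical placeholders, not existing code in `LMNNFunction`:

```c++
// Sketch only: not the actual mlpack LMNNFunction implementation.
class LMNNFunction
{
 public:
  // Optimizers call Shuffle() once at the start of each epoch.
  void Shuffle()
  {
    // If a linear scan was requested, leave the point ordering alone, but
    // still use this once-per-epoch hook to refresh the impostor cache.
    if (!linearScan)
      ShufflePoints();

    RecomputeImpostors();
  }

 private:
  void ShufflePoints() { /* reorder the dataset and labels */ }
  void RecomputeImpostors() { /* rebuild the cached impostors */ }

  bool linearScan = false;
};
```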
I'll send an email about the final evaluations in just a moment. :) I don't understand how your idea would work: the whole issue is that with some optimizers, `Evaluate()` or...
Yes, we would disable linear scan in the optimizers and only set it in `LMNNFunction`. So any optimizer would believe it was not doing a linear scan and call `Shuffle()`...
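The reason this gives once-per-epoch recomputation is the shape of the SGD-type optimizer loop. Very roughly (this is a simplification for illustration, not the actual mlpack optimizer code):

```c++
#include <mlpack/core.hpp>
#include <algorithm>

// Rough shape of an SGD-type optimizer's epoch loop (sketch only).
template<typename FunctionType>
void OptimizeSketch(FunctionType& function,
                    arma::mat& coordinates,
                    const size_t maxEpochs,
                    const size_t batchSize,
                    const double stepSize)
{
  arma::mat gradient;
  for (size_t epoch = 0; epoch < maxEpochs; ++epoch)
  {
    // With the scheme above, this call also refreshes the impostors, so
    // they get recomputed exactly once per epoch.
    function.Shuffle();

    for (size_t begin = 0; begin < function.NumFunctions();
         begin += batchSize)
    {
      // The last batch may be smaller than batchSize.
      const size_t effectiveBatchSize =
          std::min(batchSize, function.NumFunctions() - begin);

      // Batch overload of Gradient(), as used by SGD-type optimizers.
      function.Gradient(coordinates, begin, gradient, effectiveBatchSize);
      coordinates -= stepSize * gradient;
    }
  }
}
```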
Also, I think that if we implement this by only computing impostors once at the beginning of each epoch, it will make #1446 a lot easier too.
Hi Manish:

> For recomputing at the start of every epoch will require us to access begin of each batch inside shuffle().

I don't understand why this would be true....
Right, I am not too surprised; I would expect some amount of variation depending on the dataset. iris is already very low-dimensional, so I would not expect to be able to...