Ryan Curtin
> By different overload of Evaluate() or Gradient() or EvaluateWithGradient() do you mean the one which works on full dataset. Can you clarify this a bit?

Right, exactly.

> And...
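For reference, the two overloads in question look roughly like this (sketched from mlpack's usual function API requirements; the batch overload is the one SGD-type optimizers call):

```c++
// Full-dataset overload, used by full-batch optimizers like L-BFGS.
double Evaluate(const arma::mat& coordinates);

// Separable overload, used by SGD-type optimizers one batch at a time.
double Evaluate(const arma::mat& coordinates,
                const size_t begin,
                const size_t batchSize);
```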
You're right, I was just making up a code example there. Sorry for the incorrectness.
Also, sorry, I was mixing up linear scan and the full-batch L-BFGS-type optimizers! Hmm. That does make this more difficult. Let me think for a little bit about how...
If you can make those into a separate PR, I'm happy to look through and approve it.
All I can think of is the idea of holding a variable `linearScan` in `LMNNFunction`, and using that in `Shuffle()` to decide whether to actually do anything (and computing impostors...
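To make that concrete, here is a minimal sketch of the idea; the member names (`linearScan`, `ShufflePoints()`, `RecomputeImpostors()`) are hypothetical placeholders, not existing code in `LMNNFunction`:

```c++
// Sketch only: not the actual mlpack LMNNFunction implementation.
class LMNNFunction
{
 public:
  // Optimizers call Shuffle() once at the start of each epoch.
  void Shuffle()
  {
    // If a linear scan was requested, leave the point ordering alone, but
    // still use this once-per-epoch hook to refresh the impostor cache.
    if (!linearScan)
      ShufflePoints();

    RecomputeImpostors();
  }

 private:
  void ShufflePoints() { /* reorder the dataset and labels */ }
  void RecomputeImpostors() { /* rebuild the cached impostors */ }

  bool linearScan = false;
};
```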
I'll send an email about the final evaluations in just a moment. :) I don't understand how your idea would work: the whole issue is that with some optimizers, `Evaluate()` or...
Yes, we would disable linear scan in the optimizers and only set it in `LMNNFunction`. So any optimizer would believe it was not doing a linear scan and call `Shuffle()`...
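The reason this gives once-per-epoch recomputation is the shape of the SGD-type optimizer loop. Very roughly (this is a simplification for illustration, not the actual mlpack optimizer code):

```c++
#include <mlpack/core.hpp>
#include <algorithm>

// Rough shape of an SGD-type optimizer's epoch loop (sketch only).
template<typename FunctionType>
void OptimizeSketch(FunctionType& function,
                    arma::mat& coordinates,
                    const size_t maxEpochs,
                    const size_t batchSize,
                    const double stepSize)
{
  arma::mat gradient;
  for (size_t epoch = 0; epoch < maxEpochs; ++epoch)
  {
    // With the scheme above, this call also refreshes the impostors, so
    // they get recomputed exactly once per epoch.
    function.Shuffle();

    for (size_t begin = 0; begin < function.NumFunctions();
         begin += batchSize)
    {
      // The last batch may be smaller than batchSize.
      const size_t effectiveBatchSize =
          std::min(batchSize, function.NumFunctions() - begin);

      // Batch overload of Gradient(), as used by SGD-type optimizers.
      function.Gradient(coordinates, begin, gradient, effectiveBatchSize);
      coordinates -= stepSize * gradient;
    }
  }
}
```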
Also, I think that if we implement this by only computing impostors once at the beginning of each epoch, it will make #1446 a lot easier too.
Hi Manish:

> For recomputing at the start of every epoch will require us to access begin of each batch inside shuffle().

I don't understand why this would be true....
Right, I am not too surprised; I would expect some amount of variation depending on the dataset. iris is already very low-dimensional, so I would not expect to be able to...