Chiyuan Zhang

124 comments by Chiyuan Zhang

@jskDr Thanks! We have a `DecayOnValidation` learning rate policy that allows one to halve the learning rate when performance on the validation set drops.
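
A minimal sketch of wiring this up, assuming `LRPolicy.DecayOnValidation` takes a base learning rate, a validation-statistics key, and a decay ratio — the exact signature and the statistics key below are assumptions from memory, so check the Mocha.jl docs before use:

```julia
using Mocha

# assumed signature: base learning rate, statistics key, decay ratio
lr_policy = LRPolicy.DecayOnValidation(0.01, "test-accuracy", 0.5)

params = SolverParameters(max_iter=10000, lr_policy=lr_policy,
                          mom_policy=MomPolicy.Fixed(0.9))
solver = SGD(params)
# the policy also has to observe a ValidationPerformance coffee break
# so it can see when validation performance drops; see the docs.
```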

@droidicus Maybe try disabling random shuffling in the data layer, if you haven't done so yet?

The randomness comes mainly from these sources:

- random initialization of the weights
- random shuffling of the data
- random masking of the data if you have a dropout layer

Everything else should...
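
For reference, a minimal sketch of pinning these down one by one (the `shuffle` keyword on the data layer is an assumption; check the documentation of the layer you use):

```julia
using Random          # on the older Julia versions Mocha targeted,
Random.seed!(12345)   # this was spelled `srand(12345)`

# - weights: seed the RNG (above) before constructing the net
# - shuffling: build the data layer with shuffling off, e.g.
#   HDF5DataLayer(..., shuffle=false)
# - dropout: remove the DropoutLayer from the net while comparing runs
```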

@droidicus I see what is happening now. That is due to Mocha's [parameter sharing](http://mochajl.readthedocs.org/en/latest/user-guide/network.html#shared-parameters). Parameter sharing is what enables the training and testing networks to use the same parameters. So...
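
This is not Mocha.jl code, just a tiny self-contained illustration of the scheme: parameters are registered under a key (by default the layer name), so two networks built against the same registry resolve to the same storage:

```julia
registry = Dict{String,Vector{Float64}}()

# return the parameters registered under `key`, creating them on first use
get_params!(reg, key, n) = get!(() -> randn(n), reg, key)

w_train = get_params!(registry, "ip1", 4)  # training net creates "ip1"
w_test  = get_params!(registry, "ip1", 4)  # testing net reuses it
w_train[1] = 42.0
@assert w_test[1] == 42.0  # same underlying array: training updates are
                           # visible to the testing network for free
```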

@vollmersj I personally have not done that. For training with a huge dataset like ImageNet, I believe converting the data into HDF5 format is quite infeasible. Actually, I/O might become the bottleneck...

@the-moliver Thanks for the suggestions. I think `SolverParameters` is designed to hold all the general parameters that are needed by all solvers. If a solver needs extra control parameters, I suggest...

@the-moliver OK, cool! Thanks! Then I think it's better to keep the learning rate and momentum in the general solver (at least for now), as the learning rate computed there will be your `overall_stepsize`.

@benmoran Thanks! Yes, I have been thinking about the solver interface. Maybe it could be much easier if we made `SolverParameter` more general. For example, making it...

@benmoran I second this design. Specifically, for `SolverParameter` (sketched below), I think each solver could provide:

- a function to initialize a default dictionary with default parameters
- a function to check...
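
A rough sketch of that dictionary-based interface (`make_solver_parameters` and `validate!` are illustrative names, not the actual Mocha.jl API):

```julia
# defaults shared by every solver, overridden/extended per solver
function make_solver_parameters(; kwargs...)
    params = Dict{Symbol,Any}(:max_iter => 10000, :learning_rate => 0.01)
    for (k, v) in kwargs
        params[k] = v
    end
    return params
end

# each solver declares the keys it needs and checks them up front
function validate!(params::Dict{Symbol,Any}, required::Vector{Symbol})
    for key in required
        haskey(params, key) || error("missing solver parameter: $key")
    end
end

params = make_solver_parameters(momentum=0.9)   # solver-specific extra
validate!(params, [:max_iter, :learning_rate, :momentum])
```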

Unfortunately there is no better way, because for obvious reasons the gradients are stored separately. I'm quite curious, though: why do you want one huge flat vector? If...
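
If a flat vector is really needed, the only option is to copy. A small sketch, where the `grads` collection stands in for gradient arrays gathered from the net:

```julia
grads = Any[randn(5, 5), randn(10), randn(3, 3, 2)]  # per-layer stand-ins
flat  = vcat(map(vec, grads)...)  # one long vector, length 25 + 10 + 18

# note this is a copy: writing into `flat` does not touch the originals,
# which is why there is no cheap way to get a single contiguous vector
```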