Chiyuan Zhang

124 comments by Chiyuan Zhang

@jskDr Thanks! We have a `DecayOnValidation` learning rate policy that allows one to halve the learning rate when performance on the validation set drops.
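
A minimal sketch of wiring this up, assuming `LRPolicy.DecayOnValidation` takes a base learning rate, a validation-statistics key, and a decay ratio — the exact signature and the statistics key below are assumptions from memory, so check the Mocha.jl docs before use:

```julia
using Mocha

# assumed signature: base learning rate, statistics key, decay ratio
lr_policy = LRPolicy.DecayOnValidation(0.01, "test-accuracy", 0.5)

params = SolverParameters(max_iter=10000, lr_policy=lr_policy,
                          mom_policy=MomPolicy.Fixed(0.9))
solver = SGD(params)
# the policy also has to observe a ValidationPerformance coffee break
# so it can see when validation performance drops; see the docs.
```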

@droidicus Maybe try disabling random shuffling in the data layer, if you haven't done so yet?

The randomness comes mainly from these sources:

- random initialization of the weights
- random shuffling of the data
- random masking of the data if you have a dropout layer

Everything else should...
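
For reference, a minimal sketch of pinning these down one by one (the `shuffle` keyword on the data layer is an assumption; check the documentation of the layer you use):

```julia
using Random          # on the older Julia versions Mocha targeted,
Random.seed!(12345)   # this was spelled `srand(12345)`

# - weights: seed the RNG (above) before constructing the net
# - shuffling: build the data layer with shuffling off, e.g.
#   HDF5DataLayer(..., shuffle=false)
# - dropout: remove the DropoutLayer from the net while comparing runs
```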

@droidicus I see what is happening now. That is due to Mocha's [parameter sharing](http://mochajl.readthedocs.org/en/latest/user-guide/network.html#shared-parameters). Parameter sharing is what enables the training and testing networks to use the same parameters. So...
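
This is not Mocha.jl code, just a tiny self-contained illustration of the scheme: parameters are registered under a key (by default the layer name), so two networks built against the same registry resolve to the same storage:

```julia
registry = Dict{String,Vector{Float64}}()

# return the parameters registered under `key`, creating them on first use
get_params!(reg, key, n) = get!(() -> randn(n), reg, key)

w_train = get_params!(registry, "ip1", 4)  # training net creates "ip1"
w_test  = get_params!(registry, "ip1", 4)  # testing net reuses it
w_train[1] = 42.0
@assert w_test[1] == 42.0  # same underlying array: training updates are
                           # visible to the testing network for free
```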

@vollmersj I personally have not done that. For training with a huge dataset like ImageNet, I believe converting the data into HDF5 format is quite infeasible. Actually, I/O might become the bottleneck...

@the-moliver Thanks for the suggestions. I think `SolverParameters` is designed to hold all the general parameters that are needed by all solvers. If a solver needs extra control parameters, I suggest...

@the-moliver OK, cool! Thanks! Then I think it's better to keep the learning rate and momentum in the general solver (at least for now), as the learning rate computed there will be your `overall_stepsize`.

@benmoran Thanks! Yes, I have been thinking about the solver interface. Maybe it could be much easier if we made `SolverParameter` more general. For example, making it...

@benmoran I second this design. Specifically, for `SolverParameter` (sketched below), I think each solver could provide:

- a function to initialize a default dictionary with default parameters
- a function to check...
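
A rough sketch of that dictionary-based interface (`make_solver_parameters` and `validate!` are illustrative names, not the actual Mocha.jl API):

```julia
# defaults shared by every solver, overridden/extended per solver
function make_solver_parameters(; kwargs...)
    params = Dict{Symbol,Any}(:max_iter => 10000, :learning_rate => 0.01)
    for (k, v) in kwargs
        params[k] = v
    end
    return params
end

# each solver declares the keys it needs and checks them up front
function validate!(params::Dict{Symbol,Any}, required::Vector{Symbol})
    for key in required
        haskey(params, key) || error("missing solver parameter: $key")
    end
end

params = make_solver_parameters(momentum=0.9)   # solver-specific extra
validate!(params, [:max_iter, :learning_rate, :momentum])
```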

Unfortunately there is no better way, because for obvious reasons the gradients are stored separately. I'm quite curious, though: why do you want one huge flat vector? If...
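
If a flat vector is really needed, the only option is to copy. A small sketch, where the `grads` collection stands in for gradient arrays gathered from the net:

```julia
grads = Any[randn(5, 5), randn(10), randn(3, 3, 2)]  # per-layer stand-ins
flat  = vcat(map(vec, grads)...)  # one long vector, length 25 + 10 + 18

# note this is a copy: writing into `flat` does not touch the originals,
# which is why there is no cheap way to get a single contiguous vector
```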