Dustin Tran
For some idea of what the visuals look like, I quite like the stuff in http://arxiv.org/abs/1206.1106, for example (nothing in particular, just an arbitrary paper I chose), i.e. high-resolution fonts,...
Picture of current progress.

Bugs/things to continue working on:
- sgd gives nonsensical prediction results (could be a result of bad learning...
Progress:

1. It was definitely just a problem of setting the hyperparameter `alpha` in Xu's learning rate. This also still needs to...
I generalized the `d`-dimensional learning rate `D_n` to have hyperparameters `α` and `c`:
```
I_hat = α*I_hat + diag(I_hat_new)
D_n = 1/(I_hat)^c
```
The observed Fisher information `I_hat` is the...
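For concreteness, here is a minimal R sketch of that update (placeholder names, not the package implementation); it assumes the per-step diagonal `diag(I_hat_new)` is approximated by the squared stochastic gradient:

```R
# Sketch of the generalized d-dimensional learning rate: accumulate a running
# diagonal estimate of the observed Fisher information, then scale the
# gradient elementwise by its -c power.
sgd_diag_lr <- function(grad_fn, theta0, n_iter, alpha = 1, c = 1/2, eps = 1e-8) {
  theta <- theta0
  I_hat <- rep(0, length(theta0))      # running diagonal Fisher estimate
  for (n in seq_len(n_iter)) {
    g <- grad_fn(theta, n)             # stochastic gradient at step n
    I_hat <- alpha * I_hat + g^2       # diag(I_hat_new) approximated by g^2 (assumption)
    D_n <- 1 / (I_hat + eps)^c         # per-coordinate learning rate D_n
    theta <- theta - D_n * g           # SGD step with elementwise scaling
  }
  theta
}
```

With `alpha = 1` and `c = 1/2` this reduces to AdaGrad's diagonal scaling.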
Yup.
```R
library(sgd)
# Dimensions
N
```
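For reference, a self-contained sketch of the kind of setup that snippet starts; the dimensions, data-generating process, and the `model = "lm"` call are my assumptions rather than the original code:

```R
library(sgd)

# Illustrative dimensions (assumed, not the original values)
N <- 1e4
d <- 10

# Simulate a linear model and fit it with sgd(); the formula/data/model
# interface is assumed here.
X <- matrix(rnorm(N * d), ncol = d)
theta <- rep(5, d + 1)
y <- cbind(1, X) %*% theta + rnorm(N)
dat <- data.frame(y = y, x = X)
fit <- sgd(y ~ ., data = dat, model = "lm")
```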
I've been trying to dig into the theory and am thoroughly perplexed. The paper looks at minimizing the regret function using the Mahalanobis norm, which generalizes L2. That is, we...
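For context, the objects in question, written in my own notation (not necessarily the paper's):

```latex
% Mahalanobis norm induced by a positive definite matrix A; A = I recovers L2.
\|x\|_A = \sqrt{x^\top A x}

% Online regret of the iterates \theta_1, \dots, \theta_T against the best fixed point.
R(T) = \sum_{t=1}^{T} f_t(\theta_t) \;-\; \min_{\theta} \sum_{t=1}^{T} f_t(\theta)
```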
Yup, would definitely be interesting to see. That is, we'd check the variance of the two estimates as `n -> infty` through a plot.
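A hedged sketch of what that check could look like in R; the two learning-rate schedules below are stand-ins for whichever estimators are actually being compared:

```R
# Compare the empirical variance of two SGD-style estimators of a scalar mean
# as n grows, by repeating each run and taking the variance of the final iterate.
set.seed(42)
est_var <- function(n, lr_fn, reps = 200) {
  finals <- replicate(reps, {
    theta <- 0
    x <- rnorm(n, mean = 1)                       # data with true mean 1
    for (i in seq_len(n)) {
      theta <- theta + lr_fn(i) * (x[i] - theta)  # stochastic update toward the mean
    }
    theta
  })
  var(finals)
}

ns <- c(100, 500, 1000, 5000, 10000)
v1 <- sapply(ns, est_var, lr_fn = function(i) 1 / i)        # placeholder schedule 1
v2 <- sapply(ns, est_var, lr_fn = function(i) 1 / sqrt(i))  # placeholder schedule 2

plot(ns, v1, type = "b", log = "xy", xlab = "n", ylab = "empirical variance")
lines(ns, v2, type = "b", col = "red")
legend("topright", legend = c("1/n rate", "1/sqrt(n) rate"),
       col = c("black", "red"), lty = 1)
```

Plotting both variance curves on a log-log scale makes the convergence rates easy to compare.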
As a reminder (to self), this was looked at and briefly mentioned in the current draft for the NIPS submission. The intuition behind why AdaGrad leads to better empirical performance...
You can look at the method of moments example in the repo. It implements a gradient function which is passed into SGD. This can be useful for simple prototyping, bu...
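A generic sketch of that prototyping pattern; this is not the package's interface, and the function and argument names here are made up for illustration:

```R
# Prototype an estimator by writing only its gradient function and handing it
# to a bare-bones SGD loop that visits one observation at a time.
sgd_prototype <- function(grad_fn, theta0, data, n_pass = 1, lr = 1e-2) {
  theta <- theta0
  for (pass in seq_len(n_pass)) {
    for (i in sample(nrow(data))) {
      theta <- theta - lr * grad_fn(theta, data[i, , drop = FALSE])  # one-observation update
    }
  }
  theta
}

# Example: a method-of-moments-style gradient for a normal mean (illustrative only).
grad_mean <- function(theta, row) 2 * (theta - row$y)
dat <- data.frame(y = rnorm(500, mean = 3))
sgd_prototype(grad_mean, theta0 = 0, data = dat)
```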
The current implementation for the Cox model uses it. It's not worth the effort yet to code up general classes of models where this IRLS+SGD idea would work—at least not...