Dustin Tran

118 comments by Dustin Tran

For some idea of what the visuals look like, I quite like the stuff in http://arxiv.org/abs/1206.1106, for example (nothing in particular, just an arbitrary paper I chose): i.e., high-resolution fonts,...

Picture of current progress:

![screen shot 2015-05-31 at 11 18 23 am](https://cloud.githubusercontent.com/assets/2569867/7902404/cd740b5c-0786-11e5-9478-7e3a9b3e48ff.png)

Bugs/things to continue working on:
- sgd gives nonsensical prediction results (could be a result of bad learning...

Progress:

![screen shot 2015-05-31 at 5 03 19 pm](https://cloud.githubusercontent.com/assets/2569867/7903853/08540080-07b7-11e5-8934-b3d71063d650.png)

1. It was definitely just a problem of setting the hyperparameter `alpha` in Xu's learning rate. This also still needs to...

I generalized the `d`-dimensional learning rate `D_n` to have hyperparameters `α` and `c`:

```
I_hat = α*I_hat + diag(I_hat_new)
D_n = 1/(I_hat)^c
```

The observed Fisher information `I_hat` is the...
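
To make the update concrete, here is a minimal R sketch, assuming `I_hat_new` is the outer product of the current score (so its diagonal is just the squared gradient, `grad^2`); the names `alpha` and `c_exp` stand in for `α` and `c` above:

```R
# Hedged sketch of the generalized d-dimensional learning rate.
# make_rate() returns a closure that accumulates the diagonal of the
# observed Fisher information and returns the elementwise rate D_n.
make_rate <- function(d, alpha = 0.9, c_exp = 0.5) {
  I_hat <- rep(1e-8, d)  # running diagonal estimate; small init avoids 1/0
  function(grad) {
    I_hat <<- alpha * I_hat + grad^2  # diag(grad %*% t(grad)) == grad^2
    1 / I_hat^c_exp                   # D_n, applied elementwise to the gradient
  }
}

rate <- make_rate(d = 3)
rate(c(0.1, -0.2, 0.3))  # d-dimensional learning rate for this step
```

Note that `c_exp = 0.5` with `alpha = 1` would give an AdaGrad-like rate, while smaller `alpha` discounts old curvature information.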

Yup.

```R
library(sgd)

# Dimensions
N
```
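
The snippet above is cut off by the excerpt; a plausible completion, patterned on the package's linear regression demo (the specific dimensions, simulated data, and `model = "lm"` choice are assumptions):

```R
library(sgd)

# Dimensions
N <- 1e4
d <- 10

# Simulate data from a linear model and fit it with SGD.
set.seed(42)
X <- matrix(rnorm(N * d), ncol = d)
theta <- rep(5, d + 1)
y <- cbind(1, X) %*% theta + rnorm(N)
dat <- data.frame(y = y, x = X)

sgd.theta <- sgd(y ~ ., data = dat, model = "lm")
```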

I've been trying to dig into the theory and am thoroughly perplexed. The paper looks at minimizing the regret function using the Mahalanobis norm, which generalizes L2. That is, we...
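
For reference, these are the standard definitions being invoked (notation assumed rather than taken from the paper): the regret after `T` rounds against the best fixed point in hindsight, and the Mahalanobis norm, which reduces to L2 when `A = I`:

```latex
R(T) = \sum_{t=1}^{T} f_t(\theta_t) \;-\; \min_{\theta} \sum_{t=1}^{T} f_t(\theta),
\qquad
\|x\|_A = \sqrt{x^\top A x}, \quad A \succ 0.
```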

Yup, would definitely be interesting to see. That is, we'd check the variance of the two estimates as `n -> infty` through a plot.
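
A rough sketch of what that check could look like (with stand-in estimators, since the excerpt doesn't say which two estimates are being compared; the SGD variants of interest would slot in where `mean` and `median` appear):

```R
# Hedged sketch: variance of two estimators as n grows, on a log-log plot.
set.seed(42)
ns   <- c(1e2, 1e3, 1e4, 1e5)
reps <- 200
var1 <- var2 <- numeric(length(ns))
for (i in seq_along(ns)) {
  var1[i] <- var(replicate(reps, mean(rnorm(ns[i]))))    # estimator 1
  var2[i] <- var(replicate(reps, median(rnorm(ns[i]))))  # estimator 2
}
plot(ns, var1, log = "xy", type = "b", xlab = "n", ylab = "variance",
     ylim = range(c(var1, var2)))
lines(ns, var2, type = "b", lty = 2)
legend("topright", legend = c("estimate 1", "estimate 2"), lty = 1:2)
```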

As a reminder (to self), this was looked at and briefly mentioned in the current draft for the NIPS submission. The intuition behind why AdaGrad leads to better empirical performance...

You can look at the method of moments example in the repo. It implements a gradient function which is passed into SGD. This can be useful for simple prototyping, but...
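
As a rough illustration of the pattern (a generic sketch, not the package's actual interface): the user writes a gradient function and hands it to a bare-bones SGD loop.

```R
# Hedged sketch: SGD driven by a user-supplied gradient function.
# grad_fn(theta, i) returns the gradient contribution of observation i.
sgd_prototype <- function(grad_fn, theta0, n_obs, n_iter = 1e4, lr = 0.01) {
  theta <- theta0
  for (t in seq_len(n_iter)) {
    i <- sample.int(n_obs, 1)  # pick a random observation
    theta <- theta - lr / sqrt(t) * grad_fn(theta, i)
  }
  theta
}

# Example: least squares, gradient of 0.5 * (y_i - x_i' theta)^2.
set.seed(1)
X <- matrix(rnorm(1000 * 3), ncol = 3)
y <- X %*% c(1, -2, 3) + rnorm(1000)
grad_ls <- function(theta, i) -X[i, ] * drop(y[i] - X[i, ] %*% theta)
sgd_prototype(grad_ls, theta0 = rep(0, 3), n_obs = 1000)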

The current implementation for the Cox model uses it. It's not worth the effort yet to code up general classes of models where this IRLS+SGD idea would work—at least not...