Chris White
I haven't looked into whether we could use this data for benchmarking, but the incredibly large dataset over at https://www.kaggle.com/c/outbrain-click-prediction/data seems like it could be a good candidate. We might...
This is the insight that randomized algorithms like stochastic gradient descent take advantage of; so, in theory you could compute the gradient on a randomly selected chunk and use that...
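To make the idea concrete, here is a minimal NumPy sketch of that scheme (not dask-glm's actual implementation): a least-squares gradient evaluated on one randomly selected chunk of rows per step, with the data, chunking, and step size all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 rows split into 4 "chunks" of 25 (NumPy stand-in for a dask array)
X = rng.normal(size=(100, 2))
beta_true = np.array([-1.0, 2.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)
chunks = np.split(np.arange(100), 4)

def chunk_gradient(beta, idx):
    """Least-squares gradient evaluated on a single chunk of rows."""
    Xc, yc = X[idx], y[idx]
    return Xc.T @ (Xc @ beta - yc) / len(idx)

# Stochastic gradient descent: one randomly chosen chunk per update
beta = np.zeros(2)
for step in range(500):
    idx = chunks[rng.integers(len(chunks))]
    beta -= 0.1 * chunk_gradient(beta, idx)
```

Because the chunks are sampled uniformly at random, each chunk gradient is an unbiased estimate of the full gradient, which is what makes the cheap per-step cost pay off.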
@hussainsultan yea, randomization is best -- sorted is probably not a good idea (example: imagine one of the local updates tries to fit a logistic regression on data that has...
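A tiny deterministic illustration of why sorted data is a bad idea here (labels and chunk sizes are made up): once the labels are sorted, every chunk contains a single class, so a per-chunk logistic fit has no signal to learn from.

```python
import numpy as np

# Sorted binary labels: 50 zeros followed by 50 ones
y = np.sort(np.array([0, 1] * 50))

# Split into 4 chunks of 25 rows, as a chunked array would
chunks = np.split(y, 4)

# Every chunk now sees only one class, so a local logistic
# regression on any single chunk is degenerate
classes_per_chunk = [np.unique(c).size for c in chunks]
print(classes_per_chunk)  # [1, 1, 1, 1]
```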
The only other GLM family I know of that is used "in the wild" is the Poisson family, which should be very straightforward to include. (@mpancia do you know of...
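For a sense of how straightforward the Poisson family is, here is a self-contained NumPy sketch (simulated data, not dask-glm code) fitting a Poisson regression with the canonical log link via Newton/IRLS steps:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated Poisson-regression data with a log link
X = rng.normal(scale=0.5, size=(500, 2))
beta_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)                # mean under the log link
    grad = X.T @ (mu - y)                # gradient of the negative log-likelihood
    hess = X.T @ (mu[:, None] * X)       # Hessian: X^T diag(mu) X
    beta -= np.linalg.solve(hess, grad)  # Newton / IRLS step
```

The gradient and Hessian have the same `X^T (mu - y)` / `X^T diag(w) X` shape as the logistic family, which is why adding the Poisson family should mostly be a matter of swapping in the new link and variance functions.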
I actually really like this idea but I *do* think a refactor which preserves the runtimes will be more difficult than it initially seems; for example, @mpancia I'd encourage you...
Naive attempt at using this to add an intercept makes `admm` choke:

```python
X = da.random.random((100, 2), chunks=(50, 2))
y = make_y(X, beta=np.array([-1.0, 2]), chunks=(50,))
o = da.ones((X.shape[0], 1), chunks=(X.chunks[0], (1,)))
...
```
Interesting work; however, they are mainly focused on the situation with a "huge" number of features. I think in the GLM space it is uncommon to use more than, say,...
No surveys that I know of unfortunately, but here's a list off the top of my head:
- I've heard people say you can use the [close connection between LDA...
This pattern of result persistence was updated and fixed in 3.0. I'm going to close this, but if other issues arise, please open a new issue.
Thanks for the mention @TomAugspurger! I'd be happy to go over the relevant differences between Server / Cloud and help you get set up either way - just let me...