Larger extract, hosting and definition plan
This task is to specify and create the larger dataset that we will make available to end users for scaling out their data size.
This task is in progress. I'm 80% done loading the initial dataset (preprocessing) from the Stack Overflow 2014-05 archive.
Loaded the whole of Stack Overflow, first draft.
Sah-weet!
Created a dataset for use in getting PR pushed along.
The larger dataset needs to be created, but for EA-3 we can still stick to a .tgz distribution method and worry about a larger one after EA-3.
Maybe I'm just unclear on how this is different from #46. At least in terms of timing, it seems like they would both be 8.0-1?
Kicking to you as a PM issue. It doesn't absolutely need a solution for 8.0-1, but it's good to consider.