Matthew Rocklin
The tornado event loop does add some overhead. My guess would be tens of microseconds per operation. Yes, it is probably possible to make everything run in the main user...
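A rough way to check the order of magnitude of that guess is to time how long an event loop takes to schedule and await a no-op task. This sketch uses plain `asyncio` (which Tornado 5+ runs on) rather than the Dask or Tornado stack itself, so it measures only bare loop overhead. The function names `noop` and `measure` are illustrative.

```python
import asyncio
import time


async def noop():
    """Do nothing; awaiting this costs only event-loop bookkeeping."""
    return None


async def measure(n=10_000):
    """Return the mean seconds per task scheduled and awaited on the loop."""
    start = time.perf_counter()
    for _ in range(n):
        # create_task forces a real trip through the loop's scheduler
        await asyncio.create_task(noop())
    return (time.perf_counter() - start) / n


per_op = asyncio.run(measure())
print(f"~{per_op * 1e6:.1f} microseconds per scheduled task")
```

The number will vary by machine and Python version, and a real scheduler round trip involves more than one loop iteration.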
If you want single-process execution but are comfortable with threads, then you can have this now with `Client(processes=False)` (in master). Some things like functions *do* get serialized, but everything...
SKLearn hyper-parameter searches would be a nice candidate. @jcrist may be able to provide a couple of examples. For general problems you would probably want to support categories and integers.
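To make "support categories and integers" concrete, here is a minimal random-search sampler over a mixed parameter space, using only the standard library. The space layout, the parameter names, and the `sample` helper are all hypothetical; this is not scikit-learn's or Dask's API.

```python
import random

# Hypothetical search space: name -> (kind, spec)
SPACE = {
    "C": ("float", (0.01, 100.0)),                         # continuous range
    "max_depth": ("int", (1, 10)),                         # inclusive integer range
    "kernel": ("categorical", ["linear", "rbf", "poly"]),  # unordered choices
}


def sample(space, rng):
    """Draw one parameter setting, respecting each parameter's kind."""
    params = {}
    for name, (kind, spec) in space.items():
        if kind == "float":
            params[name] = rng.uniform(*spec)
        elif kind == "int":
            params[name] = rng.randint(*spec)  # both endpoints included
        elif kind == "categorical":
            params[name] = rng.choice(spec)
        else:
            raise ValueError(f"unknown parameter kind: {kind!r}")
    return params


rng = random.Random(0)  # seeded so runs are repeatable
candidates = [sample(SPACE, rng) for _ in range(5)]
for params in candidates:
    print(params)
```

An optimizer that assumes every parameter is a real number would need some encoding for `kernel` and rounding for `max_depth`, which is why handling these kinds natively matters for general problems.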
The mutable mappings in zict may be useful to construct something here. This is what the dask distributed scheduler uses. On Apr 21, 2017 14:14, "Erik Welch" wrote: > Currently...
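As an illustration of the idea of composing mutable mappings, here is a standard-library sketch of a mapping that keeps the most recently used items in a fast store and spills the rest into any other `MutableMapping`. The class name `SpillLRU` and its constructor are hypothetical and are not zict's API; zict provides its own ready-made building blocks.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class SpillLRU(MutableMapping):
    """Keep the n most recently used items in `fast`; spill older ones to `slow`."""

    def __init__(self, n, slow):
        self.n = n
        self.fast = OrderedDict()
        self.slow = slow  # any MutableMapping, e.g. a dict or a disk-backed store

    def __getitem__(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        value = self.slow.pop(key)      # raises KeyError if the key is absent
        self[key] = value               # promote back into the fast store
        return value

    def __setitem__(self, key, value):
        self.slow.pop(key, None)        # avoid holding a stale copy in slow
        self.fast[key] = value
        self.fast.move_to_end(key)
        while len(self.fast) > self.n:
            # evict the least recently used item into the slow store
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value

    def __delitem__(self, key):
        if key in self.fast:
            del self.fast[key]
        else:
            del self.slow[key]

    def __iter__(self):
        yield from self.fast
        yield from self.slow

    def __len__(self):
        return len(self.fast) + len(self.slow)


slow = {}
cache = SpillLRU(2, slow)
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3  # "a" is the oldest entry, so it spills into slow
```

Because the slow store is just another mapping, it can be swapped for something that writes to disk or compresses values without changing the caller's code.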
I was hoping for interactivity. This would require a fair amount of the logic to be clientside. I'm not sure what lies on the client and what lies on the...
One vis on a page covers my immediate need. Obviously it'd be nice to have more but I'm pretty content with the solution so far. Bokeh has an interesting solution...
Just to throw another hat in the ring, Dask's distributed scheduler might be an easy way to both distribute work and to coordinate parameters. http://distributed.readthedocs.org
This one may have been partially on my end. The datetime string was formatted in a way that Python AWS libraries (boto) were able to cope with, but wasn't exactly...
I haven't created an index. I guess that's the problem? For good performance on S3 at this size we should plan to have an index?
OK, after running the following:

```python
tbl.create_index(num_partitions=256, num_sub_vectors=96)
```

This runs in 4-5s:

```python
results = tbl.search(
    model.encode("...")
).limit(3).to_pandas().text.tolist()
```

(model encoding runs in ~100ms)