Matthew Rocklin comments

Results: 1063 comments by Matthew Rocklin

Serial execution

The tornado event loop does add some overhead. My guess would be tens of microseconds per operation. Yes, it is probably possible to make everything run in the main user...
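A rough way to check the "tens of microseconds per operation" guess is to time how long it takes to push a trivial task through an event loop. The sketch below uses the standard-library `asyncio` loop rather than tornado's wrapper, so it only gives a lower-bound feel for the cost; the names `noop` and `measure` and the iteration count are assumptions made for illustration.

```python
import asyncio
import time


async def noop():
    # A task that does no work, so only scheduling cost is measured.
    return None


async def measure(n=10_000):
    # Schedule n trivial tasks one at a time and await each of them,
    # forcing a round trip through the event loop per task.
    start = time.perf_counter()
    for _ in range(n):
        await asyncio.create_task(noop())
    return (time.perf_counter() - start) / n


per_op_seconds = asyncio.run(measure())
print(f"{per_op_seconds * 1e6:.1f} microseconds per scheduled no-op")
```

The number will vary by machine and Python version, and a real scheduler does more per operation than this loop does.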

If you want single-process execution but are comfortable with threads, then you can have this now with `Client(processes=False)` (in master). Some things like functions *do* get serialized, but everything...

Benchmark functions to experiment on

SKLearn hyper-parameter searches would be a nice candidate. @jcrist may be able to provide a couple of examples. For general problems you would probably want to support categories and integers.
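To make "support categories and integers" concrete, here is a minimal sketch of a mixed search space with a random sampler. The parameter names, ranges, and the `("float" | "int" | "categorical", ...)` spec format are all hypothetical and do not come from any particular library.

```python
import random

# Hypothetical search space: each entry is (kind, *arguments).
SPACE = {
    "learning_rate": ("float", 1e-4, 1e-1),
    "n_estimators": ("int", 10, 200),
    "kernel": ("categorical", ["linear", "rbf", "poly"]),
}


def sample(space, rng):
    """Draw one configuration, respecting each parameter's type."""
    params = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "float":
            params[name] = rng.uniform(spec[1], spec[2])
        elif kind == "int":
            # randint is inclusive on both ends.
            params[name] = rng.randint(spec[1], spec[2])
        elif kind == "categorical":
            params[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown parameter kind: {kind!r}")
    return params


rng = random.Random(0)
draw = sample(SPACE, rng)
print(draw)
```

An optimizer that only handles continuous inputs would need an encoding step (rounding for integers, one-hot or index encoding for categories) on top of a space like this.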

Better memory management of results cache

The mutable mappings in zict may be useful to construct something here. This is what the dask distributed scheduler uses. On Apr 21, 2017 14:14, "Erik Welch" wrote: > Currently...
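As an illustration of the idea, the sketch below builds a bounded results cache as a `MutableMapping` that evicts the least recently used entry. It uses only the standard library and is not zict's API; the class name `LRUCache` and the `max_items` parameter are assumptions for the example.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class LRUCache(MutableMapping):
    """Hypothetical bounded mapping that evicts least recently used keys."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the least recently used end until within budget.
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


cache = LRUCache(max_items=2)
cache["a"] = 1
cache["b"] = 2
_ = cache["a"]   # touch "a" so that "b" becomes the eviction candidate
cache["c"] = 3   # exceeds the budget and evicts "b"
remaining = sorted(cache)
```

Because the cache is a plain mapping, it can be swapped in wherever a `dict` of results is used. A memory-aware version would weigh entries by size instead of counting them.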

Embed diagram in static HTML

I was hoping for interactivity. This would require a fair amount of the logic to be clientside. I'm not sure what lies on the client and what lies on the...

Embed diagram in static HTML

One vis on a page covers my immediate need. Obviously it'd be nice to have more but I'm pretty content with the solution so far. Bokeh has an interesting solution...

simple multiprocess enhancement for speedup

Just to throw another hat in the ring, Dask's distributed scheduler might be an easy way to both distribute work and to coordinate parameters. http://distributed.readthedocs.org

bug(python): S3 Permissions

This one may have been partially on my end. The datetime string was formatted in a way that Python AWS libraries (boto) were able to cope with, but wasn't exactly...

bug(python): Slow Performance on S3-backed storage

I haven't created an index. I guess that's the problem? For good performance on S3 at this size we should plan to have an index?

bug(python): Slow Performance on S3-backed storage

OK, after running the following:

```python
tbl.create_index(num_partitions=256, num_sub_vectors=96)
```

This runs in 4-5s:

```python
results = tbl.search(
    model.encode("...")
).limit(3).to_pandas().text.tolist()
```

(model encoding runs in ~100ms)