Matti Lyra

Results 11 issues of Matti Lyra

Neither of these new metrics has been added to mini-batch k-means yet.

Stalled
module:cluster
cython

Allow a stream to be fed in and deduplicated in parallel. Obviously the deduplication itself cannot happen in parallel, but shingling and minhashing the documents can. Given a fast...
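A minimal sketch of that split, assuming character shingles, a salted `blake2b` as the hash family, and a brute-force signature comparison in place of a real LSH index. None of the names below (`shingles`, `minhash`, `fingerprint`, `deduplicate`, `NUM_HASHES`) come from lsh itself; they are hypothetical and only illustrate which part can be farmed out to workers and which part stays sequential.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_HASHES = 64  # assumed signature length, not taken from lsh


def shingles(text, k=5):
    """Return the set of character k-shingles of a whitespace-normalised text."""
    text = " ".join(text.split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def minhash(text):
    """For each salted hash function, keep the minimum digest over all shingles."""
    grams = [s.encode("utf-8") for s in shingles(text)]
    signature = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            hashlib.blake2b(g, digest_size=8, salt=salt).digest() for g in grams
        ))
    return tuple(signature)


def fingerprint(doc):
    """Unit of parallel work: shingle and minhash a single document."""
    return doc, minhash(doc)


def estimated_jaccard(a, b):
    """Fraction of signature positions on which two signatures agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(stream, executor, threshold=0.9):
    """Yield documents whose signature is not close to any earlier one."""
    kept = []
    # Fingerprinting runs on the executor; results come back in input order.
    # Executor.map submits the whole iterable up front, so an unbounded
    # stream would need to be fed in chunks.
    for doc, sig in executor.map(fingerprint, stream):
        # This comparison is the sequential part.
        if all(estimated_jaccard(sig, seen) < threshold for seen in kept):
            kept.append(sig)
            yield doc


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog",
    "completely different text about minhash",
]
with ThreadPoolExecutor(max_workers=4) as pool:
    unique = list(deduplicate(docs, pool))
```

A thread pool keeps the sketch self-contained. For CPU-bound hashing in pure Python, a `ProcessPoolExecutor` passed in the same way is the more likely choice.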

At the moment all the documents are stored in an in-memory database. It should be possible to define this to be anything that supports getting and setting items.
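One way to read "anything that supports getting/setting items" is any object implementing `__getitem__` and `__setitem__`. The sketch below assumes a hypothetical `FingerprintCache` wrapper (not an existing lsh class) and shows the same code working with a plain `dict` and with a standard-library `shelve` file.

```python
import os
import shelve
import tempfile


class FingerprintCache:
    """Hypothetical wrapper: stores fingerprints in any mapping-like backend."""

    def __init__(self, storage=None):
        # Default to an in-memory dict; any object with item get/set works.
        self.storage = {} if storage is None else storage

    def add(self, doc_id, fingerprint):
        self.storage[doc_id] = fingerprint

    def get(self, doc_id):
        return self.storage[doc_id]


# In-memory backend.
memory_cache = FingerprintCache()
memory_cache.add("doc-1", (1, 2, 3))
from_memory = memory_cache.get("doc-1")

# On-disk backend; shelve requires string keys and pickles the values.
with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "fingerprints")) as db:
        disk_cache = FingerprintCache(db)
        disk_cache.add("doc-1", (1, 2, 3))
        from_disk = disk_cache.get("doc-1")
```

Keeping the backend behind item access means the caching code never needs to know whether it is talking to memory, a file, or a remote store.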

lsh should be `pip`-installable; use `cookiecutter`.

SimHash is another LSH technique for near-duplicate detection. It relies on cosine similarity instead of Jaccard similarity. https://en.wikipedia.org/wiki/SimHash https://doi.org/10.1145/509907.509965
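For illustration, a rough sketch of the commonly used token-hashing form of SimHash. The 64-bit width, the `blake2b` token hash, and the function names are assumptions made here, not part of lsh. Each token votes +weight or -weight on every bit position, and the sign of each total becomes one bit of the fingerprint; similar documents should then differ in few bits.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Return a `bits`-wide SimHash fingerprint of an iterable of tokens."""
    totals = [0] * bits
    for token, weight in Counter(tokens).items():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        value = int.from_bytes(digest, "big")
        for i in range(bits):
            # A set bit votes for 1 at position i, a clear bit votes against.
            totals[i] += weight if (value >> i) & 1 else -weight
    # Positions with a positive total become 1 bits in the fingerprint.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


first = simhash("the quick brown fox jumps over the lazy dog".split())
second = simhash("dog lazy the over jumps fox brown quick the".split())
```

Because the votes depend only on token counts, the two word orders above yield the same fingerprint; using word n-grams as tokens would make the fingerprint order-sensitive.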

When adding a new `dask_ec2.Instance` to an existing `dask_ec2.Cluster`, the username and keypair parameters are not copied to the instance, which causes the `ssh_client` of the `Instance` to...

I really like creating slide presentations using Jupyter Notebooks, but the workflow is currently fairly cumbersome. I often find myself wanting to control HTML `` tag parameters like - slide...

I've been piecing together an auto-scaling `dask` cluster on AWS, using `adaptive` and bits from `dask_ec2`. It would be really useful to know what the semantics of the `scale_up` and...

documentation
adaptive

There is an issue with how the scheduler assigns tasks from the `unrunnable` queue to workers that meet the resource requirements when they join the scheduler. The use case is some long...

adaptive

The subprocesses running under ShellBolt/ShellSpout should have information about the topology context they run in, specifically their component ID and their sources and targets. I think there is already...