biothings.api

fix utils.diff.diff_collections helper function to run diffs in parallel

Open newgene opened this issue 5 months ago • 0 comments

The utils.diff.diff_collections helper function is not used directly in the hub, but it is still a useful tool for comparing two data collections and reporting their differences.

https://github.com/biothings/biothings.api/blob/e635db03a0b5930f2436ede8ae2ee3316ac75e58/biothings/utils/diff.py#L114

The existing use_parallel option relies on ipython parallel, which probably no longer works. We would like a new way to run diffs in parallel that does not depend on ipython parallel. Typically, we don't need to parallelize across multiple machines; parallelizing across multiple CPU cores on the same machine should be good enough.

newgene, Jan 18 '24 23:01