Compute C-order memmap, projections, and correlation image in parallel, efficiently, avoiding OOM
Extends #311, which should be merged before this one.
This combines several changes I've made to functions that process the whole movie (other than motion correction and CNMF) so that they avoid running out of memory. For example, save_memmap in some cases loads the movie with caiman.load(... in_memory=True), which for some file types really does try to load the whole movie into memory. With this PR, re-saving the memmap and computing projections and correlation images always run in parallel over chunks of pixels small enough to fit into memory (taking the number of processes into account).
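To make the chunking idea concrete, here is a minimal sketch, not the code in this PR. The helper names (pixels_per_chunk, chunked_projection), the explicit memory budget, and the (pixels, frames) layout are assumptions for illustration. The loop is serial here; in a parallel version each chunk would be handed to a worker instead.

```python
import numpy as np

def pixels_per_chunk(n_frames, itemsize, mem_budget_bytes, n_processes):
    """Hypothetical helper: pixels per chunk so that each process's chunk
    (all frames for those pixels) fits in its share of the memory budget."""
    per_process = mem_budget_bytes // n_processes
    return max(1, per_process // (n_frames * itemsize))

def chunked_projection(movie, chunk_pixels, func=np.max):
    """Hypothetical helper: reduce over time one pixel chunk at a time.

    movie is any (n_pixels, n_frames) array-like that supports slicing,
    e.g. a np.memmap, so only one chunk is materialized at once.
    """
    parts = []
    for start in range(0, movie.shape[0], chunk_pixels):
        chunk = np.asarray(movie[start:start + chunk_pixels])  # load one chunk
        parts.append(func(chunk, axis=1))                      # reduce over frames
    return np.concatenate(parts)
```

For example, with 1000 float32 frames, an 8 MB budget, and 4 processes, pixels_per_chunk(1000, 4, 8_000_000, 4) gives 500 pixels per chunk.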
One other change is that for motion correction, the correlation image is now computed over the whole movie, rather than in 1000-frame windows that are then averaged. The cnmf function already did it this way, but it would sometimes run out of memory. Pixel chunking (square chunks with overlap, to avoid edge effects at internal chunk boundaries) fixes the memory issue, and tests confirm that the cnmf correlation result is unchanged. The mcorr result does change: it now matches the cnmf correlation image to <1e-6 difference (within the tolerance of the current tests), so I replaced the mcorr ground truth with a copy of the cnmf ground truth and published the update as a new Zenodo version.
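A rough sketch of why overlapping square chunks reproduce the whole-image result. This is not the PR's implementation: local_corr is a simplified 4-neighbour correlation written for this example, and the function names, tile size, and 1-pixel overlap are assumptions.

```python
import numpy as np

def local_corr(movie):
    """Simplified stand-in: mean temporal correlation of each pixel with
    its 4 neighbours. movie has shape (T, H, W)."""
    z = movie - movie.mean(axis=0)
    sd = z.std(axis=0)
    sd[sd == 0] = np.inf              # constant pixels get correlation 0
    z = z / sd
    total = np.zeros(z.shape[1:])
    count = np.zeros(z.shape[1:])
    c = (z[:, :-1, :] * z[:, 1:, :]).mean(axis=0)   # vertical neighbour pairs
    total[:-1, :] += c; count[:-1, :] += 1
    total[1:, :] += c;  count[1:, :] += 1
    c = (z[:, :, :-1] * z[:, :, 1:]).mean(axis=0)   # horizontal neighbour pairs
    total[:, :-1] += c; count[:, :-1] += 1
    total[:, 1:] += c;  count[:, 1:] += 1
    return total / np.maximum(count, 1)

def chunked_local_corr(movie, tile=16, overlap=1):
    """Compute local_corr tile by tile. Each tile is padded by `overlap`
    pixels (clipped at the image border) and only its interior is kept."""
    _, H, W = movie.shape
    out = np.empty((H, W))
    for r0 in range(0, H, tile):
        for c0 in range(0, W, tile):
            r1, c1 = min(r0 + tile, H), min(c0 + tile, W)
            pr0, pc0 = max(r0 - overlap, 0), max(c0 - overlap, 0)
            pr1, pc1 = min(r1 + overlap, H), min(c1 + overlap, W)
            block = local_corr(np.asarray(movie[:, pr0:pr1, pc0:pc1], dtype=float))
            out[r0:r1, c0:c1] = block[r0 - pr0:r1 - pr0, c0 - pc0:c1 - pc0]
    return out
```

With overlap=1, every pixel kept from a tile sees the same neighbours it would see in the full image, so the tiled result matches the unchunked one. With overlap=0, pixels on internal tile boundaries lose neighbours and the result differs there.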
Edit to add: It also skips re-saving the mmap in CNMF and CNMFE if the input file is already a C-order mmap.
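As an illustration of the kind of check involved (not the actual code in this PR), the sketch below assumes that the memmap filename encodes the array order as "_order_C_" or "_order_F_", as CaImAn-style memmap filenames do. The helper name is_c_order_mmap and the example filenames are hypothetical.

```python
import re
from pathlib import Path

def is_c_order_mmap(path):
    """Hypothetical helper: True if `path` looks like a C-order memmap,
    judging only by the file extension and the order tag in its name."""
    p = Path(path)
    if p.suffix != ".mmap":
        return False
    match = re.search(r"_order_([CF])_", p.name)
    return match is not None and match.group(1) == "C"

# Only re-save when the input is not already usable as-is.
for name in ["movie_d1_60_d2_80_d3_1_order_C_frames_2000.mmap",
             "movie_d1_60_d2_80_d3_1_order_F_frames_2000.mmap",
             "movie.tif"]:
    action = "skip re-save" if is_c_order_mmap(name) else "re-save as C-order mmap"
    print(f"{name}: {action}")
```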