[COOLER] new functions - balance / makebins

Open nservant opened this issue 2 years ago • 1 comments

Update of the COOLER modules

  • New balance tool
  • New makebins tool
  • I just realized that this PR also includes an older one from a few months ago that updates cload: https://github.com/nf-core/modules/pull/1404

nservant avatar Jul 29 '22 09:07 nservant

Of note, I removed the md5sum checks from cload and balance because cooler files store their creation date in the object, so the md5sums are always changing!

cooler info 1A2_1A6_vehicle.mm10.mapq_30.100.cool
{
    "bin-size": 100,
    "bin-type": "fixed",
 >>>"creation-date": "2022-06-01T14:16:44.409574",
    "format": "HDF5::Cooler",
    "format-url": "https://github.com/open2c/cooler",
    "format-version": 3,
    "generated-by": "cooler-0.8.11",
    "genome-assembly": "mm10",
    "nbins": 27255386,
    "nchroms": 22,
    "nnz": 294182628,
    "storage-mode": "symmetric-upper",
    "sum": 344516030
}

nservant avatar Jul 29 '22 12:07 nservant
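To illustrate why a stored timestamp defeats checksum-based tests, here is a minimal sketch. It does not read a real .cool file: it stands in for the HDF5 metadata with a small JSON document built from a few of the fields shown above, and the second creation date is a made-up value. The helper names (`md5_hex`, `serialize`, `without_volatile`) are hypothetical.

```python
import hashlib
import json


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def serialize(metadata: dict) -> bytes:
    """Serialize metadata deterministically (sorted keys) to bytes."""
    return json.dumps(metadata, sort_keys=True).encode("utf-8")


def without_volatile(metadata: dict, volatile=("creation-date",)) -> dict:
    """Drop fields that are expected to change between runs."""
    return {k: v for k, v in metadata.items() if k not in volatile}


# Metadata from the `cooler info` output above (subset of fields).
run_1 = {
    "bin-size": 100,
    "creation-date": "2022-06-01T14:16:44.409574",
    "nnz": 294182628,
    "sum": 344516030,
}
# Same content written at a later time; this date is a made-up value.
run_2 = dict(run_1, **{"creation-date": "2022-06-02T09:00:00.000000"})

# The raw checksums differ only because of the timestamp.
checksums_differ = md5_hex(serialize(run_1)) != md5_hex(serialize(run_2))
# Once the volatile field is ignored, the two runs compare equal.
stable_match = md5_hex(serialize(without_volatile(run_1))) == md5_hex(
    serialize(without_volatile(run_2))
)
print(checksums_differ, stable_match)
```

The same idea applies to the real files: any test that hashes the whole object will pick up the creation date, so a stable check has to look at content that does not include it.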

Hi @nservant. Shall we do #1404 first? That will make this PR cleaner.

muffato avatar Sep 28 '22 07:09 muffato

Sorry, I missed @muffato's comment above. Go ahead as you discussed and finish the other PR first 😅

JoseEspinosa avatar Sep 28 '22 08:09 JoseEspinosa

No worries, @JoseEspinosa, I should have added myself as a reviewer. I've got some questions / concerns about cload and dump, which I'll try to address in #1404. Otherwise I'm happy with the new functions.

muffato avatar Sep 28 '22 09:09 muffato

Of note, I removed the md5sum checks from cload and balance because cooler files store their creation date in the object, so the md5sums are always changing!

@nservant note that pytest-workflow has other ways of verifying file contents aside from md5sum: https://pytest-workflow.readthedocs.io/en/stable/#test-options

So you could use contains to check for a specific string within the file if you want, which is probably quite a bit more reliable than just checking for the path 👍🏻

ewels avatar Oct 28 '22 09:10 ewels

Hi @ewels . Can't use contains here because the .cool files are binary. Instead, the trick is to convert them to .bedpe with cooler/dump

muffato avatar Oct 30 '22 22:10 muffato
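A rough sketch of that trick, assuming the `cooler` command-line tool is installed and that `cooler dump` accepts `--join` (to emit joined, BEDPE-like rows) and `-o` (output path); the options actually used by the nf-core cooler/dump module may differ. The function names are hypothetical. Only `dump_and_hash` needs the real CLI; the other two helpers are plain Python.

```python
import hashlib
import subprocess


def dump_command(cool_path: str, out_path: str) -> list:
    """Build a `cooler dump` call that writes joined text rows to out_path.

    Assumption: `--join` and `-o` are the relevant cooler CLI options.
    """
    return ["cooler", "dump", "--join", "-o", out_path, cool_path]


def md5_of_file(path: str) -> str:
    """Hash a file in chunks so large dumps need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dump_and_hash(cool_path: str, out_path: str) -> str:
    """Convert a binary .cool file to text, then checksum the text.

    Requires the `cooler` executable on PATH; raises if the call fails.
    """
    subprocess.run(dump_command(cool_path, out_path), check=True)
    return md5_of_file(out_path)
```

Because the dumped text should contain only the pixel rows and not the creation date, its checksum is expected to stay stable between runs, and text checks such as contains become usable on it as well.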

Thanks for all your comments. Anything else I can do ?

nservant avatar Oct 31 '22 08:10 nservant

Hi @nservant . If you can confirm you're happy with my further changes, that'd be great. I hope I didn't lose anything from your original intent !

muffato avatar Oct 31 '22 09:10 muffato

Yes, it looks all good to me. Could you merge ?

nservant avatar Oct 31 '22 12:10 nservant