
RMSD with multicore and timeout functionality

Open Jnelen opened this issue 11 months ago • 40 comments

As we discussed in #105: here is my first shot at an implementation of RMSD functions with multicore and timeout functionality.

For now, I kept the rmsd_timeout function public, but we could also make it private. This function supports a molref with multiple mols, something the multicore function doesn't support directly (everything is done pairwise there). I guess there is an easy workaround for the user, but maybe it's even better to implement it ourselves: if only one mol is supplied as the reference, extend the list to match the length of the mol list (like [molref] * len(mols)). This would also give each molecule a separate timeout. Supporting a mol list in rmsd_timeout also made me return the raw result of rmsdwrapper (a list) instead of a single value, because otherwise only one value would be returned. So if we make it private, that could clean things up and also emulate the regular rmsdwrapper behaviour better. (I just came up with this while writing this description; if you like it I will implement it tomorrow!)
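The broadcasting idea above could look roughly like this (a minimal sketch; `broadcast_refs` is a hypothetical helper name, not part of spyrmsd):

```python
# Hypothetical sketch: if a single reference molecule is supplied, repeat it
# to match the number of molecules, so everything stays pairwise and each
# molecule gets its own timeout.
def broadcast_refs(molrefs, mols):
    if len(molrefs) == 1 and len(mols) > 1:
        molrefs = molrefs * len(mols)  # i.e. [molref] * len(mols)
    if len(molrefs) != len(mols):
        raise ValueError(
            f"Mismatch: {len(molrefs)} references vs {len(mols)} molecules"
        )
    return molrefs
```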

I also still need to add tests. I guess we could add a test for a timeout, a test for matching mols and molrefs lists, ... but multicore might be harder to test?

Any feedback welcome as always!

Note: I intentionally kept all the new functions and imports together at the bottom so it's easier to see what's added. This means Flake8 doesn't pass. MyPy doesn't pass yet either, but I'll get to fixing that tomorrow as well!

Jnelen avatar Mar 11 '24 21:03 Jnelen

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.10%. Comparing base (da6be7f) to head (10a47be). Report is 14 commits behind head on develop.

Additional details and impacted files

codecov[bot] avatar Mar 11 '24 21:03 codecov[bot]

@Jnelen sorry, I clicked "Update Branch" without too much thinking. You'll have to pull before pushing further changes, otherwise you will get an error. Sorry about that!

RMeli avatar Mar 11 '24 22:03 RMeli

For the Value ctype I picked a regular single-precision float. However, we could also use a double-precision float if you think that is more appropriate!
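For reference, the two options differ only in the typecode passed to `multiprocessing.Value` ("f" for a C float, "d" for a C double); a small sketch (using `lock=False` to keep it minimal):

```python
import multiprocessing

# "f" stores a single-precision C float (~7 significant digits),
# "d" a double-precision C double (~15-16 significant digits).
res_f = multiprocessing.Value("f", 0.0, lock=False)
res_d = multiprocessing.Value("d", 0.0, lock=False)

res_f.value = 0.123456789
res_d.value = 0.123456789
# res_f.value is rounded to float32; res_d.value keeps the full double.
```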

Jnelen avatar Mar 13 '24 22:03 Jnelen

Thanks! Please don't waste too much time on mypy issues. It's supposed to be helpful, not annoying. If it becomes too much, we can simply turn it off for the offending line(s) and deal with it another time.

I just suppressed the two remaining MyPy errors. One of them should hopefully get fixed by MyPy itself (see the earlier issue I referred to); the other is that os.cpu_count() can sometimes return None, which causes problems in min(num_workers, os.cpu_count()). I tried your patch, but it was still complaining.
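One common way to sidestep the `None` case (a sketch, with a hypothetical helper name) is to fall back to 1 when the CPU count is unknown:

```python
import os

# os.cpu_count() is documented to return None when the number of CPUs
# cannot be determined, which would make min(num_workers, os.cpu_count())
# raise a TypeError. Clamp with a fallback of 1 instead.
def clamp_workers(num_workers: int) -> int:
    return min(num_workers, os.cpu_count() or 1)
```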

Functionally I think it's basically how we want it, but perhaps some parts of the implementation can still be cleaned up? Besides that, I suppose some tests should be added as well? The timeout feature should be relatively easy to test (by using one of the "problematic" compounds that made me start looking into this in the first place). But I'm not sure how to properly test the multicore feature?

By the way, there is no rush in getting this finished quickly, so please only review it if you actually have the time!

Jnelen avatar Mar 14 '24 20:03 Jnelen

Besides that, I suppose some tests should be added as well?

Yes, we need tests. =)

The timeout feature should be relatively easy to test (by using one of the "problematic" compounds that made me start looking into this in the first place).

Yes. I think the worst one you reported is a great candidate.

But i'm not sure how to properly test the multicore feature?

I think GitHub Actions have access to multiple cores, but I should double check. In any case, the test would be on correctness.
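A correctness test for the multicore path could simply compare parallel against serial output for any worker count; a sketch (with placeholder functions, not the real spyrmsd API):

```python
# `serial_fn` and `parallel_fn` stand in for rmsdwrapper / prmsdwrapper.
# Whatever the number of workers, the parallel results should match the
# serial ones element by element.
def check_parallel_matches_serial(serial_fn, parallel_fn, inputs, num_workers):
    expected = [serial_fn(x) for x in inputs]
    got = parallel_fn(inputs, num_workers=num_workers)
    assert len(got) == len(expected)
    for g, e in zip(got, expected):
        assert abs(g - e) < 1e-12  # agreement up to float tolerance
    return got
```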

Eventually we could also add a benchmark test, but it's not a priority IMO. If you have some results from your work (where you see an improvement), you can just post the timings here for future reference. But not a priority, the important thing is that it's correct.

By the way, there is no rush in getting this finished quickly, so please only review it if you actually have the time!

Thanks. You are doing great work, so I don't want it to stall. ;)

RMeli avatar Mar 14 '24 20:03 RMeli

I added some very basic tests for the prmsdwrapper function (I basically copied the tests for the rmsdwrapper function and used prmsdwrapper instead). I also added a test for the timeout function. However, one thing I noticed is that for some reason it runs twice: once with True and once with False. When the tests were failing I got this:

FAILED tests/test_rmsd.py::test_prmsdwrapper_single_molecule_timeout[True] - assert nan is nan
FAILED tests/test_rmsd.py::test_prmsdwrapper_single_molecule_timeout[False] - assert nan is nan

I fixed the assert error, but the test still runs twice. I'm confused, since there is no parametrize or even an input variable. I couldn't immediately figure it out; I assume it's coming from some earlier statement?

Of course there is still some work left improving the actual prmsdwrapper function, but I suppose some basic tests can also serve as sanity checks to ensure we're not breaking anything.

Jnelen avatar Mar 17 '24 22:03 Jnelen

Thanks @Jnelen! I'm a bit swamped this week too, but I'll try to have a look and reply ASAP. Sorry for the delays.

I'm still confused since there isn't a parametrize, or even an input variable. I couldn't immediately figure it out, but I assume it's getting this from some earlier statement?

Yes, this is a bit confusing. I think the two runs are coming from the following fixture, which is automatically applied to all the tests of the file (autouse=True).

https://github.com/RMeli/spyrmsd/blob/08764e1c3a9eb4b636c14ddf083e1cbbe50014f9/tests/test_rmsd.py#L11-L15

It's used to trigger a separate code path deep inside the minimized RMSD calculation. So the two runs are normal and you should not worry about them, but it's great that you're looking so closely at what's happening!

RMeli avatar Mar 18 '24 07:03 RMeli

Thanks @Jnelen! I'm a bit swamped this week too, but I'll try to have a look and reply ASAP. Sorry for the delays.

No problem at all. Actually, I will probably also be quite inactive until Wednesday next week. I should still be able to respond to some comments, but I probably won't be able to change or test any new code until then. Just as a heads-up!

I'm still confused since there isn't a parametrize, or even an input variable. I couldn't immediately figure it out, but I assume it's getting this from some earlier statement?

Yes, this is a bit confusing. I think the two runs are coming from the following fixture, which is automatically applied to all the tests of the file (autouse=True).

https://github.com/RMeli/spyrmsd/blob/08764e1c3a9eb4b636c14ddf083e1cbbe50014f9/tests/test_rmsd.py#L11-L15

It's used to trigger a separate code path deep inside the minimized RMSD calculation. So the two runs are normal and you should not worry about them, but it's great that you're looking so closely at what's happening!

Ah that explains it, thanks for the clarification!

Jnelen avatar Mar 18 '24 10:03 Jnelen

I'm back from my break! Right before it I actually updated my use case (ESSENCE-Dock) to work with a timeout and multicore, but I ended up using Pebble since it handles both at the same time. For my specific use case I think it was the most elegant solution, and since I use a Singularity image anyway, I didn't mind adding an additional dependency.

However, this got me thinking about whether we should change the current multicore and timeout implementation. Maybe I had my (specific) use case too much in mind? It definitely is functional and works fine for processing a large number of molecules across different CPUs. However, the calculation of graph isomorphisms isn't currently accelerated; for that I assume we need to look into backend-specific solutions (since I assume the backends can take care of multicore support themselves)? I could also see how that could conflict with the current multicore implementation, since each process would then want to use multiple cores. That's why I'm wondering if we should make larger changes to the current version.

Finally, as for a benchmark, I haven't run a proper one yet, but I'll try to run one later with the current implementation.

Jnelen avatar Mar 28 '24 12:03 Jnelen

Hi,

I finally got around to working on a simple benchmark, but I have noticed quite a notable difference between running the calculations the first time versus subsequent times. Here is a concrete example for the same dataset. Running it the first time:

real    0m30.536s
user    0m17.145s
sys     0m2.811s

Versus the second (and subsequent) time(s):

real    0m11.677s
user    0m11.957s
sys     0m2.657s

Do you know why this is? I'm not sure, but I think it's the loading of the molecules, which is maybe cached in subsequent runs? Also note the discrepancy between the real time and the user time in the first run. I tried several different datasets, and this pattern seems to be very common. Do you know what could be happening?

I will try to follow-up later with some concrete "benchmarks" and more context!

Jnelen avatar Mar 30 '24 17:03 Jnelen

Alright, I did some testing, and the performance of the parallel implementation is honestly a lot worse than I expected. In the edge-case tests I was doing it seemed good: even if some "problematic" molecules were present it would still finish in a reasonable amount of time, whereas the "regular" implementation would get killed or take a very long time. However, in more "normal" scenarios the current prmsdwrapper seems to degrade performance by a lot. I used some docking examples from DUD-E to test. This example is for HS90A, where I used the LeadFinder (LF) and Gnina (GN) results to calculate the RMSD for 4938 compounds. You can find the scripts I used here: benchmark_scripts.zip.

Anyway, here are my numbers:

  • Regular (rmsdwrapper)
real    0m11.990s
user    0m12.207s
sys     0m2.473s
  • Parallel (1 Core)
real    2m56.928s
user    0m25.358s
sys     2m39.194s
  • Parallel (4 Cores)
real    0m51.843s
user    0m24.107s
sys     2m45.873s
  • Parallel (8 Cores)
real    0m29.346s
user    0m23.753s
sys     2m40.650s
  • Parallel (16 Cores)
real    0m18.365s
user    0m21.712s
sys     2m37.021s

Something else I noticed is that the error handling in prmsdwrapper should be improved as well. In the current implementation it almost never seems worth it to use prmsdwrapper, except when there are many edge cases that can take a very long time or even kill the Python process. But at that point it might be better for the user to implement specific changes/checks themselves? I think our concept of prmsdwrapper is still a good idea, but my implementation performs very suboptimally. I would love to have a clean and efficient prmsdwrapper function merged, but this definitely isn't it. I also have many other projects coming up, so unfortunately I will have less time to allocate to this as well. What do you think?

Jnelen avatar Mar 30 '24 19:03 Jnelen

Hi @Jnelen

but I ended up using Pebble since this handles both at the same time. For my specific use-case I think it was the most elegant solution, and I use a Singularity image anyway, so I didn't mind adding an additional dependency.

I ended up looking into pebble too during my investigation, since it seems to do what we need. I wouldn't mind adding an additional (optional!) dependency if you think it's the most elegant solution.

However the calculation of graph-isomorphisms isn't currently accelerated, but for that I assume we need to look into specific solutions for the different backends (since I assume they can take care of the multicore support)?

Acceleration of graph isomorphisms is up to the backend, but as you mentioned previously, graph-tool did not seem to use multiple processes for it. I also looked into cugraph, and it's missing there too, so I'm not sure where a parallel implementation is currently available. In any case, I think your use case (virtual screening) is common enough to warrant parallelism over molecules.

I finally got around to work on a simple benchmark, but I have started to notice quite a notable difference between running the calculations the first time versus subsequent times.

I have no idea off the top of my head why this happens; it depends how you run the benchmark. Maybe it's something related to __pycache__?

However, in more "normal" scenarios, the current prmsdwrapper seems to degrade the performance by a lot.

Spawning threads/processes has some overhead. I'm wondering if doing this (plus creating the shared-memory result) every time _rmsd_timeout is called adds an unacceptable overhead. However, I looked at your benchmarks, and if you are measuring the total time of the script, you are also accounting for the I/O and the preparation of the lists. A better comparison would be to time only prmsdwrapper and the loop over molecules for rmsdwrapper (or maybe just accumulate the time rmsdwrapper takes, without the loop).
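Accumulating only the time spent inside the RMSD calls could look like this (a sketch; `rmsdwrapper`, `ref`, and `mols` are placeholders for the real spyrmsd objects):

```python
import time

# Time only the RMSD calls themselves, excluding I/O and list preparation,
# so serial and parallel wrappers can be compared fairly.
def timed_rmsd_loop(rmsdwrapper, ref, mols):
    results, total = [], 0.0
    for mol in mols:
        t0 = time.perf_counter()
        results.append(rmsdwrapper(ref, mol))
        total += time.perf_counter() - t0  # accumulate only the call time
    return results, total
```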

I would love to have a clean and efficient prmsdwrapper function merged, but this definitely isn't it. I also have many other projects coming up, so unfortunately I think that I will have less time to allocate for this as well. What do you think?

Maybe leaving out the timeout function for now (which I know is needed) would allow for a more efficient prmsdwrapper? I'm not sure though; as usual in these cases it needs to be implemented and benchmarked. I totally understand if you have no time to work on this. This PR added a lot of pieces to the puzzle anyway, so it has been extremely useful to get started, even if you decide to pause it (in which case we can convert it to a draft) or simply close it for now. Up to you. I also have little time to work on this at the moment; I'm sorry about that.

In the meantime I'll try to make a release soon without this PR, so your other feature #107 is available to everyone.

RMeli avatar Apr 04 '24 06:04 RMeli

Hi @RMeli

What I will try to do when I find some time is to implement prmsdwrapper with pebble, and try to benchmark that as well. If the performance is (a lot) better, we can move forward with it and get it implemented (which would require the optional pebble dependency). However, if the performance still isn't good, I would suggest pausing this idea for now and turning it into a draft, just like you suggested. What do you think?

Jnelen avatar Apr 04 '24 09:04 Jnelen

@Jnelen sounds good to me, thanks for all the work!

RMeli avatar Apr 04 '24 09:04 RMeli

I finally got around to integrating pebble into a multicore function. I kept the "old" version for now, but if you agree I will delete it and just keep the pebble implementation. I currently named it pebblermsdwrapper, but I will rename it to prmsdwrapper after removing the original implementation (if you agree). The pebble version is a LOT faster, and also just a lot simpler. To benchmark it, I used the GCR library from DUD-E, which contains around 15k compounds. This time I only timed the RMSD calculations themselves, excluding the loading of the molecules (io.loadmol):

  • regular rmsdwrapper function: 12.00s
  • pebblermsdwrapper with num_worker=1: 14.27s
  • pebblermsdwrapper with num_worker=2: 7.69s
  • pebblermsdwrapper with num_worker=4: 4.93s

So when using one core there is of course some overhead, but I find it more than reasonable in this case. This version also has the timeout feature, so it can handle "edge cases" as well. Finally, there is a "chunksize" parameter, which means multiple molecules are processed per task. I have kept it in for now, but I'm not sure it should be present in the final version. It can give a speedup, but if a task times out or raises an error, all the molecules from that chunk that still needed to be processed will fail too. The default probably should be 1, but I guess at least giving the option to configure it is good? Here are the same numbers with a chunksize of 10 for reference:

  • pebblermsdwrapper with num_worker=1, chunksize=10: 12.87s
  • pebblermsdwrapper with num_worker=2, chunksize=10: 6.70s
  • pebblermsdwrapper with num_worker=4, chunksize=10: 3.73s

Overall, I think using pebble for this is the way forward. Pebble itself doesn't require any dependencies and is really lightweight. I would love to hear your feedback/thoughts. If you agree, I will take a second look at the code and make some cleanups.
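The pebble-based approach described above can be sketched roughly like this (all names hypothetical, not the final spyrmsd code; pebble's `ProcessPool.map` accepts both `timeout` and `chunksize`):

```python
import math

# pebble is an optional dependency: fall back to serial execution if absent.
try:
    from pebble import ProcessPool
except ImportError:
    ProcessPool = None

def _pair_rmsd(pair):
    ref, mol = pair  # placeholder for the real pairwise RMSD computation
    return abs(ref - mol)

def prmsdwrapper_sketch(pairs, pool_cls=ProcessPool, num_workers=1,
                        timeout=10.0, chunksize=1):
    """Map _pair_rmsd over pairs in parallel; NaN for timed-out/failed items."""
    if pool_cls is None:
        return [_pair_rmsd(p) for p in pairs]  # serial fallback
    results = []
    with pool_cls(max_workers=num_workers) as pool:
        future = pool.map(_pair_rmsd, pairs, timeout=timeout, chunksize=chunksize)
        iterator = future.result()
        while True:
            try:
                results.append(next(iterator))
            except StopIteration:
                break
            except Exception:
                # Timeout or worker failure: note that with chunksize > 1
                # the unprocessed rest of the chunk is lost as well.
                results.append(math.nan)
    return results
```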

Jnelen avatar Apr 07 '24 13:04 Jnelen

The pebble version is a LOT faster, and also just a lot simpler.

Do I need to read further? =)

This time I only time the RMSD calculations themselves, and exclude the loading of the molecules (io.loadmol).

Thanks! Would you mind sharing those benchmarks as well, just for completeness?

So when using one core, there is of course some overhead, but I find this more than reasonable in this case. Also this has the timeout feature, so it can be used for to handle "edgecases" as well.

Yes, a bit of overhead is expected and I too think it looks reasonable in this case. And if the timeout feature comes "for free", that's even better.

The standard probably should be 1, but I guess giving at least the option to configure it is good?

Yes, a reasonable default with the option to configure it for power users is great. I'd totally keep the "chunk" option.

Overall, I think using pebble for this is the way forward.

I agree, thanks for looking into it! The project has 500+ stars on GitHub and it's currently under development, so I think it's definitely a reasonable inclusion (especially in light of your investigation!).

I'll have a proper look at the code later or in the coming days, but everything sounds great! Thanks a lot for the investigation and thorough report. The only thing I'd say is to keep pebble as an optional dependency anyways, since it's not critical for the functionality of spyrmsd, but a great addition. Given this, I'd maybe move everything related to this into a spyrmsd.parallel module, so that one would do

from spyrmsd.parallel import prmsdwrapper

The optional dependency can be done at the module level with something like the following:

try:
    import pebble
except ImportError as e:
    errmsg = ("Parallel execution of SPyRMSD (`prmsdwrapper`) requires `pebble`. "
              "Please install `pebble`: https://github.com/noxdafox/pebble")
    raise ImportError(errmsg) from e

RMeli avatar Apr 07 '24 19:04 RMeli

PS: Thanks to your PR on backend selection I'm simplifying CI, and I realised that tests/molecules.py is really bad (it led to bad design choices) and problematic when used with @pytest.mark.parametrize and multiple backends. Depending on which PR gets in first, one or the other will need some tweaking of the tests. If this is too annoying for you, I can hold off on the other PR until this is done (although CI would be much faster on this PR with the other one in). Just let me know.

RMeli avatar Apr 07 '24 19:04 RMeli

Cycling CI for https://github.com/RMeli/spyrmsd/issues/119

RMeli avatar Apr 08 '24 06:04 RMeli

This time I only time the RMSD calculations themselves, and exclude the loading of the molecules (io.loadmol).

Thanks! Would you mind sharing those benchmarks as well, just for completeness?

Oh yes of course! The dataset I used can be found here: https://zenodo.org/records/10257101/files/DUDE_GCR_3bqd.zip?download=1, where I used the Gnina (VS_GN_...) and LeadFinder (VS_LF_...) results.
The scripts I used can be found here: benchmark_scripts.zip

I'll have a proper look at the code later or in the coming days, but everything sounds great! Thanks a lot for the investigation and thorough report. The only thing I'd say is to keep pebble as an optional dependency anyways, since it's not critical for the functionality of spyrmsd, but a great addition. Given this, I'd maybe move everything related to this into a spyrmsd.parallel module, so that one would do

from spyrmsd.parallel import prmsdwrapper

The optional dependency can be done at the module level with something like the following:

try:
    import pebble
except ImportError as e:
    errmsg = ("Parallel execution of SPyRMSD (`prmsdwrapper`) requires `pebble`. "
              "Please install `pebble`: https://github.com/noxdafox/pebble")
    raise ImportError(errmsg) from e

Oh, that sounds like a good idea. I'll try to implement it like that later today. I guess with that we can even leave rmsd.py completely untouched in this PR!

As for reviewing, only do it when you have the time, there is no rush!

PS: Thanks to your PR on backend selection I'm simplifying CI, and I realised that tests/molecules.py is really bad (it led to bad design choices) and problematic when used with @pytest.mark.parametrize and multiple backends. Depending on which PR gets in first, one or the other will need some tweaking of the tests. If this is too annoying for you, I can hold off on the other PR until this is done (although CI would be much faster on this PR with the other one in). Just let me know.

Really glad to hear that! You can go ahead, we can fix up this later. Especially since with the pebble implementation this code is really lightweight anyway!

Jnelen avatar Apr 08 '24 16:04 Jnelen

I'll try to implement it like that later today. I guess that with that we can even leave rmsd.py completely untouched in this PR!

Sounds great. The tests can then be moved to test_parallel.py so that everything is orthogonal, which is a great design IMO.

I'll do a proper review when you do the move and remove the old code, but from a first look it looks good.

You can go ahead, we can fix up this later.

Ok, thanks. I think I'll merge soon, then leave a few comments here to the steps needed to make this PR work with the refactoring (it should be only a couple of things after all).

RMeli avatar Apr 08 '24 19:04 RMeli

I just pushed the commit where I reverted rmsd.py and test_rmsd.py. I reverted to previous commits of these files, so it should be fine. I added separate parallel.py and test_parallel.py files. One thing I'm not too sure about is the "reporting"/printing in prmsdwrapper, especially when the chunks are larger than 1. I'm also not sure whether there should be more tests? Finally, I suppose I need to merge now with the latest main branch to pick up the new changes for the tests?

Jnelen avatar Apr 08 '24 20:04 Jnelen

I added a separate parallel.py and test_parallel.py.

Perfect, thanks!

One thing I'm not too sure about is the "reporting"/printing with prmsdwrapper, especially when the chunks are larger than 1.

I've seen your comment in the code. I'm not sure either; I will think a bit more about it. However, I would use warnings or errors instead of print().
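Replacing the print() with a warning could look like this (a sketch; the function name and message wording are hypothetical). Warnings can be filtered, captured in tests, or promoted to errors by the user, which plain prints cannot:

```python
import warnings

# Report timed-out/failed molecules via the warnings machinery instead
# of print(); with chunksize > 1, the chunk index helps locate losses.
def report_failed(n_failed, chunk_index=None):
    where = f" in chunk {chunk_index}" if chunk_index is not None else ""
    warnings.warn(
        f"prmsdwrapper: {n_failed} molecule(s){where} timed out or failed; "
        "their RMSD is reported as NaN",
        RuntimeWarning,
        stacklevel=2,
    )
```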

Also I'm not sure if there should be more tests?

Let's see what the coverage says. If everything is covered, there should be no need. Or maybe it could be added to the long tests, but not sure it's worth it.

Also, I suppose I need to merge now with the latest main branch to make the new changes for the tests?

Yes, with develop, not master (which is only for releases). I left above a few comments on your tests for the changes needed to have them working with #118. Since your PR is orthogonal, it should be reasonably straightforward (I hope).

RMeli avatar Apr 08 '24 20:04 RMeli

I wanted to merge the latest develop branch into this branch via GitHub, but it's complaining about conflicting files (tests/molecules.py). However, I can't resolve the conflict because I don't have write access. Can you do it? After the latest develop branch is merged, I will try to implement the changes from your latest review (probably tomorrow/later this week)!

Jnelen avatar Apr 08 '24 21:04 Jnelen

@Jnelen usually you should be able to do it, since it would be modifying your own branch/fork and not the main repo. It turns out I can't do it either from the web interface, and I think it's because the tests/molecules.py file has been deleted. You need to do it locally using git's CLI.

RMeli avatar Apr 09 '24 07:04 RMeli

@Jnelen usually you should be able to do it, since it would be modifying your own branch/fork and not the main repo. It turns out I can't do it either from the web interface, and I think it's because the tests/molecules.py file has been deleted. You need to do it locally using git's CLI.

Yeah I just realized that now as well. I will try to do it this evening!

Jnelen avatar Apr 09 '24 08:04 Jnelen

I merged the latest branch and updated the tests. Honestly, I took the updated tests from test_rmsd.py which use rmsdwrapper and made them use prmsdwrapper instead. I also added an importorskip in case Pebble isn't installed. However, maybe it'd be a good idea to include the installation of Pebble in the CI? If you agree, I can add it to the YAMLs in conda-envs (spyrmsd-test-....yaml)?

Jnelen avatar Apr 09 '24 21:04 Jnelen

Thanks @Jnelen! I'll review the main function ASAP, sorry for the delay.

However, maybe it'd be a good idea to include the installation of Pebble in the CI?

Yes, it needs to be tested in CI. Since it's an optional dependency we need CI configurations both with and without. But since your design is completely orthogonal, I don't think we need to add too many. I'd suggest to use pebble in the RDKit configurations and not use it with OpenBabel configurations, so that we don't expand the CI matrix. What do you think? The files to add it to would be spyrmsd-test-rdkit-all.yaml and spyrmsd-test-rdkit-nogt.yaml.

RMeli avatar Apr 10 '24 07:04 RMeli

Thanks @Jnelen! I'll review the main function ASAP, sorry for the delay.

There is no rush at all! Please only do it when you have the time!

Yes, it needs to be tested in CI. Since it's an optional dependency we need CI configurations both with and without. But since your design is completely orthogonal, I don't think we need to add too many. I'd suggest to use pebble in the RDKit configurations and not use it with OpenBabel configurations, so that we don't expand the CI matrix. What do you think? The files to add it to would be spyrmsd-test-rdkit-all.yaml and spyrmsd-test-rdkit-nogt.yaml.

Perfect, I just added it!

Jnelen avatar Apr 10 '24 09:04 Jnelen

I just pushed an update where I tried to address most of your comments. I think the main things left are fixing the (potential) issues with chunksize > 1 and adding more tests for prmsdwrapper (especially regarding chunksize). I also opened an issue in the pebble repository to see if the maintainer can give some input (there was no discussion page), since it's quite confusing. In the absolute worst-case scenario, where we can't make it work properly, I suppose we can restrict the chunksize and make it flexible once we find a proper fix.

Jnelen avatar Apr 11 '24 17:04 Jnelen

Alright, I made some minor changes (I added a check as you suggested) and also changed the printing to a more compact warning. I'm thinking of maybe adding a test where the chunksize is tested (with and without a timeout)? What other concrete changes do you think should be made for our redefined scope?

Jnelen avatar Apr 14 '24 19:04 Jnelen