`mem_mb` and `mem_mib` do not match `mem` when `mem` is specified with a unit.
Snakemake version
Version 8.9.0
Describe the bug
When mem is specified with a unit, the mem_mb and mem_mib resources do not make sense. For example, when the request is mem="2G", mem_mb is 1907 and mem_mib is 954.
If 2G is supposed to be gigabytes, then mem_mb should be 2000 and mem_mib should be 1907. If 2G is supposed to be gibibytes, then mem_mb should be 2147 and mem_mib should be 2048.
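For reference, those expected values can be checked with a quick standalone calculation (just a sketch, not Snakemake code):

# 2G read as SI gigabytes (2 * 10**9 bytes) vs. binary gibibytes (2 * 2**30 bytes)
def to_mb_and_mib(n_bytes):
    return round(n_bytes / 10**6), round(n_bytes / 2**20)

print(to_mb_and_mib(2 * 10**9))   # (2000, 1907)
print(to_mb_and_mib(2 * 2**30))   # (2147, 2048)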
Logs
[Fri Mar 22 14:02:58 2024]
rule test:
output: test.txt
jobid: 1
reason: Missing output files: test.txt
threads: 6
resources: mem_mb=1907, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, lsf_project=acc_LOAD, lsf_queue=premium, runtime=2880, mem=2G
Minimal example
Here is a test rule:
rule test:
    output: "test.txt"
    threads: 6
    resources:
        runtime="2d",
        mem="2G"
    shell: ".echo test > {output}"
Additional context
This may be partially related to #2695
I am using the lsf executor plugin, and I am trying to troubleshoot if this is partially due to something I did in the plugin (mem_mib is not shown when not using the cluster, and mem_mb is what mem_mib should be if 2G is gigabytes), but I don't remember doing anything that could change the printed resources.
I just checked my code in snakemake-executor-plugin-lsf and I am not updating the resources, just accessing them. I am basically doing the same thing as the Slurm executor.
I am bumping this issue since it is still present and is rather severe.
I can't reproduce this locally (tested with 8.14.0 and 7.32.4). I guess it is either related to the executor plugin or to the job submission mechanism. Might be able to test with SLURM later today.
(I had to remove the . in front of the echo in your minimal example. Guess this is a typo?)
yes, sorry
Getting the same in Snakemake 8.15.2. A slurm job that is given 10G reports these resources:
resources: mem_mb=9537, mem_mib=1031
The first number makes sense assuming that mem: 10G is interpreted as 10^10, which would then result in 10^10 / 1024 / 1024 = 9536.7. The second number does not make much sense though, but seems to be the one being checked, and results in this log:
Job needs mem_mib=1037 but only mem_mib=1031 are available. This is likely because two jobs are connected via a pipe or a service output and have to run simultaneously. Consider providing more resources (e.g. via --cores).
The job in question is not connected via a pipe with other jobs.
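For reference, the arithmetic behind the first number (a standalone check, not Snakemake code):

# 10G parsed as SI is 10**10 bytes; dividing by 1024*1024 gives the reported mem_mb
print(10**10 / 1024 / 1024)  # 9536.74..., which rounds to 9537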
I'm getting the same issue on Snakemake 8.16 with the slurm executor. I couldn't understand why it didn't have enough memory for a job. Seems to affect jobs that are in a job group. Something is wrong with the way mem_mib is calculated from mem_mb, because mem_mib is wildly off and much less, causing this issue when running with job groups.
I am bumping this issue since it is still present and is rather severe.
It still exists as of 8.16, and I believe the reason some couldn't reproduce the error is that it only pays attention to the incorrect mem_mib when you are submitting jobs as part of a job group. When you don't use job grouping, the mem_mib is still incorrect compared to mem_mb, but then SLURM and Snakemake don't pay attention to mem_mib and allocate resources based only on mem_mb.
Just throwing out an observation: it seems like mem_mib always gets set to 954 no matter what mem_mb is. 954 is the default setting I think, so it's not getting overridden by the user setting.
For others coming here, the workaround is to not use human-readable mem values in your config; if you specify mem as megabyte numbers, then mem_mib gets set properly.
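For example, one way to apply that workaround is to request the memory as a plain megabyte number (a hypothetical rule, names for illustration only):

rule example:
    output: "out.txt"
    resources:
        mem_mb=2000  # plain megabytes instead of mem="2G"
    shell: "echo test > {output}"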
I don't know why this is so difficult to fix. I'll take a look at the relevant code.
It's probably here:
https://github.com/snakemake/snakemake/blob/main/snakemake/resources.py#L632-L644
Or it could be an upstream bug in humanfriendly.
It seems like when a human-readable mem is specified, the code forgets to set mem_mib and it stays at the default value of 1G == 954 MiB.
OK, small update:
The memory parsing seems to work technically correctly, but in an unexpected manner. humanfriendly.parse_size() has an option to always assume binary that should probably be enabled. Otherwise, 2G or 2GB will be translated to 2e9 rather than 2 * 2^30 (or 2 * 2**30 in python).
The issue is that Snakemake then converts the bytes value to mem_mb with the following code:
resources[inferred_name] = max(int(round(in_bytes / 1024 / 1024)), 1)
when it should be
resources[inferred_name] = max(int(round(in_bytes / 1e6)), 1)
mem_mib should be calculated in the following manner:
resources[inferred_name_mib] = max(int(round(in_bytes / 2**20)), 1)
Pull request time?
@hermidalc, would you be willing to write a unit test if I fix the code?
Oh, and Snakemake should probably assume everything is binary rather than SI, because memory is always physically binary and programs tend to think in binary, but it would still be technically correct if G and GB were read as SI units.
Good catch, yep, I also noticed that even mem_mb was not the right size, a little lower than the human-readable size specified.
BTW, mem_mib gets set here: https://github.com/snakemake/snakemake/blob/9a60046461970efac229190dc5fddd6dbb86746b/snakemake/rules.py#L1044, so it's after mem_mb is set on L1031 of the same file.
@hermidalc, would you be willing to write a unit test if I fix the code?
Yes, I can write it. I will use https://github.com/snakemake/snakemake/tree/main/tests/test_inferred_resources as a template. Bear with me, I don't have a fork of snakemake to do testing and pull requests.
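In the meantime, a rough standalone sketch of the assertions such a test could make, using the formulas proposed above (the real test would go through Snakemake's resource parsing as in tests/test_inferred_resources; the helper names here are made up for illustration):

from humanfriendly import parse_size

def expected_mb(value):
    return max(int(round(parse_size(value) / 1e6)), 1)

def expected_mib(value):
    return max(int(round(parse_size(value) / 2**20)), 1)

assert expected_mb("2G") == 2000
assert expected_mib("2G") == 1907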
Thanks, @hermidalc. The following is also wrong:
https://github.com/snakemake/snakemake/blob/9a60046461970efac229190dc5fddd6dbb86746b/snakemake/common/__init__.py#L60
It should actually read as the following:
def mb_to_mib(mb):
    return int(math.ceil(mb / 0.95367431640625))
or
def mb_to_mib(mb):
    return int(math.ceil(mb * 1.048576))
in_bytes = parse_size(value) or in_bytes = parse_size(value, binary=True)?
here is my branch:
https://github.com/BEFH/snakemake/tree/mem_disk_fix
You could maybe do a PR to that or I can add you as a contributor, @hermidalc
I forked your repo, writing the test(s) now. Hey, did this affect disk_mib in addition to mem_mib?
And sorry, additional question: does your fixed code round mem_mib to the nearest mebibyte integer, or always round up to the nearest mebibyte integer? E.g. 5000 MB == 4768.372 MiB, does the fixed code give 4768 or 4769 here?
Hey, did this affect disk_mib in addition to mem_mib?
Everything that affects mem_mb or mem_mib affects disk_mb or disk_mib. The same functions are at play.
Thanks. Please make sure you work on the mem_disk_fix branch in your fork.
I think I have fixed everything including the update of the mib values, but I have not run any tests.
I switched round to math.ceil, so your example should be 4769. But once again, I have not tested. The changes are in 3 (!!!) different files, unfortunately.
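A quick check of that example (standalone, assuming 1 MB = 10**6 bytes and 1 MiB = 2**20 bytes):

import math

in_bytes = 5000 * 10**6              # 5000 MB
print(in_bytes / 2**20)              # 4768.37...
print(math.ceil(in_bytes / 2**20))   # 4769 with math.ceil; round() would give 4768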
Ok I made a unit test and then a pull request onto that branch. Please tell me if I should add any additional unit tests for code coverage
Thanks @hermidalc, we'll see if the PR is merged into snakemake/snakemake:main