`mem_mb` and `mem_mib` do not match `mem` when `mem` is specified with unit.

Open BEFH opened this issue 11 months ago • 4 comments

Snakemake version

Version 8.9.0

Describe the bug

When `mem` is specified with a unit, the derived `mem_mb` and `mem_mib` resources are inconsistent. For example, when the request is `mem="2G"`, `mem_mb` is 1907 and `mem_mib` is 954.

If 2G is supposed to be gigabytes, then `mem_mb` should be 2000 and `mem_mib` should be 1907. If 2G is supposed to be gibibytes, then `mem_mb` should be 2147 and `mem_mib` should be 2048.
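In other words, the expected values follow from plain unit arithmetic (an illustrative sketch, not Snakemake code; the names are made up):

```python
# Arithmetic behind the expected mem_mb / mem_mib values.
SI_G = 10**9   # "2G" read as gigabytes (SI)
BIN_G = 2**30  # "2G" read as gibibytes (binary)
MB = 10**6     # one megabyte
MIB = 2**20    # one mebibyte

def to_mb_mib(n_bytes):
    """Convert a byte count to (mem_mb, mem_mib), rounding to the nearest unit."""
    return round(n_bytes / MB), round(n_bytes / MIB)

print(to_mb_mib(2 * SI_G))   # -> (2000, 1907), the gigabyte reading
print(to_mb_mib(2 * BIN_G))  # -> (2147, 2048), the gibibyte reading
```

Neither reading produces the reported `mem_mb=1907, mem_mib=954` pair.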

Logs

[Fri Mar 22 14:02:58 2024]
rule test:
    output: test.txt
    jobid: 1
    reason: Missing output files: test.txt
    threads: 6
    resources: mem_mb=1907, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, lsf_project=acc_LOAD, lsf_queue=premium, runtime=2880, mem=2G

Minimal example

Here is a test rule:

rule test:
  output: "test.txt"
  threads: 6
  resources:
    runtime="2d",
    mem="2G"
  shell: ".echo test > {output}"

Additional context

This may be partially related to #2695. I am using the lsf executor plugin, and I am trying to troubleshoot whether this is partially due to something I did in the plugin (mem_mib is not shown when not using the cluster, and mem_mb is what mem_mib should be if 2G means gigabytes), but I don't remember doing anything that could change the printed resources.

BEFH avatar Mar 22 '24 18:03 BEFH

I just checked my code in snakemake-executor-plugin-lsf and I am not updating the resources, just accessing them. I am basically doing the same thing as the Slurm executor.

BEFH avatar Mar 22 '24 18:03 BEFH

I am bumping this issue since it is still present and is rather severe.

BEFH avatar May 17 '24 12:05 BEFH

I can't reproduce this locally (tested with 8.14.0 and 7.32.4). I guess it is either related to the executor plugin or to the job submission mechanism. Might be able to test with SLURM later today.

(I had to remove the . in front of the echo in your minimal example. Guess this is a typo?)

lumbric avatar Jun 13 '24 09:06 lumbric

(I had to remove the . in front of the echo in your minimal example. Guess this is a typo?)

yes, sorry

BEFH avatar Jun 13 '24 19:06 BEFH

Getting the same in Snakemake 8.15.2. A slurm job that is given 10G reports these resources:

resources: mem_mb=9537, mem_mib=1031

The first number makes sense assuming that mem: 10G is interpreted as 10^10 bytes, which then gives 10^10 / 1024 / 1024 ≈ 9536.7. The second number does not make much sense, though; it seems to be the one being checked, and results in this log:

Job needs mem_mib=1037 but only mem_mib=1031 are available. This is likely because two jobs are connected via a pipe or a service output and have to run simultaneously. Consider providing more resources (e.g. via --cores).

The job in question is not connected via a pipe with other jobs.

lczech avatar Jul 23 '24 00:07 lczech

I'm getting the same issue on Snakemake 8.16 with the slurm executor. I couldn't understand why it didn't have enough memory for a job. It seems to affect jobs that are in a job group: something is wrong with the way mem_mib is calculated from mem_mb, because mem_mib is wildly off (much too low), causing this issue when running with job groups.

hermidalc avatar Aug 12 '24 23:08 hermidalc

I am bumping this issue since it is still present and is rather severe.

It still exists as of 8.16, and I believe the reason some couldn't reproduce the error is that Snakemake only pays attention to the incorrect mem_mib when you are submitting jobs as part of a job group. When you don't use job grouping, mem_mib is still incorrect compared to mem_mb, but then SLURM and Snakemake ignore mem_mib and allocate resources based only on mem_mb.

hermidalc avatar Aug 14 '24 14:08 hermidalc

Just throwing out an observation: it seems like mem_mib always gets set to 954 no matter what mem_mb is. I think 954 comes from the default setting, so it's not getting overridden by the user setting.
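Notably, 954 is exactly what the default mem_mb of 1000 comes out to in MiB, which fits this reading (a quick arithmetic check; the ceil-based conversion is an assumption about how the default is derived):

```python
import math

# 1000 MB (the default mem_mb) expressed in MiB rounds up to 954,
# matching the stuck mem_mib value seen in the logs.
default_mem_mb = 1000
mem_mib = math.ceil(default_mem_mb * 10**6 / 2**20)
print(mem_mib)  # -> 954
```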

hermidalc avatar Aug 14 '24 22:08 hermidalc

For others coming here, the workaround is to not use human-readable mem values in your config; if you specify mem as a number of megabytes, mem_mib gets set properly.

hermidalc avatar Aug 22 '24 12:08 hermidalc

I don't know why this is so difficult to fix. I'll take a look at the relevant code.

It's probably here:

https://github.com/snakemake/snakemake/blob/main/snakemake/resources.py#L632-L644

Or it could be an upstream bug in humanfriendly.

BEFH avatar Aug 22 '24 12:08 BEFH

I don't know why this is so difficult to fix. I'll take a look at the relevant code.

It seems like when a human-readable mem is specified, the code forgets to set mem_mib, and it stays at the default value (1G == 954 MiB).

hermidalc avatar Aug 22 '24 12:08 hermidalc

OK, small update:

The memory parsing seems to work technically correctly, but in an unexpected manner. humanfriendly.parse_size() has an option to always assume binary that should probably be enabled. Otherwise, 2G or 2GB will be translated to 2e9 rather than 2 * 2^30 (or 2 * 2**30 in python).

The issue is that Snakemake then converts the bytes value to mem_mb with the following code: resources[inferred_name] = max(int(round(in_bytes / 1024 / 1024)), 1) when it should be resources[inferred_name] = max(int(round(in_bytes / 1e6)), 1)

mem_mib should be calculated in the following manner: resources[inferred_name_mib] = max(int(round(in_bytes / 2**20)), 1)
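A minimal sketch of the proposed conversions (parse_size_binary here is a hand-rolled stand-in for humanfriendly.parse_size(value, binary=True); all names are illustrative, not Snakemake's actual code):

```python
def parse_size_binary(value):
    """Parse '2G'/'2GB'/'500M' assuming binary units, as a stand-in for
    humanfriendly.parse_size(value, binary=True)."""
    units = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
    value = value.upper().rstrip("B")
    if value[-1] in units:
        return int(float(value[:-1]) * units[value[-1]])
    return int(value)

def infer_mem(value):
    in_bytes = parse_size_binary(value)
    mem_mb = max(int(round(in_bytes / 1e6)), 1)     # proposed: divide by 10**6
    mem_mib = max(int(round(in_bytes / 2**20)), 1)  # proposed: divide by 2**20
    return mem_mb, mem_mib

print(infer_mem("2G"))  # -> (2147, 2048)
```

With these formulas, mem="2G" yields the self-consistent pair mem_mb=2147, mem_mib=2048 from the gibibyte reading.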

Pull request time?

@hermidalc, would you be willing to write a unit test if I fix the code?

BEFH avatar Aug 22 '24 12:08 BEFH

Oh, and Snakemake should probably assume everything is binary rather than SI, because memory is always physically binary and programs tend to think in binary, but it would still be technically correct if G and GB were read as SI units.

BEFH avatar Aug 22 '24 12:08 BEFH

Good catch. Yep, I also noticed that even mem_mb was not the right size, a little lower than the human-readable size specified.

hermidalc avatar Aug 22 '24 13:08 hermidalc

BTW, mem_mib gets set here: https://github.com/snakemake/snakemake/blob/9a60046461970efac229190dc5fddd6dbb86746b/snakemake/rules.py#L1044, so it's after mem_mb is set on L1031 of the same file.

BEFH avatar Aug 22 '24 13:08 BEFH

@hermidalc, would you be willing to write a unit test if I fix the code?

Yes, I can write it; I will use https://github.com/snakemake/snakemake/tree/main/tests/test_inferred_resources as a template. Bear with me, I don't yet have a fork of snakemake to do testing and pull requests.

hermidalc avatar Aug 22 '24 13:08 hermidalc

Thanks, @hermidalc. The following is also wrong:

https://github.com/snakemake/snakemake/blob/9a60046461970efac229190dc5fddd6dbb86746b/snakemake/common/__init__.py#L60

It should actually read as the following:

def mb_to_mib(mb):
    return int(math.ceil(mb / 0.95367431640625))

or

def mb_to_mib(mb):
    return int(math.ceil(mb * 1.048576))
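Side note: the two suggested bodies are numerically equivalent, since 0.95367431640625 is exactly 10**6 / 2**20 and its reciprocal is 1.048576 (a quick check):

```python
import math

# 0.95367431640625 is exactly 10**6 / 2**20, so dividing by it is the
# same as multiplying by its reciprocal, 1.048576.
assert 10**6 / 2**20 == 0.95367431640625

def variant_a(mb):
    return int(math.ceil(mb / 0.95367431640625))

def variant_b(mb):
    return int(math.ceil(mb * 1.048576))

# The two variants agree across a range of inputs.
assert all(variant_a(mb) == variant_b(mb) for mb in range(1, 10000))
```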

BEFH avatar Aug 22 '24 13:08 BEFH

in_bytes = parse_size(value) or in_bytes = parse_size(value, binary=True)?

BEFH avatar Aug 22 '24 13:08 BEFH

here is my branch:

https://github.com/BEFH/snakemake/tree/mem_disk_fix

You could maybe do a PR to that or I can add you as a contributor, @hermidalc

BEFH avatar Aug 22 '24 13:08 BEFH

here is my branch:

https://github.com/BEFH/snakemake/tree/mem_disk_fix

You could maybe do a PR to that or I can add you as a contributor, @hermidalc

I forked your repo and am writing the test(s) now. Hey, did this affect disk_mib in addition to mem_mib?

hermidalc avatar Aug 22 '24 13:08 hermidalc

And sorry, one additional question: does your fixed code round mem_mib to the nearest mebibyte integer or always round up to the nearest mebibyte integer? E.g. 5000 MB == 4768.372 MiB; does the fixed code give 4768 or 4769 here?
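The two behaviours differ by one for this example (plain arithmetic):

```python
import math

# 5000 MB expressed in MiB, under the two rounding policies in question.
mib_exact = 5000 * 10**6 / 2**20  # 4768.3715...
print(round(mib_exact))           # -> 4768 (round to nearest)
print(math.ceil(mib_exact))       # -> 4769 (always round up)
```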

hermidalc avatar Aug 22 '24 14:08 hermidalc

Hey, did this affect disk_mib in addition to mem_mib?

Everything that affects mem_mb or mem_mib affects disk_mb or disk_mib. The same functions are at play.

Thanks. Please make sure you work on the mem_disk_fix branch in your fork.

I think I have fixed everything including the update of the mib values, but I have not run any tests.

I switched from round to math.ceil, so your example should give 4769. But once again, I have not tested. The changes are in 3 (!!!) different files, unfortunately.

BEFH avatar Aug 22 '24 14:08 BEFH

OK, I made a unit test and then a pull request onto that branch. Please tell me if I should add any additional unit tests for code coverage.

hermidalc avatar Aug 22 '24 14:08 hermidalc

Thanks @hermidalc, we'll see if the PR is merged into snakemake/snakemake:main

BEFH avatar Aug 22 '24 19:08 BEFH