Can I reduce the memory consumed by cosmic-ray?

Open MartinThoma opened this issue 4 years ago • 12 comments

When I run cosmic-ray, I cannot use my computer. It consumed 15 GB of memory, and when it started filling the swap, I killed it.

Can I reduce its memory consumption?

MartinThoma avatar Oct 18 '19 12:10 MartinThoma

this usually happens when a mutation changes parameters to range() on python2 or causes creation of some other large list

workaround would be to either execute it under a user with limited access to memory or run without swap

tomato42 avatar Oct 19 '19 12:10 tomato42

I'm running cosmic-ray 5.6.1 with Python 3.6.8.

Interestingly, cosmic-ray also seems not to install the required dependencies. I tried the following cr-config.toml for my module mpu:

[cosmic-ray]
module-path = "mpu"
python-version = ""
timeout = 10
exclude-modules = []
test-command = "python3 -m pytest"
execution-engine.name = "local"

[cosmic-ray.cloning]
method = 'copy'
commands = ["pip3 install -e .[all]"]

I executed:

$ time cosmic-ray init cr-config.toml cr_session.sqlite
real	46,95s
user	9,81s
sys	1,95s

# I ran 'DELETE FROM work_items LIMIT 2170' to keep the execution time low
# This leaves 10 tests
$ time cosmic-ray exec cr_session.sqlite
2019-10-19 18:58:06,839 cosmic_ray.cloning ERROR Error running command in virtual environment
command: pip3 install -e .[all]
error: b"Obtaining file:///tmp/tmp21ptzj3k/repo\nCollecting pandas (from mpu==0.21.0)\n  Using cached https://files.pythonhosted.org/packages/86/12/08b092f6fc9e4c2552e37add0861d0e0e0d743f78f1318973caad970b3fc/pandas-0.25.2-cp36-cp36m-manylinux1_x86_64.whl\nCollecting python-magic (from mpu==0.21.0)\n  ERROR: Could not find a version that satisfies the requirement python-magic (from mpu==0.21.0) (from versions: none)\nERROR: No matching distribution found for python-magic (from mpu==0.21.0)\nWARNING: You are using pip version 19.2.2, however version 19.3.1 is available.\nYou should consider upgrading via the 'pip install --upgrade pip' command.\n"

real	12,08s
user	49,01s
sys	5,47s

Due to those issues, I moved to mutmut.

MartinThoma avatar Oct 19 '19 16:10 MartinThoma

Sorry I didn't get to this sooner...buried under work right now.

this usually happens when a mutation changes parameters to range() on python2 or causes creation of some other large list

Right, this is a known issue (though there's no open issue on it), and I'm not sure the best way to deal with it beyond having the user skip certain mutations. We could do things like look for mutations inside range calls or list constructors, but that wouldn't address transitive mutations that make their way into those calls.
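
To make the "look for mutations inside range calls or list constructors" idea concrete, here is a minimal sketch using Python's standard `ast` module (Python 3.8+). It is not how cosmic-ray works; the `SIZE_SENSITIVE_CALLS` set and the function name are assumptions made up for illustration.

```python
import ast

# Hypothetical set of callables whose arguments directly control allocation size.
SIZE_SENSITIVE_CALLS = {"range", "list", "bytearray", "bytes"}


def numeric_literals_in_sized_calls(source):
    """Return (line, call name, value) for each numeric literal found
    inside the positional arguments of a size-sensitive call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Name) and func.id in SIZE_SENSITIVE_CALLS):
            continue
        for arg in node.args:
            for sub in ast.walk(arg):
                is_number = (
                    isinstance(sub, ast.Constant)
                    and isinstance(sub.value, (int, float))
                    and not isinstance(sub.value, bool)
                )
                if is_number:
                    hits.append((sub.lineno, func.id, sub.value))
    return hits


example = (
    "n = 50\n"
    "for _ in range(50 * 80):\n"
    "    pass\n"
    "for _ in range(n):\n"
    "    pass\n"
)
print(numeric_literals_in_sized_calls(example))
```

The literal `50` assigned to `n` on the first line is not reported even though it ends up as a `range()` argument, which is exactly the transitive case that a purely syntactic check like this cannot see.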

cosmic-ray also seems not to install the required dependencies

CR is certainly trying to install the dependencies you asked for. Any idea why it's seeing this:

ERROR: Could not find a version that satisfies the requirement python-magic (from mpu==0.21.0)

Due to those issues, I moved to mutmut.

Fair enough. Can we close this then?

abingham avatar Oct 20 '19 08:10 abingham

Can we close this then?

I'd say that it's still a possible issue, so having some mechanism of handling it in cosmic-ray is probably a good idea.

I'm not aware of anything portable, but there are solutions for Unix systems in general (using ulimit) and Linux-specific ones (using cgroups).
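
As a rough sketch of the ulimit-style approach from Python, the standard `resource` module can lower the address-space limit of a child process before it starts. This is Unix-only, the limit is reliably enforced on Linux but not necessarily elsewhere, and the 1 GiB cap and the function names are assumptions for illustration, not part of cosmic-ray.

```python
import resource
import subprocess
import sys

# Assumed cap for illustration: 1 GiB of address space per child process.
MEMORY_LIMIT_BYTES = 1024 ** 3


def _limit_address_space():
    # Runs in the child between fork and exec, so only the child is limited.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard == resource.RLIM_INFINITY:
        cap = MEMORY_LIMIT_BYTES
    else:
        cap = min(MEMORY_LIMIT_BYTES, hard)
    # Lower only the soft limit; the hard limit is left untouched.
    resource.setrlimit(resource.RLIMIT_AS, (cap, hard))


def run_with_memory_cap(command, timeout=None):
    """Run `command` (a list of arguments) with a capped address space."""
    return subprocess.run(
        command,
        preexec_fn=_limit_address_space,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # A child that tries to allocate 4 GiB should fail instead of swapping.
    hog = [sys.executable, "-c", "x = bytearray(4 * 1024 ** 3)"]
    result = run_with_memory_cap(hog)
    print("exit code:", result.returncode)
    print(result.stderr.strip().splitlines()[-1])
```

The same effect is available from a shell by running `ulimit -v` before launching the tool, since child processes inherit the limit.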

tomato42 avatar Oct 20 '19 09:10 tomato42

I'd say that it's still a possible issue

It's certainly still an issue, and I'm happy to keep it open. Things like ulimit and cgroups seem like extrinsic solutions to me (though I'm no expert), and maybe something that can already be used today without any change to CR. If so, maybe what's really needed is a discussion of these things in the documentation (e.g. a section on "strategies for avoiding resource overuse").

Intrinsic solutions seem much harder. I don't have any real insight into how we'd detect mutations that might cause memory explosions except in very simple cases. Any ideas?

abingham avatar Oct 20 '19 09:10 abingham

Intrinsic solutions seem much harder. I don't have any real insight into how we'd detect mutations that might cause memory explosions except in very simple cases

I wouldn't say that mutations like that should be avoided, but interpreting their results is much more complex. In some cases they may indicate use of a wrong algorithm (like use of range() in py2 instead of xrange(), or a list where an iterator would do), and sometimes they can be false positives (like causing python to allocate a 4 GiB byte string when the value processed will never be this large). So they are more like "things you may want to take a closer look at", rather than something we can feed into a formula for mutation score.

tomato42 avatar Oct 20 '19 09:10 tomato42

This points to an interesting idea. I wonder if we could create post-processors that look for common pathologies in test output, e.g. looking for massive memory uses and suggesting that the user look at it.
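
One very rough shape such a post-processor could take is a scan of captured test output for signs of memory trouble. The `RunRecord` type and the pattern list below are hypothetical and do not reflect cosmic-ray's actual session schema.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns hinting that a test run hit a memory problem.
MEMORY_PATTERNS = [
    re.compile(r"\bMemoryError\b"),        # raised by Python on failed allocation
    re.compile(r"Cannot allocate memory"),  # strerror text for ENOMEM
    re.compile(r"\bKilled\b"),              # typical shell message after SIGKILL
]


@dataclass
class RunRecord:
    """Made-up record type: one mutation run and its captured output."""
    job_id: str
    output: str


def flag_memory_pathologies(records):
    """Return the job ids whose output matches any memory-related pattern."""
    flagged = []
    for record in records:
        if any(pattern.search(record.output) for pattern in MEMORY_PATTERNS):
            flagged.append(record.job_id)
    return flagged


records = [
    RunRecord("job-1", "3 passed in 0.12s"),
    RunRecord("job-2", "E   MemoryError\n1 failed in 4.01s"),
]
print(flag_memory_pathologies(records))
```

Runs flagged this way would be reported as "take a closer look" rather than counted as plainly killed or surviving mutants.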

abingham avatar Oct 20 '19 09:10 abingham

I wonder if I can get a concrete example of a mutation that can cause this? Maybe I'm missing some important mutant in mutmut and that's the only reason mutmut survives! That would be bad.

boxed avatar Oct 20 '19 09:10 boxed

I don't have an example from "the real world", but imagine something like mutating this:

x = [0] * 50

to

x = [0] * 50000000000

This would use a billion times the memory. I don't know if CR has this specific behavior; I'm pretty sure we select number mutations that are generally close to the original value. But nothing in CR would stop someone from creating an operator that does exactly that.
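
For a sense of scale, the list sizes can be estimated without allocating anything. This is a back-of-the-envelope sketch that assumes CPython, where a list stores one pointer per element; the helper name is made up.

```python
import struct
import sys

# Size of one object reference; 8 bytes on a typical 64-bit build.
POINTER_BYTES = struct.calcsize("P")


def approx_list_bytes(length):
    """Rough size of a list holding `length` references,
    ignoring the objects being referenced."""
    return sys.getsizeof([]) + POINTER_BYTES * length


original = approx_list_bytes(50)
mutated = approx_list_bytes(50_000_000_000)
print(f"original: {original} bytes")
print(f"mutated:  {mutated / 1e9:.0f} GB")
```

On a 64-bit build the mutated list would need about 400 GB for its pointers alone.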

abingham avatar Oct 20 '19 09:10 abingham

Something that could happen today is that a mutation somehow prevents a loop from terminating, and this then results in unbounded memory consumption (i.e. because of the specifics of what's going on in that loop).

abingham avatar Oct 20 '19 09:10 abingham

yeah, something that turns

for _ in range(50 * 80):
    x += b'some string'

into

for _ in range(50 ** 80):
    x += b'some string'

will exhaust memory of any system

tomato42 avatar Oct 20 '19 10:10 tomato42

Ouch. Yeah, that's a great example. I would like to say I escape this by being smart, but I believe I escape this because we've removed incorrect mutations of * to ** and removed too much!

boxed avatar Oct 20 '19 10:10 boxed