node-chaos-monkey
node-chaos-monkey copied to clipboard
Optimization + real CPU memory prank
Hey, I made some changes to the project, the project is still not production ready but this is a good start
Size optimization:
- Added to .gitignore .history and .vscode and removed them from the git repo
- removed unused dependency as
underscore(as a library, we want to be minimal size) - removed
pranks/big-text-file
package size:
| npm modules | before | after |
|---|---|---|
| with node_modules | 6.6MB | 3.07MB |
| without node_modules | 3.96MB | 512KB |
CPU Load
The CPU load didn't really test any CPU load, it had a big impact on the memory, and had a minor impact on the event loop, but it's not really CPU intensive test.
Basically we do want to have 3 different tests (each test should have minor as possible impact on the others)
- CPU load
- Memory load
- Event loop block
I changed the CPU test to be memory light and none blocking event loop (it uses fork, the child process is fully blocked) but now, it actually loads the CPU for 95-100%
The start is getting duration argument and now stop will actually stop the test
There's one thing I didn't change on the class which is this: (only added the comment)
constructor(expressApp) {
// @todo, not to do this
super(...arguments);
}
in my opinion, this is a bad practice. I didn't change that because I wasn't sure what args the constructor gets and also the super class isn't used at all
Anyway, let's take this project to good places!
Just noticed a bug I have
stop() {
this.forks.forEach((f) => {
f.kill("SIGKILL");
});
this.forks = [];
}
should also clear the existing timeout.
clearTimeout(this.stopTimeout)
@KromDaniel So glad you join the battle
Short context - this project was developed during a flight with a goal to quickly benchmark customer's deployment. Back then 'best practices' weren't a concern but based on feedback that I received in the last few days - it seems like a package that can appeal to many. The overall topics is trendy and the package can be communicated as a new type of testing, a very important type as it checks the overall app resiliency, and can be even automated to an extent so people get an immediate benchmark with result without even writing a line. Optionally with a sleek UI or even an assertion language expect(error.unhandled).to.not.crash.process... We can brainstorm it also by phone to feel whether we're passionate to take it to the next step
The overall PR + the remarks absolutely resonate with me, loved the observation of separating cpu vs memory vs event loop! There are more than a few other fixes we should perform (minor note: some crypto methods run on the libuv thread pool and don't block the loop), two questions:
- You deleted the .txt file but it's used in the memory prank as well, did you modify the memory prank? not a big issue, I can overload the memory with randomstring
- The cpu prank start is now not async and gets a duration param. This deviates from the original design (maybe for a good reason, trying to find out) which promote separation of the prank from the scheduler so the duration will be determined by the scheduler which will emit a stop event to the prank. The upside of the original design is that since the prank is not aware of duration/scheduling logic, one can mix any prank with any scheduling type. With that context in mind, do you see a reason to pass a duration param?
Yo @i0natan
some crypto methods run on the libuv thread pool and don't block the loop -> so we won't use it at the event-loop-block test notice I changed it to be createCipherIv instead of createCipher because it's deprecated, anyway, it does load some CPU
About the questions:
- I didn't modify the memory test, probably need to add unit tests later so stuff like that could flow up easily
- It's fine, I didn't go to deep about how does the scheduler work, I wrote it as independent module that accepts duration and performs high CPU usage for that duration, if there's a controller that decides about timing then this parameter should be optional
- Memory test - I'll perform the necessary adjustment, we don't need a text file in our project. yeah, 1 plain unit test per prank would have caught this
- CPU scheduling - there's no need to pass duration, you just specify a scheduler which emits start and stop event to the prank, so you code right now is fully compliant only we need to make the start method async + drop the duration param. You wanna take it or I do?
I'll do it, thanks
closing this PR and will open new one after update
@KromDaniel
Cool, just to emphasize: the scheduler object was really needed, it allows doing pranks in various timings fashion like:
- DelayedSchedule - do something once, but after the process is live for 10 sec
- Spikes - does our monitoring system when the CPU spikes every 30 seconds and not constantly (call start() for few sec, stop then sleep for 30 sec)
- Immediate - make something bad before even the app loads (will the daemon keep on restarting? will our monitoring sys catch that?)
@i0natan reopened :) I think we can hard merge the current package.json and ignore mine
@KromDaniel I'll merge this on Monday, working now on that weird Sunday event :]
Is there any plan to take this project to next level as its really cool. @goldbergyoni ??