node-chaos-monkey icon indicating copy to clipboard operation
node-chaos-monkey copied to clipboard

Optimization + real CPU memory prank

Open KromDaniel opened this issue 7 years ago • 9 comments

Hey, I made some changes to the project, the project is still not production ready but this is a good start

Size optimization:

  1. Added to .gitignore .history and .vscode and removed them from the git repo
  2. removed unused dependency as underscore (as a library, we want to be minimal size)
  3. removed pranks/big-text-file

package size:

npm modules before after
with node_modules 6.6MB 3.07MB
without node_modules 3.96MB 512KB

CPU Load

The CPU load didn't really test any CPU load, it had a big impact on the memory, and had a minor impact on the event loop, but it's not really CPU intensive test.

Basically we do want to have 3 different tests (each test should have minor as possible impact on the others)

  1. CPU load
  2. Memory load
  3. Event loop block

I changed the CPU test to be memory light and none blocking event loop (it uses fork, the child process is fully blocked) but now, it actually loads the CPU for 95-100%

The start is getting duration argument and now stop will actually stop the test

There's one thing I didn't change on the class which is this: (only added the comment)

constructor(expressApp) {
	// @todo, not to do this
	super(...arguments);
}

in my opinion, this is a bad practice. I didn't change that because I wasn't sure what args the constructor gets and also the super class isn't used at all

Anyway, let's take this project to good places!


Just noticed a bug I have

	stop() {
		this.forks.forEach((f) => {
			f.kill("SIGKILL");
		});
		this.forks = [];
	}

should also clear the existing timeout. clearTimeout(this.stopTimeout)

KromDaniel avatar Jul 06 '18 08:07 KromDaniel

@KromDaniel So glad you join the battle

Short context - this project was developed during a flight with a goal to quickly benchmark customer's deployment. Back then 'best practices' weren't a concern but based on feedback that I received in the last few days - it seems like a package that can appeal to many. The overall topics is trendy and the package can be communicated as a new type of testing, a very important type as it checks the overall app resiliency, and can be even automated to an extent so people get an immediate benchmark with result without even writing a line. Optionally with a sleek UI or even an assertion language expect(error.unhandled).to.not.crash.process... We can brainstorm it also by phone to feel whether we're passionate to take it to the next step

The overall PR + the remarks absolutely resonate with me, loved the observation of separating cpu vs memory vs event loop! There are more than a few other fixes we should perform (minor note: some crypto methods run on the libuv thread pool and don't block the loop), two questions:

  • You deleted the .txt file but it's used in the memory prank as well, did you modify the memory prank? not a big issue, I can overload the memory with randomstring
  • The cpu prank start is now not async and gets a duration param. This deviates from the original design (maybe for a good reason, trying to find out) which promote separation of the prank from the scheduler so the duration will be determined by the scheduler which will emit a stop event to the prank. The upside of the original design is that since the prank is not aware of duration/scheduling logic, one can mix any prank with any scheduling type. With that context in mind, do you see a reason to pass a duration param?

goldbergyoni avatar Jul 08 '18 07:07 goldbergyoni

Yo @i0natan

some crypto methods run on the libuv thread pool and don't block the loop -> so we won't use it at the event-loop-block test notice I changed it to be createCipherIv instead of createCipher because it's deprecated, anyway, it does load some CPU

About the questions:

  • I didn't modify the memory test, probably need to add unit tests later so stuff like that could flow up easily
  • It's fine, I didn't go to deep about how does the scheduler work, I wrote it as independent module that accepts duration and performs high CPU usage for that duration, if there's a controller that decides about timing then this parameter should be optional

KromDaniel avatar Jul 08 '18 07:07 KromDaniel

  • Memory test - I'll perform the necessary adjustment, we don't need a text file in our project. yeah, 1 plain unit test per prank would have caught this
  • CPU scheduling - there's no need to pass duration, you just specify a scheduler which emits start and stop event to the prank, so you code right now is fully compliant only we need to make the start method async + drop the duration param. You wanna take it or I do?

goldbergyoni avatar Jul 08 '18 07:07 goldbergyoni

I'll do it, thanks

KromDaniel avatar Jul 08 '18 12:07 KromDaniel

closing this PR and will open new one after update

KromDaniel avatar Jul 08 '18 12:07 KromDaniel

@KromDaniel

Cool, just to emphasize: the scheduler object was really needed, it allows doing pranks in various timings fashion like:

  • DelayedSchedule - do something once, but after the process is live for 10 sec
  • Spikes - does our monitoring system when the CPU spikes every 30 seconds and not constantly (call start() for few sec, stop then sleep for 30 sec)
  • Immediate - make something bad before even the app loads (will the daemon keep on restarting? will our monitoring sys catch that?)

goldbergyoni avatar Jul 08 '18 12:07 goldbergyoni

@i0natan reopened :) I think we can hard merge the current package.json and ignore mine

KromDaniel avatar Oct 18 '18 11:10 KromDaniel

@KromDaniel I'll merge this on Monday, working now on that weird Sunday event :]

goldbergyoni avatar Oct 26 '18 13:10 goldbergyoni

Is there any plan to take this project to next level as its really cool. @goldbergyoni ??

bhupesh-sf avatar Jul 21 '21 05:07 bhupesh-sf