
CPU usage

Open tobireif opened this issue 9 years ago • 12 comments

When I open https://schteppe.github.io/p2.js/demos/circles.html in Firefox (on Mac OS 10.10 on an older MacBook), and stir the circles, it uses ~ 100% CPU. Same in Chrome.

Is there anything that can be done in P2 JS to reduce CPU usage?

tobireif avatar Jul 03 '15 15:07 tobireif

That demo is sort of a stress test - stacking and many contacts is bound to be slow...

You can do many things to reduce CPU load. For example, reduce solver iterations, turn off friction, use larger solver tolerance, or use a larger time step (you can tweak many of these parameters in the demos).
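A minimal sketch of how those knobs might be set from code. It assumes the world uses p2's default `GSSolver` (with `iterations` and `tolerance` properties) and exposes `defaultContactMaterial.friction`; the helper name `applyCheapSettings` and the specific values are made up for illustration and would need tuning per scene.

```javascript
// Hypothetical helper: trades simulation accuracy for lower CPU load.
// Property names follow p2's World/GSSolver as assumed above; the values are illustrative only.
function applyCheapSettings(world) {
  world.solver.iterations = 5;               // fewer solver iterations per step
  world.solver.tolerance = 0.01;             // larger tolerance, so the solver may stop earlier
  world.defaultContactMaterial.friction = 0; // zero friction coefficient for default contacts
  return world;
}

// Usage (assumes p2 is loaded on the page):
// var world = applyCheapSettings(new p2.World());
// world.step(1 / 30); // larger time step: half as many steps per second as 1 / 60
```

Each setting makes the simulation less accurate (softer stacks, more jitter), so it is probably worth changing one at a time and watching the result.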

Or just make sure to not have too many contacts :)

schteppe avatar Jul 03 '15 15:07 schteppe

Thanks for your reply. Is there anything that can be done in the source of the P2 JS lib to reduce CPU usage?

stacking and many contacts is bound to be slow.

Perhaps it can be made faster? :) Some code-level optimizations perhaps ...

The Firefox profiler shows that most of the time is spent by the renderer functions.

I've been looking for ages for a physics lib (plus renderer) which does ~exactly the stuff in that demo while using eg under 50% CPU (on that machine). Not sure if it's possible ...

I'll consider the tips you listed for my lib-using code.

Please feel free to close the issue as there's no specific issue to fix (unless you want to keep it as a general reminder for potential perf improvements).

tobireif avatar Jul 03 '15 15:07 tobireif

There are always optimizations that can be done! :)

One thing that could help is simd.js, but I didn't have much luck with that yet. A demo that shows how a simple p2.js method can be accelerated is probably all I need to get started with that.

Another thing that I suspect can be an optimization is to inline the math methods that use other math methods. But I need confirmation on that one.

Maybe a class based vector lib instead of Float32Array could speed up stuff too.

Oh another performance trick: reduce gravity. Low gravity allows for even larger time steps and fewer iterations. I call this the "Angry Birds" trick :)

schteppe avatar Jul 08 '15 05:07 schteppe

I've been using p2 inside a web worker, feeding the browser deltas. Reducing frame times is never an easy task.

jtenner avatar Dec 05 '15 01:12 jtenner

Perhaps P2 could offer this as option?

tobireif avatar Dec 05 '15 11:12 tobireif

Optional web worker runner? Don't think so. Too much work. First problem is to figure out how to wrap the API, how to reference bodies/shapes/etc in the other thread, etc... Second problem is that workers are asynchronous, how to get around that? Third problem is that all applications are different, they don't need to sync everything between the main thread and the worker, so the API has to know what to sync. SIMD.js is much easier to implement and allows for more speedup than using a single worker solution.

schteppe avatar Dec 06 '15 09:12 schteppe

I hope SIMD will be implemented soon.

MDN https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SIMD -> "Browser compatibility" lists just FF Nightly, and http://caniuse.com/#search=simd doesn't list it at all.

It is listed at https://dev.windows.com/en-us/microsoft-edge/platform/status/simdes7 , but not at https://webkit.org/status/ https://platform-status.mozilla.org/ https://www.chromestatus.com/features

tobireif avatar Dec 06 '15 10:12 tobireif

@schteppe I agree with your sentiment. Web workers being async is tough. This shouldn't fall under your responsibility because your code runs perfectly inside a webworker already.

Putting p2 in a web worker

  1. Use webpack and worker-loader (or workerify and browserify)
  2. Ask physics and drawing to happen simultaneously by ticking the web worker
  3. stream deltas back so the browser receives the deltas AFTER drawing
  4. wait for the next frame and draw the previous frames data.

Example physics frame

I have a declarative drawing framework that creates render trees, so I generate the trees immediately after physics is done by streaming Float32Arrays one by one back to the browser. (note that they are usually groups of objects combined together)

For example, if I want to update a group of circles from a worker...

var result = [];
for(var i = 0; i < circles.length; i++) {
  var circle = circles[i];
  result.push(
    circle.position[0],
    circle.position[1],
    circle.angle
  );
}
var message = new Float32Array(result);
postMessage({ type: 'update', value: message }, [message.buffer]);

While the browser is working on changing the way the "circles" are updating, I send another message to the browser with other data about some other physics object.

This is pretty much the only way to make p2 work while simultaneously avoiding race conditions. It's very stable.
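For the receiving side, here is a sketch of how the main thread might unpack that message. It assumes the same three-floats-per-circle layout (x, y, angle) as the worker snippet above; `decodeCircles` and `latestCircles` are hypothetical names, not part of p2 or of the framework described here.

```javascript
// Hypothetical main-thread helper: unpacks [x, y, angle, x, y, angle, ...]
// into one plain object per circle.
function decodeCircles(floats) {
  var circles = [];
  for (var i = 0; i + 2 < floats.length; i += 3) {
    circles.push({ x: floats[i], y: floats[i + 1], angle: floats[i + 2] });
  }
  return circles;
}

// Usage in the browser (worker is assumed to be the physics Worker):
// worker.onmessage = function (e) {
//   if (e.data.type === 'update') { latestCircles = decodeCircles(e.data.value); }
// };
```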

jtenner avatar Dec 07 '15 16:12 jtenner

I have a fully operational web worker integration in my "eventually-might-be-an-engine" thing, but it does have some flaws. Most importantly, the input lag. If I reduce physics iterations below 60, it is REALLY noticeable for a very simple reason: first a message has to be sent to the web worker with updated input, be it keyboard or mouse or whatever, then the P2 update loop must execute, then the web worker has to send data back to the front-end. There are plenty of milliseconds to lose, in the worst case an entire P2 frame time. At 30 fps this is 33ms, which is a lot.
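The arithmetic behind that worst case, as a rough sketch. It only models the wait for the next physics step (an input that arrives just after a step has started waits almost one full step); message-passing overhead is not included, and `stepTimeMs` is a hypothetical helper name.

```javascript
// Duration of one physics step, in milliseconds, at a given step rate.
function stepTimeMs(stepsPerSecond) {
  return 1000 / stepsPerSecond;
}

// Usage:
// stepTimeMs(60); // about 16.7 ms per step
// stepTimeMs(30); // about 33.3 ms per step
```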

I still wonder if anything can be done about this.

Scharnvirk avatar Feb 12 '16 22:02 Scharnvirk

@Scharnvirk I have the exact same problem. I am going to assume you have not implemented the following things unless you have stated otherwise.

Here are the solutions you must implement:

  • Make sure you are pushing binary data back from the worker
var msg = new Float64Array(....);
postMessage({ type: 'update', data: msg }, [msg.buffer]); // transferable data is fastest
  • Push user inputs to the web worker immediately.
window.onmousemove = function(e) {
  // don't do mouse location calculation here unless you have to call getBoundingClientRect()
  worker.postMessage({ type: 'mouse', data: { x: e.clientX, y: e.clientY }});
};
  • Since each frame will be behind by at least 17ms, if you can do input correction, do it. My game does not require frame perfect inputs because I do input correction by advancing bullets and controllable stuff by a frame before I add it to the world.
  • Do not wait to return the result of the web worker; send it back synchronously inside the frame and let the browser check for an updated frame to draw the world. This is because with a web worker, we want the physics to run in parallel with the browser while it's drawing. If we do this correctly, then we can simulate a much larger number of objects.
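The input-correction bullet could look roughly like this. It is a sketch under assumptions, not the actual game code: it moves a newly spawned object forward by one frame with plain Euler integration before the object is added to the world. `advanceOneFrame` is a hypothetical name, and the `position`/`velocity` arrays mirror the layout of a p2 body.

```javascript
// Hypothetical helper: move a body forward by one frame (dt seconds)
// so it appears where it would have been without the one-frame delay.
function advanceOneFrame(body, dt) {
  body.position[0] += body.velocity[0] * dt;
  body.position[1] += body.velocity[1] * dt;
  return body;
}

// Usage: a bullet fired at 600 units/s ends up roughly 10 units ahead at 60 fps.
// var bullet = advanceOneFrame({ position: [0, 0], velocity: [600, 0] }, 1 / 60);
// world.addBody(bullet); // with a real p2.Body in practice
```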

In my own testing, with nearly 500 bodies (circles) and a plane, I can get 60 fps doing a world.tick(1/60) every frame, no problem, with nearly 3ms to spare (10ms browser and 10ms worker simultaneously). I don't know what p2 is like normally without workers, but it was good enough for me.

I don't know how good you are at reading javascript libraries, but this is the example game framework I used: https://github.com/jtenner/part-shop/tree/master/example

This is how I managed to get the web worker working myself.

jtenner avatar Feb 12 '16 23:02 jtenner

I do send data in binary format (Float32Array in my case), but I reuse the same array and don't use transferables. I'll check if it makes a difference. I wouldn't be surprised if there was none; one approach requires creating an array every frame and the other one cloning it, also every frame.
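One way to avoid both the per-frame allocation and the per-frame clone is to transfer the buffer to the main thread and have the main thread transfer it back once it has been read. The pool below is only a sketch of that idea; `createBufferPool` and the `'recycle'` message type are hypothetical and not something either poster describes using.

```javascript
// Hypothetical worker-side pool of reusable Float32Arrays.
function createBufferPool(length) {
  var free = [];
  return {
    // Take a recycled array if one is available, otherwise allocate a new one.
    acquire: function () {
      return free.pop() || new Float32Array(length);
    },
    // Call with an ArrayBuffer the main thread has transferred back.
    // new Float32Array(buffer) creates a view on it, it does not copy.
    release: function (buffer) {
      free.push(new Float32Array(buffer));
    }
  };
}

// Worker (sketch):
//   var out = pool.acquire(); /* fill out */
//   postMessage({ type: 'update', value: out }, [out.buffer]);
//   onmessage = function (e) { if (e.data.type === 'recycle') pool.release(e.data.buffer); };
// Main thread, after reading the values:
//   worker.postMessage({ type: 'recycle', buffer: value.buffer }, [value.buffer]);
```

Whether this beats plain cloning would have to be measured; for small arrays the difference may well be negligible.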

Second - good idea! I apparently overlooked that and indeed I do wait for renderer frame too, so this will for sure help. I need to do some calculation to map mouse position to world position, but this should not be too heavy.

How do you implement third? Idea is right and I was thinking of it, but, uh, how? Something with the integrate() function on a body?

And I don't really understand fourth. Won't I get the same positions if I send data to worker and immediately return? After all, worker is iterating asynchronously at some intervals, and each object is updated once per interval (am I right?). So if this is a little timeline:

start-----W-----W-----W-----W-----W---->

where W is a point where "my object" is getting its position calculated, then I might have a case like this:

start-----W-----W-R-S-W-----W-----W---->

where R is receive data from renderer, and S - immediate resend (after some calculations). Position calculated at W won't update in this case. Unless I am wrong :)

As for the web worker, well, I just coded it by looking at the specification and LOTS of experimentation, profiler-debugging and guessing WTF is happening here and there. I am right now at 1400 circle/particle bodies at 60 fps, but I need MUCH more. Perhaps it is just not possible. On the other hand, the archaic engine I was using 10 years ago was easily handling collisions between 10k objects... at much more than 100 fps. But this was C++, light-years ahead of JS when it comes to performance.

Naturally that 1400 is at my machine, so no idea how it would work for you.

You might check it though :) http://epsilondynamics-wingmod.rhcloud.com/

Scharnvirk avatar Feb 13 '16 00:02 Scharnvirk

How do you implement third? Idea is right and I was thinking of it, but, uh, how? Something with the integrate() function on a body?

I don't know exactly. This is out of the scope of my understanding of your project. I simply advance the animations created by user input a frame. I also use object PRIOR positions to create objects.

I do send data in binary format (Float32Array in my case), but I reuse the same array and don't use transferables. I'll check if it makes a difference. I wouldn't be surprised if there was none; one approach requires creating an array every frame and the other one cloning it, also every frame.

I have a game framework that requires the developer create a Float32Array every time the data needs to be updated. The class definitions are loaded on both the worker AND the browser, so two objects run parallel with each other. See the below class definition for the example.

class GamePart extends Part {
  shouldComponentUpdate() {
    return true; //only if the data changed
  }
  onSerialize() { //if shouldUpdate then serialize data
    return new Float32Array(this.data);
  }
  onRender(data) { //serialized data from worker thread enters here
    //I have a declarative view layer called e2d.js and that allows me to update the view
    //in a declarative way instead of in an imperative way.

    return e2d.fillStyle(colors[data[0]],
       e2d.translate(data[1], data[2],
          e2d.fillArc(data[3])
       )
    );
  }
}

Trust me when I say it makes a huge difference.

Aside: I wanted my frameworks to come together to look like a react component that draws to a canvas. A declarative immutable view layer is highly underrated and too easily shunned.

And I don't really understand fourth. Won't I get the same positions if I send data to worker and immediately return? After all, worker is iterating asynchronously at some intervals, and each object is updated once per interval (am I right?).

You're right about the worker being async. What I meant was that you need to write your code assuming race conditions are possible, so don't wait to send back data. (you probably already do this)

Do the following things every requestAnimationFrame inside your browser:

  1. Immediately tell the worker to start the next frame. (this is the most important part)
  2. Draw the current state of your data browser side. (this is of course subject to race conditions if the worker is too slow)

Inside your worker:

  • Take user input and convert it to something meaningful.
  • For each part of your application, stream back multiple data updates instead of pushing a single big one.
for(var i = 0; i < this.parts.length; i++) {
   postMessage({ type: 'update-data', data: this.parts[i].update(), index: i });
}

Then handle the message event separately and do whatever work needs to be done to format the data to make it drawable. That's it.
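Put together, the browser side of that loop might be wired up like this. It is a sketch, not the framework's real code: `createFrameLoop`, the `'tick'` message type and the `draw` callback are assumptions, while the `'update-data'` message shape matches the worker loop above.

```javascript
// Hypothetical wiring: `worker` only needs postMessage(),
// `draw` renders whatever data has arrived so far.
function createFrameLoop(worker, draw) {
  var latest = {}; // most recent data per part index

  return {
    // Call from worker.onmessage.
    onMessage: function (e) {
      if (e.data.type === 'update-data') {
        latest[e.data.index] = e.data.data;
      }
    },
    // Call once per requestAnimationFrame.
    onFrame: function () {
      worker.postMessage({ type: 'tick' }); // 1. start the next physics frame first
      draw(latest);                         // 2. then draw the state we already have
    }
  };
}

// Usage in the browser (sketch):
// var loop = createFrameLoop(worker, drawParts);
// worker.onmessage = loop.onMessage;
// (function frame() { loop.onFrame(); requestAnimationFrame(frame); })();
```

Posting the tick before drawing lets the worker compute the next frame while the browser is busy rendering the previous one.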

Aside: Also, my framework is not done yet. I'm not sure where I want it to go or how it should work, if it should even be a thing at all. Custom worker solutions are hard to implement and have way too much of an opinion on how things should run.

If you think you might have something to add, please email me at [email protected] because I could use all the help I can get.

jtenner avatar Feb 15 '16 15:02 jtenner