ode4j icon indicating copy to clipboard operation
ode4j copied to clipboard

Garbage collection - DContact

Open Namek opened this issue 8 years ago • 14 comments

Hello, I have looked over your port of ODE and I can see that you don't prevent from garbage collection.

In documentation I can read: ode4j is in certain cases faster that ODE. I have investigated this only briefly, but I have the impression that Java can draw its advantage from using the garbage collector for arrays. This means de-allocation in Java occurs in a 2nd thread in parallel to computations, while in C/C++ de-allocation (which is expensive) occurs in the main thread.

what platforms have you considered? I think you weren't looking at Android or Web (after compilation to JavaScript using GWT, that's what libgdx does). Anyhow, allocation (of array of primities) itself is in fact fast but it's not the case for deallocation. Garbage collection takes CPU cycles and it's definetely not costless, especially when some more memory is allocated AFTER your array to be deallocated. 2nd thread does not help much since it causes memory fragmentation.

But enough about array of primitives. I see a problem about instantiating DContact too much.

private DNearCallback onCollisionCallback = new DNearCallback() {
    final int N = 100;
    DContactBuffer contacts = new DContactBuffer(N);

    @Override
    public void call(Object data, DGeom o1, DGeom o2) {
        int n = OdeHelper.collide(o1, o2, N, contacts.getGeomBuffer());
        for (int i = 0; i < n; ++i) {
            DContact contact = contacts.get(i);
            DJoint c = OdeHelper.createContactJoint(physics, contactGroup, contact);
            c.attach(contact.geom.g1.getBody(), contact.geom.g2.getBody());
        }
    }
}

I wanted to create contacts once but it's buggy, seems I can't reuse those DContact objects, so I have to allocate contacts every time a potential collision occurs (of course, N number would be smaller).Objects in DContactGeomBuffer could be reused but the class is final so I can't do much without modifying library.

I'd like to introduce possibility of reusing DContact/DContactGeom objects. Are those stored somewhere or just used for step simulation and forgotten?

Namek avatar Oct 04 '15 12:10 Namek

There are several point discussed here:

  1. About the GC in 2nd thread:
    • Platform was Linux 64bit on Intel desktop.
    • By 2nd thread I mean that in C++ allocation and deallocation happens in the same thread, while in Java the deallocation may be done by a separate thread (e.g. in case of concurrent-GC). That means the main thread would be faster as it does not have to perform the deallocation.
    • In hindsight I think this analysis was wrong, I probably didn't even use the concurrent GC. I now believe it was due to optimisations, such as loop-unrolling, which Java can do if loop counters are runtime-constants. C++ however usually needs compile-time constants, which may not available (for example if the main Matrix is always only 3 rows long).
  2. I totally agree that the DContact(Buffer/GeomBuffer) stuff is not very nice. I don't think contacts are used internally, so I suppose the best would be a reset() method on the buffer. The method would we called from inside OdeHelper.collide(...) and would reset the buffer. The individual contacts could then be reset as well, or they could be reset later, only when they are actually overwritten.
  3. You mention a solution that is prevented by have a final class, what would the solution be?
  4. You mention that there is a problem about instantiating DContacts too much. I see how that happens, but do you have some measurements that this is really a problem? I agree it should be fixed, but it would be nice to have some kind of (performance-) indicator that things really get faster with a new implementation. After all, resetting objects doesn't come for free either.

tzaeschke avatar Oct 05 '15 13:10 tzaeschke

There are a few mutable classes used in ode4j anyway so making DContact and related stuff mutable will probably make no additional harm, but I am just wondering how many contacts you may need to create pe second? Millions? If not then GC should not be too busy, is so then most likely GC is not going to be your biggest concern as the constraint solver will be taking most of the CPU's time. I would also suggest benchmarking with G1 enabled as it is going to be the default GC in Java9.

ppiastucki avatar Oct 05 '15 13:10 ppiastucki

@piotr-piastucki seems not everyone is targetting Java 9.

also:

I am just wondering how many contacts you may need to create pe second? Millions? If not then GC should not be too busy, is so then most likely GC is not going to be your biggest concern as the constraint solver will be taking most of the CPU's time.

One frame could create 100 joints, another just 4. It depends. How do I predict how big DContactBuffer should I create? I can't. So I make it to be as big as biggest probably possible, every frame. Imagine buffer of size 100 DContacts. Just by looking at Task Manager (Windows user here) memory usage grows about few megs per second for the process. How can not that be a cost worth mentioning? Please note, I target both PCs and mobiles (although no commercial purposes), the former is pretty performant but the latter isn't.

@tzaeschke I have created some solution like reset(), I'll post it few hours later when I'll get back to my PC. About measurements, Android are always a problem when we're talking about GC. There is Dalvik (recently ART?), no original JVM. The problem about automatic deallocation is that it's not performed when we want it. If you see a game on Android which has a short freeze from time to time - expect that to be a GC's work.

Namek avatar Oct 05 '15 14:10 Namek

@Namek If ode4j is growing 5MB per second, then this looks indeed like a problem. I think the point that piotr is trying to make is whether the GC problems come from DContacts or maybe something else?

Also, and please correct me if I'm wrong, I seem to remember that many people limit their contacts to a small number (say 10) because that may often be enough for a realistic simulation. Do you need all 100 contacts? See for example issue #22.

tzaeschke avatar Oct 05 '15 14:10 tzaeschke

@Namek 100 contact is a lot. Usually you do not need all the contacts, just the most important ones (sorted by depth = force). In many cases 5 is just enough and you very rarely need to go beyond 10. As mentioned by Tilmann - see https://bitbucket.org/odedevs/ode/issues/36/fix-gimpact-contacts-handling (BTW, it has been fixed in ODE) or https://github.com/tzaeschke/ode4j/issues/22 for trimesh related contact sorting. Please also try to use a tool like VisualVM to ensure the number of DContact instances is really responsible for the excessive GC overhead.

@tzaeschke Just a small optimization related to GC I have just noticed :) The array of geoms is cloned in BoxPruning method in the DXSAPSpace class as follows: ArrayList<DxGeom> buffer = new ArrayList<DxGeom>(geoms); I do not think it is really needed as the order of the elements in the original array should not matter much for the rest of the code.

ppiastucki avatar Oct 05 '15 15:10 ppiastucki

I think the point that piotr is trying to make is whether the GC problems come from DContacts or maybe something else?

Right now it's not a big problem for me, but, like everything else, it may scale up.

@piotr-piastucki Alright, of course. I don't need 100 contacts between two objects. Currently I use only boxes, so there are 4 contacts per pair.

Anyway, to prevent GC I did this:

private final DContactBufferPool contactBufferPool = new DContactBufferPool(10);

private DNearCallback onCollisionCallback = new DNearCallback() {
    @Override
    public void call(Object data, DGeom o1, DGeom o2) {
        final int N = contactBufferPool.bufferSize;
        ContactBuffer buffer = contactBufferPool.getNext();
        DContactBuffer contacts = buffer.buffer;

        int n = OdeHelper.collide(o1, o2, N, contacts.getGeomBuffer());
        assert buffer.lastUsedContacts == 0;
        buffer.lastUsedContacts = n;

        for (int i = 0; i < n; ++i) {
            DContact contact = contacts.get(i);

            // TODO set up some parameters for this contact

            DJoint c = OdeHelper.createContactJoint(physics, contactGroup, contact);
            c.attach(contact.geom.g1.getBody(), contact.geom.g2.getBody());
        }
    }
};
import org.ode4j.ode.DContact;
import org.ode4j.ode.DContact.DSurfaceParameters;
import org.ode4j.ode.DContactBuffer;
import org.ode4j.ode.DContactGeom;

import com.badlogic.gdx.utils.Array;
import com.badlogic.gdx.utils.Pool;

class DContactBufferPool {
    private Array<ContactBuffer> toBeFreed = new Array<>(ContactBuffer.class);

    private Pool<ContactBuffer> pool = new Pool<ContactBuffer>() {
        @Override
        protected ContactBuffer newObject() {
            return new ContactBuffer(bufferSize);
        }
    };

    public final int bufferSize;

    public DContactBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public final ContactBuffer getNext() {
        ContactBuffer buffer = pool.obtain();
        toBeFreed.add(buffer);
        return buffer;
    }

    public final void freeAll() {
        for (int i = 0, n = toBeFreed.size; i < n; ++i) {
            ContactBuffer buffer = toBeFreed.items[i];
            buffer.nullify();
        }

        pool.freeAll(toBeFreed);
        toBeFreed.size = 0;
    }


    public static class ContactBuffer {
        public final DContactBuffer buffer;
        public int lastUsedContacts;

        public ContactBuffer(int bufferSize) {
            buffer = new DContactBuffer(bufferSize);
        }

        public void nullify() {
            for (int i = 0, n = lastUsedContacts; i < n; ++i) {
                final DContact c = buffer.get(i);
                c.fdir1.setZero();

                DContactGeom g = c.geom;
                g.g1 = g.g2 = null;
                g.side1 = g.side2 = 0;
                g.depth = 0;
                g.normal.setZero();
                g.pos.setZero();

                final DSurfaceParameters p = c.surface;
                p.mode = 0;
                p.mu = 0;
                p.mu2 = 0;
                p.rho = 0;
                p.rho2 = 0;
                p.rhoN = 0;
                p.bounce = 0;
                p.bounce_vel = 0;
                p.soft_erp = 0;
                p.soft_cfm = 0;
                p.motion1 = p.motion2 = p.motionN = 0;
                p.slip1 = p.slip2 = 0;
            }

            lastUsedContacts = 0;
        }
    }
}

Namek avatar Oct 05 '15 19:10 Namek

In profiler I noticed that there's a lot of double[] instantiation, like 3-4 MB/s. Most of it is step() rather than space.collide().

Namek avatar Oct 05 '15 19:10 Namek

Right now it's not a big problem for me, but, like everything else, it may scale up. Alright, of course. I don't need 100 contacts between two objects. Currently I use only boxes, so there are 4 contacts per pair.

If you do not need 100 contacts then do not allocate that many. This way you can easily allocate 1M contacts because it may need to scale sometime :) Follow YAGNI principle here :)

In profiler I noticed that there's a lot of double[] instantiation, like 3-4 MB/s. Most of it is step() rather than space.collide().

This one looks like something worth investigating. But it might be also the case that the way the constraint solver in ode works really requires some space.

ppiastucki avatar Oct 06 '15 08:10 ppiastucki

If you do not need 100 contacts then do not allocate that many. This way you can easily allocate 1M contacts because it may need to scale sometime :) Follow YAGNI principle here :)

Why did you joined these two responses into one? Now you totally misunderstood the part about scaling.

Anyway, thanks for the answers, I have nothing more to add and ask here. Please close the issue if you feel like so.

Namek avatar Oct 06 '15 11:10 Namek

Why did you joined these two responses into one? Now you totally misunderstood the part about scaling.

Let me be more precise then. I am simply trying to help you as I have a feeling that your concerns are based on somewhat incorrect assumptions. Additional contacts do not come for free. Additional contact = additional equation to solve. You do not really want to create 100 contacts per pair. Imagine you have 100 colliding pairs and 100 contacts per pair, you will end up with 10000 constraints to solve. This is an enormous number of calculations and it will most likely slow your application to a crawl on any mobile (and probably desktop too) device regardless of GC. And to solve that many constraints the solver will have to build huge matrices - that's what you also noticed, those double[] arrays in step() method. In most cases to achieve realtime performance you want to either limit the number of pairs (hard) or contacts (easy) to some more reasonable values. I mentioned 10 contacts because this will probably yield results visually indistinguishable from 100 and will allow you to easily speed up the calculations and avoid too much GC overhead at the same time. You can probably lower this value even further with very good results.

ppiastucki avatar Oct 06 '15 11:10 ppiastucki

Please close the issue if you feel like so.

Anyway, thanks for reporting the issue. As I said above, you correctly pointed out a problem with the code, and I intend to fix it, because it is simply dirty code.

My asking for evidence has only two reasons:

  • It allows us to verify that a fix actually improves the situation
  • It may prevent you from getting too optimistic about the results. From what I can see, fixing DContact creation won't help much with the GC problem.

I will probably raise another issue for this. I'll also rename this issue to more clearly specify what was discussed here.

tzaeschke avatar Oct 06 '15 11:10 tzaeschke

After only 5.5 years I implemented a fix for DContact allocation. It does not use pooling but allows reusing the DContactBuffer in the typical nearCallback methods.

tzaeschke avatar Jan 02 '21 16:01 tzaeschke

As you can imagine I am no longer interested in this. Thus, consider closing the issue whenever you feel like it.

Namek avatar Jan 06 '21 04:01 Namek

Yes, I can imagine. Sorry for this long delay. I am currently working on porting all updates from ODE to ode4j, after that I will look into reducing allocations. I don't mind keeping it open (unless you prefer closing it) just to have a ticket to work against.

tzaeschke avatar Jan 06 '21 19:01 tzaeschke

It's finally implemented and merged with PR #71. Apologies for anyone who was waiting for this. Personally, with test (e.g. PerfCards I could not see any noticable GC activity with JDK 8 or later, but there may be GC issues on Android which apparently has worse (none?) "escape analysis" that hepls preventing a lot of GC problems.

tzaeschke avatar May 26 '23 17:05 tzaeschke