anax Allocate components more efficently

Currently components are allocated with new, which causes the components not to be allocated together.

Dec 25 '13 06:12 miguelmartin75

First thing, thanks for a very nicely written library! Sorry if this is somewhat of a brain dump.

I am right now working on grouping components in containers according to type. Eg., EntityComponentStorage uses one vector of components per their TypeID. This would allow you to reuse the already allocated space, reorganize (sort) components for increased cache coherence etc., and is not too hard to implement. Instead of unique_ptrs, EntityComponentStorage::ImplComponentArray uses indices in order to circumvent pointer invalidation which might happen due to vector resizing on an addComponent operation (or due to sorting after an eg. removeComponent).

With such an implementation the real question is that of API design. At the moment, the (nice and clean) public interface is set in such a way that addComponent returns a pointer to the created component. But these pointers can get invalidated on every add/removeComponent. The user should not hold on to them for too long. Would this be acceptable for the intended use and coding style?

One way around this limitation is to use a deque instead of a vector of components. A deque does not invalidate pointers to its elements, provided erases and inserts are done only at the two ends of the container. (This then rules out any component sorting and allows unused "holes".) Compared to a vector there is a bit of extra complexity involved that can cause cache misses. However, I think could prove to be negligible: after all, a system can consume more than one component type at a time, from various memory locations. (An equivalent option is to use a custom allocator, but these are really not simple to write in a performant and safe manner).

Any thoughts?

Jan 14 '14 14:01 tivek

I am right now working on grouping components in containers according to type. Eg., EntityComponentStorage uses one vector of components per their TypeID. This would allow you to reuse the already allocated space, reorganize (sort) components for increased cache coherence etc., and is not too hard to implement. Instead of unique_ptrs, EntityComponentStorage::ImplComponentArray uses indices in order to circumvent pointer invalidation which might happen due to vector resizing on an addComponent operation (or due to sorting after an eg. removeComponent).

@tivek Someone else is actually trying to solve this issue too. Link: https://github.com/dylanbraithwaite/anax

With such an implementation the real question is that of API design. At the moment, the (nice and clean) public interface is set in such a way that addComponent returns a pointer to the created component. But these pointers can get invalidated on every add/removeComponent. The user should not hold on to them for too long. Would this be acceptable for the intended use and coding style?

Even if the user did not hold it for long, it could still cause errors. As mentioned in this commit for @dylanbraithwaite's fork.

One issue with using a std::vector to hold the components, is that the returned references/pointers will be invalidated when the vector is resized, which I believe is what you mentioned previously. i.e. I believe this would more than likely fail:

auto e1 = world.createEntity();
auto e2 = world.createEntity();

auto pos = e1.addComponent<Position>(1, 3, 4); 
// this line will cause a call to emplace_back if I am correct
// meaning pos will be invalidated
e2.addComponent<Position>(3, 4, 5);

pos.x = 3; // this will cause an error :( (as mentioned above)

One way around this limitation is to use a deque instead of a vector of components. A deque does not invalidate pointers to its elements, provided erases and inserts are done only at the two ends of the container. (This then rules out any component sorting and allows unused "holes".) Compared to a vector there is a bit of extra complexity involved that can cause cache misses. However, I think could prove to be negligible: after all, a system can consume more than one component type at a time, from various memory locations. (An equivalent option is to use a custom allocator, but these are really not simple to write in a performant and safe manner).

Hm, I don't think deque would be a way to go. As there is not a set amount of components that should be pre-allocated (meaning the deque would just allocate lots of different memory blocks, would it not?).

I think the solution to this is to use a "fake smart pointer" to the components. As you have mentioned in your comment. For example:

auto position = e.addComponent<Position>(1, 3, 2);
position->x = 3; // overloaded -> operator to return the component

One BIG problem in my honest opinion, with any solution, is where/how you are storing the components. In @dylanbraithwaite's fork, it is stored as a static variable and it seems that this is the only solution to do so. Thus making this library thread-able may be more difficult to do.

Jan 15 '14 04:01 miguelmartin75

@tivek Someone else is actually trying to solve this issue too. Link: https://github.com/dylanbraithwaite/anax

Ah, cool. Thanks for the headsup.

Hm, I don't think deque would be a way to go. As there is not a set amount of components that should be pre-allocated (meaning the deque would just allocate lots of different memory blocks, would it not?).

Maybe I should have explained a bit more. Internally, deques allocate a block of memory at a time which can hold more than just one element. It's not as good as a vector concerning locality, but much better than using new. Removing components from deque should then only call their destructor, so that their positions can be reused later on and no pointers get invalidated. This might even speed up creation of new components and should improve cache coherence. Problem: std::deque (and boost::deque for that matter) does not allow fine-tuning of the block size. For instance, g++ std::deque defaults to a minimum of 512 bytes per allocation.

On second thought, a block allocator does the same thing as the above deque. http://warp.povusers.org/FSBAllocator/#FSBAllocator seems to be tunable and reasonably thread-safe. I'll play with it as soon as I get the time.

One BIG problem in my honest opinion, with any solution, is where/how you are storing the components. In @dylanbraithwaite's fork, it is stored as a static variable and it seems that this is the only solution to do so. Thus making this library thread-able may be more difficult to do.

I'm not sure I understand: are you saying a static member is the only possible solution, or this particular solution is the only one of its kind?

Jan 15 '14 14:01 tivek

I'm not sure I understand: are you saying a static member is the only possible solution, or this particular solution is the only one of its kind?

I think @miguelishawt was trying to say that its the only possible solution. The issue being that without the container having a non templated base type, there's no way to store SomeContainer<T>s of different Ts without resorting to void*s.

Before realising that vectors are a bad idea because they shift their contents around in memory, the solution I was using was to statically store a vector<T> in Component<T> which could then easily be accessed with Component<SomeType>::vectorOfT or through some abstraction.

Currently I'm thinking about trying some sort of ComponentReference class that could store components as indices to a vector with a queue of unused indices. This way you don't loose the performance of everything being completely contiguous in memory and the indices won't be invalidated when a component is removed because rather than calling vector::erase I can just push the index onto the queue of unused indices.

Jan 15 '14 14:01 dylanbraithwaite

Interesting blog post: http://bannalia.blogspot.com.au/2014/05/fast-polymorphic-collections.html with source code for polymorphic_container https://www.dropbox.com/s/pylp41mg7odrh1c/poly_collection.cpp

The only real downfall is virtual functions are used.

May 04 '14 13:05 miguelmartin75

Isn't that supposed to be optimized away by branch prediction, though?

May 04 '14 17:05 JesseTG

Isn't that supposed to be optimized away by branch prediction, though?

Perhaps, but I don't think that's guaranteed (is it?). Anyhow, I now know that something similar to what I linked is possible without virtual functions (using casts).

May 05 '14 00:05 miguelmartin75

It depends entirely on the hardware--no amount of C/C++ magic can force it, or even query its existence. What is that "something similar", anyway?

EDIT: Oh, I see, there's another blog post

May 05 '14 16:05 JesseTG

I've made a simple library tac to help fix this issue. Basically what is contained within the library is a "type aligned container" (hence tac), which is a container "adapter", which will take a set of types and store a container<T> for each T in the type set. i.e. What it basically does it map <A, B, C> to <container<A>, container<B>, container<C> and puts all these containers within a tuple for easy-access.

I think this would be appropriate for this.

Here are some examples, taken from the README.

tac::type_aligned_container<A, B, C> container;
container.all<A>.emplace_back();
container.all<B>.emplace_back();
container.all<C>.emplace_back();

or...

tac::type_aligned_container_dyn<> container;
container.all<A>.emplace_back();
container.all<B>.emplace_back();
container.all<C>.emplace_back();

Of course I will need to fix #4 (add benchmarks) to see the if there is any good difference in speed with using this over new/delete. Note that the first example requires c++14, because of std::get<T>().

Another issue with this is: you will need to provide the types you want to contain (unless you use type_aligned_container_dyn, then it will lazily add containers for types), and in which case you would need to pass this as a template argument to World. i.e.

struct my_config
{
     using component_types = anax::component_types<Position, Velocity, Etc>;
};
World<my_config> world;

The good thing about this, is we will then know how many components there are (thus no need for dynamic bit sets for components).

Sep 24 '14 11:09 miguelmartin75

Any update on this yet?

Apr 05 '16 00:04 OvermindDL1

anax anax copied to clipboard

Allocate components more efficently

anax
anax copied to clipboard