FLAMEGPU2
DAG Control flow errors when abstracting function definitions to separate compilation units / methods
I encountered a bug while working on a non-trivial model: when using the DAG API (`dependsOn` etc.), errors occur if the definition of agent function behaviours and their inclusion in the control flow DAG are moved into methods in a separate file that take a reference to the `ModelDescription` object. The same abstraction using layers behaves fine.
E.g. something along the lines of (untested)
`main.cu`
```c++
// ...
#include "flamegpu/flamegpu.h"
FLAMEGPU_AGENT_FUNCTION(foo, flamegpu::MessageNone, flamegpu::MessageNone) {
    // ...
    return flamegpu::ALIVE;
}
FLAMEGPU_AGENT_FUNCTION(bar, flamegpu::MessageNone, flamegpu::MessageNone) {
    // ...
    return flamegpu::ALIVE;
}
int main(int argc, char* argv[]) {
    // Define the model, agent and 2 agent funcs
    flamegpu::ModelDescription model("model");
    // newAgent creates the agent; Agent() only looks up an existing one
    flamegpu::AgentDescription agent = model.newAgent("agent");
    flamegpu::AgentFunctionDescription foo_desc = agent.newFunction("foo", foo);
    flamegpu::AgentFunctionDescription bar_desc = agent.newFunction("bar", bar);
    // foo runs first
    model.addExecutionRoot(foo_desc);
    // bar depends on foo
    bar_desc.dependsOn(foo_desc);
    // Build the execution graph
    model.generateLayers();
    // Construct the simulation from the model.
    flamegpu::CUDASimulation simulation(model);
    // ...
    return 0;
}
```
Splitting out the agent function(s) into methods in a `.cu` file, with an associated header:
`other.cuh`
```c++
#include "flamegpu/flamegpu.h"
namespace other {
void define(flamegpu::ModelDescription& model);
}  // namespace other
```
`other.cu`
```c++
#include "other.cuh"
FLAMEGPU_AGENT_FUNCTION(foo, flamegpu::MessageNone, flamegpu::MessageNone) {
    // ...
    return flamegpu::ALIVE;
}
FLAMEGPU_AGENT_FUNCTION(bar, flamegpu::MessageNone, flamegpu::MessageNone) {
    // ...
    return flamegpu::ALIVE;
}
namespace other {
void define(flamegpu::ModelDescription& model) {
    // The agent itself is created in main.cu; look it up by name here
    flamegpu::AgentDescription agent = model.Agent("agent");
    flamegpu::AgentFunctionDescription foo_desc = agent.newFunction("foo", foo);
    flamegpu::AgentFunctionDescription bar_desc = agent.newFunction("bar", bar);
    // Add to the DAG, ideally in a separate method by getting mutable refs to functions, but the error occurred even without that.
    // foo runs first
    model.addExecutionRoot(foo_desc);
    // bar depends on foo
    bar_desc.dependsOn(foo_desc);
}
}  // namespace other
```
`main.cu`
```c++
#include "flamegpu/flamegpu.h"
#include "other.cuh"
int main(int argc, char* argv[]) {
    // Define the model and agent; the 2 agent funcs are added by other::define
    flamegpu::ModelDescription model("model");
    flamegpu::AgentDescription agent = model.newAgent("agent");
    other::define(model);
    // Build the execution graph
    model.generateLayers();
    // Construct the simulation from the model.
    flamegpu::CUDASimulation simulation(model);
    // ...
    return 0;
}
```
In the separate, larger model where this occurred, the split case produced runtime errors under Linux (CUDA 12.5, GCC 11), while the first case was fine.
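One structural difference between the two cases is the scope of the description objects: in the single-file case `foo_desc` and `bar_desc` stay alive until the end of `main`, whereas in the split case they are locals of `other::define` and are destroyed before `generateLayers()` runs. Whether that matters depends on how the dependency graph stores its nodes, which is not confirmed here. The sketch below only illustrates the general hazard in plain C++, under the assumption that a graph keeps raw, non-owning pointers; `Node`, `Graph`, `define` and `graph_outlives_node` are hypothetical stand-ins, not FLAME GPU types.

```cpp
#include <vector>

// Hypothetical stand-ins; these are NOT flamegpu classes.
struct Node {
    static int live;  // number of Node objects currently alive
    Node() { ++live; }
    Node(const Node&) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

struct Graph {
    std::vector<const Node*> roots;  // assumption: raw, non-owning pointers
    void addRoot(const Node& n) { roots.push_back(&n); }
};

// Mirrors the split case: the node is a local of the helper function.
void define(Graph& g) {
    Node foo;
    g.addRoot(foo);
}  // foo is destroyed here, so g.roots[0] no longer points at a live object

// Returns true if the graph still holds a pointer after its node has been destroyed.
// The stored pointer is never dereferenced, so this check itself is well defined.
bool graph_outlives_node() {
    Graph g;
    define(g);
    return g.roots.size() == 1 && Node::live == 0;
}
```

If something like this were the cause, any later traversal of the graph would read a destroyed object, which can surface as an arbitrary exception or crash rather than a clear error.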
The runtime error was:
```
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
```
Via gdb, the backtrace pointed at `DependencyNode::getDependents`, called by `DependencyGraph::validateSubTree(DependencyNode* node, std::vector<DependencyNode*>& functionStack)`.
`DependencyNode::dependents` is declared as `std::vector<DependencyNode*> dependents;` but it does not appear to get explicitly initialised anywhere, which may be the problem (or it might not, as a debug build reproduced the error).
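On the initialisation question: a `std::vector` data member with no explicit initialiser is still default-constructed to an empty vector whenever the enclosing object is constructed, so a missing initialiser on its own would not normally leave `dependents` in a bad state. A minimal sketch of that rule follows; `Node` and `fresh_dependent_count` are hypothetical names, and the by-value return type of `getDependents` is an assumption rather than the library's confirmed signature.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dependency node; NOT the flamegpu class.
struct Node {
    std::vector<Node*> dependents;  // no explicit initialiser anywhere
    // Assumption: the accessor returns a copy of the vector.
    std::vector<Node*> getDependents() const { return dependents; }
};

// Returns the number of dependents of a freshly constructed node.
std::size_t fresh_dependent_count() {
    Node n;  // the vector's default constructor still runs for the member
    return n.getDependents().size();
}
```

A `std::bad_array_new_length` thrown while copying such a vector would therefore be more consistent with the node object itself being invalid when it is read than with a missing initialiser; that is an inference, not something verified against the library.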