FLAMEGPU2
Very wide model support
Allow models whose layers contain more than 128 functions to run without error, by no longer using a fixed number of `CUDAStreamCompactionConfig` instances.
Closes #727
Todo
- [x] Add tests which expose the current issue when a number of different features are used
- [ ] Malloc a dynamic array based on the width of the widest layer in the model
- [ ] `CUDAScatter` relies on `MAX_STREAMS`, so refactoring is required.
- [ ] `FLAMEGPUDeviceException` relies on `MAX_STREAMS`, so refactoring is required.
- [ ] Remove `CUDAScanCompaction::MAX_STREAMS`, replacing it with a member variable holding the number currently allocated.
- [ ] Remove/adjust related exceptions (where `CUDAScanCompaction::MAX_STREAMS` is/was checked)
- [ ] Test with seatbelts
- [ ] Test without seatbelts
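A rough sketch of the `MAX_STREAMS` removal, written in plain C++ so it runs without a GPU. `StreamConfigSketch` and `ScanCompactionSketch` are hypothetical stand-ins for `CUDAStreamCompactionConfig` and `CUDAScanCompaction`, and the device buffers the real config owns are omitted. The fixed-size array becomes a `std::vector` whose size is the number currently allocated, and the old `MAX_STREAMS` check becomes a check against that size.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for CUDAStreamCompactionConfig (device buffers omitted).
struct StreamConfigSketch {
    unsigned int* d_ptrs = nullptr;
    unsigned int scan_flag_len = 0;
};

// Hypothetical stand-in for CUDAScanCompaction without MAX_STREAMS:
// a runtime-sized container replaces the fixed array.
class ScanCompactionSketch {
 public:
    // Grow (never shrink) so that stream indices [0, streamCount) are valid.
    void resize(std::size_t streamCount) {
        if (streamCount > configs_.size()) {
            configs_.resize(streamCount);
        }
    }
    // Replaces MAX_STREAMS: the number of configs currently allocated.
    std::size_t streamCount() const { return configs_.size(); }
    // The bounds check that used to compare against MAX_STREAMS.
    StreamConfigSketch& config(std::size_t streamId) {
        if (streamId >= configs_.size()) {
            throw std::out_of_range("streamId exceeds the number of allocated stream configs");
        }
        return configs_[streamId];
    }

 private:
    std::vector<StreamConfigSketch> configs_;
};
```

Growing rather than shrinking means a model that is re-run keeps its existing allocation. Growing a `std::vector` moves its elements, so any cached reference to a config would need to be re-fetched after `resize`.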
Notes
`CUDAScanCompaction::MAX_STREAMS` is hardcoded to 128, the upper limit of streams that can run concurrently on a (<= SM75) device. This is a bad assumption: models can have more than 128 functions per layer, which requires that many streams.
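Sizing those streams needs the width of the widest layer in the model. A minimal sketch, assuming one stream per function in a layer and using a hypothetical `LayerSketch` type rather than FLAMEGPU2's real layer description; returning at least 1 for an empty model is also an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified layer description: only the names of the
// functions the layer contains.
struct LayerSketch {
    std::vector<std::string> functionNames;
};

// Streams required if every function in a layer gets its own stream:
// the width of the widest layer, with a floor of 1 (assumed).
std::size_t requiredStreamCount(const std::vector<LayerSketch>& layers) {
    std::size_t widest = 1;
    for (const LayerSketch& layer : layers) {
        widest = std::max(widest, layer.functionNames.size());
    }
    return widest;
}
```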
`CUDAScatter` is initialised as a singleton member of `CUDASimulation`, so the fixed model properties are known at that point in time, and a call to allocate enough data can be added there.
`DeviceExceptionManager` has an array of device pointers, one per stream, each pointing to a `DeviceExceptionBuffer`, plus host memory to copy those buffers back to. `DeviceExceptionManager` is a member of `cudaSimulation::singletons`, so it can be allocated during singleton initialisation.
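A host-only sketch of sizing those per-stream buffers once, at singleton initialisation, instead of by `MAX_STREAMS`. `ExceptionBufferSketch` and `ExceptionManagerSketch` are hypothetical. In real code the device-side buffers would come from `cudaMalloc` and the copy back would use `cudaMemcpy`; here host vectors and plain assignment stand in so the logic can run without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DeviceExceptionBuffer (fields simplified).
struct ExceptionBufferSketch {
    unsigned int error_count = 0;
    unsigned int line = 0;
};

// Per-stream buffers sized at initialisation rather than by MAX_STREAMS.
class ExceptionManagerSketch {
 public:
    // Called once the stream count is known (e.g. during singleton init).
    void init(std::size_t streamCount) {
        device_.assign(streamCount, ExceptionBufferSketch{});  // stands in for cudaMalloc
        host_.assign(streamCount, ExceptionBufferSketch{});
    }
    std::size_t streamCount() const { return device_.size(); }
    // Pointer a kernel running on stream `streamId` would write into.
    ExceptionBufferSketch* devicePtr(std::size_t streamId) {
        assert(streamId < device_.size());
        return &device_[streamId];
    }
    // Copy one stream's buffer back to its host mirror (stands in for cudaMemcpy).
    const ExceptionBufferSketch& copyBack(std::size_t streamId) {
        assert(streamId < device_.size());
        host_[streamId] = device_[streamId];
        return host_[streamId];
    }

 private:
    std::vector<ExceptionBufferSketch> device_;
    std::vector<ExceptionBufferSketch> host_;
};
```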
`CUDAScanCompaction` is a member variable of `CUDAScatter`, which is default-initialised (rather than being manually constructed or mentioned in an initialiser list).
This will need to be changed to pass the number of streams to create during construction, or to allocate the required number of elements later.
This appears to be the only instantiation of `CUDAScanCompaction`, afaik.
Destruction/deletion will also be required.
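A sketch of the second option (allocate later) together with the destruction it requires, assuming `CUDAScatter` learns the stream count when it is constructed during singleton initialisation. `ScatterSketch`, `ScanCompactionDeferred` and `DeferredConfigSketch` are hypothetical stand-ins. The `g_liveConfigs` counter exists only to show that every config is destroyed; a real config would free its device memory in its destructor. Owning the array through `std::unique_ptr<T[]>` makes the deletion automatic when the owner is destroyed.

```cpp
#include <cstddef>
#include <memory>

// Counts live configs so the sketch can demonstrate cleanup; not a real API.
int g_liveConfigs = 0;

// Hypothetical stand-in for CUDAStreamCompactionConfig. A real config would
// release its device buffers (e.g. with cudaFree) in its destructor.
struct DeferredConfigSketch {
    DeferredConfigSketch() { ++g_liveConfigs; }
    ~DeferredConfigSketch() { --g_liveConfigs; }
    DeferredConfigSketch(const DeferredConfigSketch&) = delete;
    DeferredConfigSketch& operator=(const DeferredConfigSketch&) = delete;
};

// Hypothetical stand-in for CUDAScanCompaction: still default-constructible,
// so it can remain a plain member of the owning scatter object.
class ScanCompactionDeferred {
 public:
    // Allocate once the stream count is known. Reallocating discards the old
    // configs, so this should run before any stream uses them.
    void allocate(std::size_t streamCount) {
        if (streamCount <= count_) {
            return;  // already large enough
        }
        configs_ = std::make_unique<DeferredConfigSketch[]>(streamCount);
        count_ = streamCount;
    }
    std::size_t count() const { return count_; }
    // No user-written destructor: unique_ptr<T[]> calls delete[], which runs
    // every config's destructor.

 private:
    std::unique_ptr<DeferredConfigSketch[]> configs_;
    std::size_t count_ = 0;
};

// Hypothetical stand-in for CUDAScatter, constructed once the model is known.
class ScatterSketch {
 public:
    explicit ScatterSketch(std::size_t streamCount) { scan_.allocate(streamCount); }
    const ScanCompactionDeferred& scan() const { return scan_; }

 private:
    ScanCompactionDeferred scan_;  // default-initialised member, as today
};
```

Passing the count to the constructor instead would remove the "allocated but empty" state, at the cost of having to construct the member explicitly in `CUDAScatter`'s initialiser list.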