celerity-runtime

Profile and optimize frontend (from CGF submission to serialized graph)

Open PeterTh opened this issue 3 years ago • 0 comments

#107 has frontend benchmarks that do not instantiate the entire runtime.

Low-hanging fruit:

  • Currently, 13% of the time is spent in iostreams creating the CDAG debug labels. This should probably happen during graph printing rather than graph generation (there is already a PoC implementation for that)
  • malloc/free pairs have a significant impact deep in dependency checking. We can potentially eliminate many convenience allocations (get_accessed_buffers et al.)
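
A minimal sketch of both ideas above, not the actual runtime code: `command_node`, `make_debug_label`, `for_each_accessed_buffer` and `shares_buffer` are hypothetical names invented for illustration. The label is only formatted when something asks for it (i.e. at print time), and the accessed buffers are visited through a callback instead of being returned in a freshly allocated container.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a command node; not the celerity-runtime type.
struct command_node {
    std::size_t cid;                     // command id
    std::size_t tid;                     // id of the owning task
    std::vector<std::size_t> buffer_ids; // buffers accessed by this command

    // Builds the label on demand. Graph generation never calls this,
    // so the iostream cost is only paid when the graph is printed.
    std::string make_debug_label() const {
        std::ostringstream os;
        os << "C" << cid << " (T" << tid << ")";
        return os.str();
    }

    // Visits the accessed buffers in place instead of returning a new
    // container, so callers on the hot path trigger no malloc/free.
    template <typename Fn>
    void for_each_accessed_buffer(Fn&& fn) const {
        for (const auto bid : buffer_ids) fn(bid);
    }
};

// Hypothetical dependency check: do two commands touch a common buffer?
inline bool shares_buffer(const command_node& a, const command_node& b) {
    bool found = false;
    a.for_each_accessed_buffer([&](std::size_t x) {
        b.for_each_accessed_buffer([&](std::size_t y) { found = found || (x == y); });
    });
    return found;
}
```

Whether a callback or a non-owning view is the better fit depends on how the real call sites consume the result; the point is only that the check itself does not need to own a container.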

Requires further investigation:

  • Use a bump allocator for intrusive graphs to improve locality, e.g. a heuristically sized pool per horizon + fallback as needed (maybe also for the dependencies vectors?)
  • Rehashing of the command / task maps is also a noticeable factor. Memory is abundant, so reserve() them up front to avoid stalls
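
One way the two items above could look, as a sketch under stated assumptions: `node`, `horizon_arena` and `make_node_map` are hypothetical names, and the standard `std::pmr::monotonic_buffer_resource` (C++17) is swapped in as the bump allocator. It allocates from an initial block and falls back to its upstream resource once that block is exhausted, which roughly matches "heuristically sized pool + fallback as needed".

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <unordered_map>
#include <vector>

// Hypothetical graph node; its dependency vector uses the same arena.
struct node {
    std::size_t id;
    std::pmr::vector<node*> dependencies;

    node(std::size_t id, std::pmr::memory_resource* mr) : id(id), dependencies(mr) {}
};

// One arena per horizon (assumed granularity). All memory is released
// at once when the arena is destroyed.
class horizon_arena {
  public:
    // initial_bytes is the heuristic pool size; once it is used up,
    // further blocks come from the upstream new/delete resource.
    explicit horizon_arena(std::size_t initial_bytes)
        : m_pool(initial_bytes, std::pmr::new_delete_resource()) {}

    // Nodes are never destroyed individually. Skipping ~node() is safe
    // here only because everything it owns lives in the same arena.
    node* create_node(std::size_t id) {
        void* mem = m_pool.allocate(sizeof(node), alignof(node));
        return new (mem) node(id, &m_pool);
    }

  private:
    std::pmr::monotonic_buffer_resource m_pool;
};

// Reserve up front so that inserting up to `expected` entries never rehashes.
inline std::unordered_map<std::size_t, node*> make_node_map(std::size_t expected) {
    std::unordered_map<std::size_t, node*> map;
    map.reserve(expected);
    return map;
}
```

A monotonic resource never reuses freed memory, so growing dependency vectors leave their old storage behind in the arena; that trade-off would need measuring against the locality gain.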

PeterTh, Mar 14 '22 10:03