Partially forward cgraph
I would like to be able to partially evaluate a cgraph. The use case is to skip calculating logits on every iteration of an RNN and calculate them only at the end.
I don't want to create multiple cgraphs for this, because rwkv.cpp requires GGML_MAX_NODES to be increased by a factor of over 1000, which means a single cgraph takes over 2 GB of memory. Basically, the lack of iteration in cgraphs forces me to encode the exact same operation hundreds or even thousands of times into a single graph. Using multiple graphs isn't feasible either, because each graph has its own thread pool that has to initialize every time ggml_graph_compute is called.
Because of this, I'd like to use a single cgraph to its fullest potential: partially evaluate it, or skip particular nodes, in order to elide expensive calculations whose results aren't actually used. ggml doesn't currently have this feature.
Hm, this can be done manually by saving and resetting the n_nodes and n_leafs fields of ggml_cgraph, which lets you chop off trailing computations that aren't needed. It actually seems to work perfectly, but maybe ggml should provide some functions for this... or maybe not. Leaving this open just in case.
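
For anyone landing here later, a rough sketch of the save-and-reset trick. To keep it self-contained it does not use ggml at all: toy_cgraph, toy_graph_compute and toy_graph_compute_prefix are hypothetical stand-ins, and only the n_nodes field name is taken from ggml_cgraph. It assumes the nodes you want to skip are the trailing ones in the graph's node order. In real code you would presumably do the same around ggml_graph_compute, and treat n_leafs the same way if needed.

```c
#include <assert.h>

#define TOY_MAX_NODES 8

/* Hypothetical stand-in for ggml_cgraph. Only the n_nodes field name
 * mirrors the real struct; everything else is made up for illustration. */
struct toy_cgraph {
    int n_nodes;                  /* how many nodes the compute loop visits */
    int evaluated[TOY_MAX_NODES]; /* evaluation count per node */
};

/* Stand-in for ggml_graph_compute: visits nodes [0, n_nodes) in order. */
static void toy_graph_compute(struct toy_cgraph *g) {
    for (int i = 0; i < g->n_nodes; i++) {
        g->evaluated[i]++;
    }
}

/* Evaluate only the first n_keep nodes, then restore the full node count
 * so a later call can still run the whole graph. */
static void toy_graph_compute_prefix(struct toy_cgraph *g, int n_keep) {
    assert(n_keep >= 0 && n_keep <= g->n_nodes);
    const int saved_n_nodes = g->n_nodes; /* save */
    g->n_nodes = n_keep;                  /* chop off trailing nodes */
    toy_graph_compute(g);
    g->n_nodes = saved_n_nodes;           /* reset */
}
```

With this, every RNN iteration except the last would call toy_graph_compute_prefix with a cut-off just before the logits nodes, and the last iteration would call toy_graph_compute on the full graph.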