lightseq
Question: About the construction of total_cache_k and total_cache_v in Transformer
In lightseq/csrc/models/transformer.cu, should cache_k_out and cache_v_out call set_ancestor? Otherwise, why not remove the unused variables cache_k_out and cache_v_out?
```cpp
Transformer::Transformer(...) {
  ...
  for (auto iter : dec_layer_vec) {
    Variable *cache_k = new Variable("cache_k");
    Variable *cache_v = new Variable("cache_v");
    std::tuple<Variable *, Variable *, Variable *> dec_outs =
        (*iter)(dec_emb, total_enc_kv, pad_mask, cache_k, cache_v);
    dec_emb = std::get<0>(dec_outs);
    Variable *cache_k_out = std::get<1>(dec_outs);
    Variable *cache_v_out = std::get<2>(dec_outs);
    cache_k->set_ancestor(total_cache_k, cache_size * dec_layer_idx);
    cache_v->set_ancestor(total_cache_v, cache_size * dec_layer_idx);
    dec_layer_idx++;
  }
```
https://github.com/bytedance/lightseq/blob/2b5592fa658a39a914a5036e665647084d777903/lightseq/csrc/models/transformer.cu#L135
Sorry, this is by design in the new architecture, which uses some fixed idioms to manage GPU memory sharing.
The set_ancestor function assigns cache_k to a contiguous segment of total_cache_k. In the specific case you cite, cache_k_out can indeed be removed.
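For intuition, here is a minimal, self-contained sketch of what set_ancestor conceptually does. This simplified Variable class, its allocate method, and the memory layout are illustrative assumptions, not the actual lightseq implementation:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for lightseq's Variable: it either owns a buffer
// or aliases a slice of an ancestor's buffer at a fixed byte offset.
class Variable {
 public:
  explicit Variable(std::string name) : name_(std::move(name)) {}

  // Mimics set_ancestor: instead of allocating its own memory, this
  // variable resolves to ancestor's buffer + offset.
  void set_ancestor(Variable *ancestor, size_t offset) {
    ancestor_ = ancestor;
    offset_ = offset;
  }

  // Owned storage (only used by root variables like total_cache_k).
  void allocate(size_t bytes) { storage_.resize(bytes); }

  // Resolve the actual address: own storage, or a slice of the ancestor.
  char *data() {
    if (ancestor_ != nullptr) return ancestor_->data() + offset_;
    return storage_.data();
  }

 private:
  std::string name_;
  Variable *ancestor_ = nullptr;
  size_t offset_ = 0;
  std::vector<char> storage_;
};

int main() {
  const size_t cache_size = 1024;  // per-layer cache bytes (illustrative)
  const int n_layers = 6;

  Variable total_cache_k("total_cache_k");
  total_cache_k.allocate(cache_size * n_layers);

  // Each decoder layer's cache_k aliases its own contiguous segment of
  // total_cache_k, mirroring the loop in the Transformer constructor.
  for (int layer = 0; layer < n_layers; ++layer) {
    Variable *cache_k = new Variable("cache_k");
    cache_k->set_ancestor(&total_cache_k, cache_size * layer);
    std::printf("layer %d cache_k -> total_cache_k + %zu\n", layer,
                cache_size * layer);
    delete cache_k;
  }
  return 0;
}
```

With this layout, all per-layer caches live in one contiguous GPU allocation, so no separate buffer ever needs to be allocated for the per-layer cache_k and cache_v variables.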
I'll fix this detail in my next commit.
Thank you for the confirmation, hexisyztem!