lightseq
Question: About the construction of total_cache_k and total_cache_v in Transformer
In lightseq/csrc/models/transformer.cu, should cache_k_out and cache_v_out call set_ancestor? Otherwise, why not remove the unused variables cache_k_out and cache_v_out?
```cpp
Transformer::Transformer(...) {
  ...
  for (auto iter : dec_layer_vec) {
    Variable *cache_k = new Variable("cache_k");
    Variable *cache_v = new Variable("cache_v");
    std::tuple<Variable *, Variable *, Variable *> dec_outs =
        (*iter)(dec_emb, total_enc_kv, pad_mask, cache_k, cache_v);
    dec_emb = std::get<0>(dec_outs);
    Variable *cache_k_out = std::get<1>(dec_outs);
    Variable *cache_v_out = std::get<2>(dec_outs);
    cache_k->set_ancestor(total_cache_k, cache_size * dec_layer_idx);
    cache_v->set_ancestor(total_cache_v, cache_size * dec_layer_idx);
    dec_layer_idx++;
  }
```
https://github.com/bytedance/lightseq/blob/2b5592fa658a39a914a5036e665647084d777903/lightseq/csrc/models/transformer.cu#L135
Sorry, this is by design in the new architecture, which uses some fixed idioms to manage GPU memory sharing.
The set_ancestor function assigns cache_k to a contiguous segment of total_cache_k. In the specific case you cite, cache_k_out can indeed be removed.
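For intuition, here is a minimal, self-contained sketch of what set_ancestor conceptually does. This simplified Variable class, its allocate method, and the memory layout are illustrative assumptions, not the actual lightseq implementation:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for lightseq's Variable: it either owns a buffer
// or aliases a slice of an ancestor's buffer at a fixed byte offset.
class Variable {
 public:
  explicit Variable(std::string name) : name_(std::move(name)) {}

  // Mimics set_ancestor: instead of allocating its own memory, this
  // variable resolves to ancestor's buffer + offset.
  void set_ancestor(Variable *ancestor, size_t offset) {
    ancestor_ = ancestor;
    offset_ = offset;
  }

  // Owned storage (only used by root variables like total_cache_k).
  void allocate(size_t bytes) { storage_.resize(bytes); }

  // Resolve the actual address: own storage, or a slice of the ancestor.
  char *data() {
    if (ancestor_ != nullptr) return ancestor_->data() + offset_;
    return storage_.data();
  }

 private:
  std::string name_;
  Variable *ancestor_ = nullptr;
  size_t offset_ = 0;
  std::vector<char> storage_;
};

int main() {
  const size_t cache_size = 1024;  // per-layer cache bytes (illustrative)
  const int n_layers = 6;

  Variable total_cache_k("total_cache_k");
  total_cache_k.allocate(cache_size * n_layers);

  // Each decoder layer's cache_k aliases its own contiguous segment of
  // total_cache_k, mirroring the loop in the Transformer constructor.
  for (int layer = 0; layer < n_layers; ++layer) {
    Variable *cache_k = new Variable("cache_k");
    cache_k->set_ancestor(&total_cache_k, cache_size * layer);
    std::printf("layer %d cache_k -> total_cache_k + %zu\n", layer,
                cache_size * layer);
    delete cache_k;
  }
  return 0;
}
```

With this layout, all per-layer caches live in one contiguous GPU allocation, so no separate buffer ever needs to be allocated for the per-layer cache_k and cache_v variables.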
I'll fix this detail in my next commit.
Thank you for the confirmation, hexisyztem!