Kangmo Kim

Results 9 issues of Kangmo Kim

The original GPT-2 code keeps track of the attention outputs (K, V) and passes them to attn as the 'past' argument. The 'past' attention outputs (K, V) are prepended to the K, V from...
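The caching idea in this issue can be illustrated with a minimal C++ sketch. It is not the GPT-2 code or any project's API: `KVCache`, `Rows`, and `extend_cache` are hypothetical names, and because the issue text is truncated, the sketch assumes the cached rows are prepended to the K, V computed for the current step.

```cpp
#include <cassert>
#include <vector>

// One row per token; each row holds that token's key (or value) vector.
using Rows = std::vector<std::vector<float>>;

// Hypothetical cache for a single attention layer (not a real library type).
struct KVCache {
    Rows keys;
    Rows values;
};

// Build the K, V that attention sees at this step: the cached ("past") rows
// come first, followed by the rows for the new tokens. Assumption: the new
// rows belong to the current decoding step.
KVCache extend_cache(const KVCache& past, const Rows& new_keys,
                     const Rows& new_values) {
    assert(new_keys.size() == new_values.size());
    KVCache present = past;  // copy the cached rows
    present.keys.insert(present.keys.end(), new_keys.begin(), new_keys.end());
    present.values.insert(present.values.end(), new_values.begin(),
                          new_values.end());
    return present;  // the caller passes this back as 'past' on the next step
}
```

Returning a new cache instead of mutating `past` keeps the sketch easy to reason about; a real implementation would more likely write into a preallocated buffer to avoid the copy.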

# Prerequisites Please answer the following questions for yourself before submitting an issue. - [x] I am running the latest code. Development is very rapid so there are no...

How about measuring performance against PyTorch/MLIR and TensorFlow/XLA? These fuse operations to run faster on GPU. We need to compare with MLIR or XLA to get the real comparison on...

The constructor creates new objects without shared_ptrs, but the destructor is empty. In cpp: ``` DecSelfAttentionLayer::DecSelfAttentionLayer( int layer_id, int max_batch_tokens, int max_seq_len, int hidden_size, int num_heads, float attn_prob_dropout_ratio, float hidden_output_dropout_ratio,...
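One common way to address the ownership concern raised here is to hold each sub-object in a `std::unique_ptr`, so that an empty or defaulted destructor still releases it. The sketch below is only an illustration under stated assumptions: `DropoutOp` and `SketchLayer` are hypothetical stand-ins, not lightseq classes, and the real objects may be owned and freed elsewhere in that codebase. Only the two dropout-ratio parameter names are taken from the constructor quoted in the issue.

```cpp
#include <cassert>
#include <memory>

// Hypothetical sub-operator. It counts live instances so that a leak
// would be observable in a test.
struct DropoutOp {
    static int& live() {
        static int count = 0;
        return count;
    }
    explicit DropoutOp(float ratio) : ratio_(ratio) { ++live(); }
    ~DropoutOp() { --live(); }
    DropoutOp(const DropoutOp&) = delete;
    DropoutOp& operator=(const DropoutOp&) = delete;
    float ratio_;
};

// Hypothetical layer that owns its sub-operators through unique_ptr.
class SketchLayer {
public:
    SketchLayer(float attn_prob_dropout_ratio,
                float hidden_output_dropout_ratio)
        : attn_dropout_(std::make_unique<DropoutOp>(attn_prob_dropout_ratio)),
          hidden_dropout_(
              std::make_unique<DropoutOp>(hidden_output_dropout_ratio)) {}
    // No user-written destructor is needed: each unique_ptr member deletes
    // the object it owns when the layer is destroyed.

private:
    std::unique_ptr<DropoutOp> attn_dropout_;
    std::unique_ptr<DropoutOp> hidden_dropout_;
};
```

With raw `new` and an empty destructor, the same two objects would outlive the layer unless something else deleted them. `std::make_unique` requires C++14 or later.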

In lightseq/csrc/models/transformer.cu, should cache_k_out and cache_v_out call set_ancestor? Otherwise, why not remove the unused variables cache_k_out and cache_v_out? ``` Transformer::Transformer { ... for (auto iter : dec_layer_vec) { Variable *cache_k...

Hi, thanks for the great project. I use turbo in my daily work, but I have to reopen the files I was using each time I reboot my server. Can we...

Issue 1: When I type a Korean character (each Korean character is two bytes wide), the scrollbar breaks. Issue 2: When I type a second Korean character, the first Korean character becomes...

Hi, I have been following your project, and I admire your work and continuous effort on it. In the future, how about creating a plugin system and implementing GitHub Copilot...

It would be great if tvterm could reopen the same window at the same position and size each time it exits and runs again. Also, if possible, supporting multiple profile...