MeJerry215

9 issues by MeJerry215

Regarding the type of `char` here: in most languages `char` can be treated as an int8, so normally the ASCII characters are represented by 0-127, and `char` does distinguish unsigned from signed. So I disagree with flatly stating that `char` is 2 bytes and unsigned by default. https://github.com/krahets/hello-algo/blob/7ca27c3df1bfc981fc7faa4528dadb410457d221/docs/chapter_data_structure/data_and_memory.md?plain=1#L24 Also, this passage does not distinguish sign-magnitude, ones' complement, and two's complement; I believe what a computer actually stores is the two's complement. For a default integer, when unsigned vs. signed is not specified, I lean toward the signed int32 type, in which case the negative numbers represented in two's complement would be...
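A quick way to see the two's-complement point is to inspect the raw bytes. A minimal sketch (standard library only; my own illustration, not from the linked doc) for a signed 32-bit int:

```python
import struct

# A negative integer's in-memory bytes are its two's-complement encoding.
n = -5
raw = struct.pack("<i", n)      # pack as signed little-endian int32
print(raw.hex())                # 'fbffffff' -> 0xFFFFFFFB
print(hex(n & 0xFFFFFFFF))      # 0xfffffffb, the two's complement of 5

# For signed int32, two's complement covers the range [-2**31, 2**31 - 1].
print(-2**31, 2**31 - 1)
```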

When exporting en2fr with `ls_fs_transformer_export.py`, I found that the layernorm parameters are missing. en2de:
```
dict_keys(['encoder.embed_tokens.para', 'encoder.layers.0.para', 'encoder.layers.1.para', 'encoder.layers.2.para', 'encoder.layers.3.para', 'encoder.layers.4.para', 'encoder.layers.5.para', 'encoder.layer_norm.weight', 'encoder.layer_norm.bias', 'decoder.embed_tokens.para', 'decoder.layers.0.para', 'decoder.layers.1.para', 'decoder.layers.2.para', 'decoder.layers.3.para', 'decoder.layers.4.para', 'decoder.layers.5.para', 'decoder.layer_norm.weight', 'decoder.layer_norm.bias', 'decoder.output_projection.clip_max'])
```
en2fr:
```
dict_keys(['encoder.embed_tokens.para', 'encoder.layers.0.para', 'encoder.layers.1.para',...
```
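One way to pin down exactly which parameters are absent is to diff the key sets of the two exports. A sketch, assuming both exports load as plain Python dicts; the file names below are placeholders, not the actual export paths:

```python
import torch

# Diff the key sets of the two exported checkpoints to confirm which
# parameters (e.g. encoder.layer_norm.*) are missing from en2fr.
en2de = torch.load("en2de_export.pt", map_location="cpu")  # placeholder path
en2fr = torch.load("en2fr_export.pt", map_location="cpu")  # placeholder path
for key in sorted(set(en2de) - set(en2fr)):
    print("missing in en2fr:", key)
```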

In fairseq every weight belongs to a single op, while in lightseq the parameters of a whole layer may be fused together. How can these fused per-layer parameters be mapped to the individual ones, so that I can directly reuse weights already trained in fairseq? For example, here is the weight info in lightseq:
```
dict_keys(['encoder.embed_tokens.para', 'encoder.layers.0.para', 'encoder.layers.1.para', 'encoder.layers.2.para', 'encoder.layers.3.para', 'encoder.layers.4.para', 'encoder.layers.5.para', 'encoder.layer_norm.weight', 'encoder.layer_norm.bias', 'decoder.embed_tokens.para', 'decoder.layers.0.para', 'decoder.layers.1.para', 'decoder.layers.2.para', 'decoder.layers.3.para', 'decoder.layers.4.para', 'decoder.layers.5.para', 'decoder.layer_norm.weight', 'decoder.layer_norm.bias', 'decoder.output_projection.clip_max'])
```
Taking decoder.layers.5, the expected corresponding fairseq weights would be:
```
['decoder.layers.5.self_attn.in_proj_weight', 'decoder.layers.5.self_attn.in_proj_bias', 'decoder.layers.5.self_attn.out_proj.weight',...
```
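For illustration only, a hedged sketch of what such a mapping could look like: flatten each per-op fairseq tensor and concatenate them into one fused tensor per layer. The actual concatenation order and any reshaping are defined by lightseq's export script (`ls_fs_transformer_export.py`), so the op list and helper below are hypothetical:

```python
import torch

# Hypothetical sketch: build a fused per-layer tensor from fairseq's per-op
# weights, analogous to lightseq's decoder.layers.5.para. The real ordering
# and layout come from ls_fs_transformer_export.py; this list is illustrative.
def fuse_layer(state_dict, prefix="decoder.layers.5."):
    op_names = [
        "self_attn.in_proj_weight",
        "self_attn.in_proj_bias",
        "self_attn.out_proj.weight",
        "self_attn.out_proj.bias",
        "self_attn_layer_norm.weight",
        "self_attn_layer_norm.bias",
        # ... encoder-attn, FFN, and final layer-norm weights would follow
    ]
    parts = [state_dict[prefix + name].flatten() for name in op_names]
    return torch.cat(parts)  # one flat tensor per layer
```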

**Describe the bug** The model has two inputs, input_ids and attention_mask. I tried `onnxsim input_model.onnx output_model.onnx --overwrite-input-shape "input_ids:1,128;attention_mask:1,128"`, which failed, and then `onnxsim input_model.onnx output_model.onnx --overwrite-input-shape "input_ids:1,128 attention_mask:1,128"`, which also failed. **Model** gpt2, exported from huggingface. How should this argument be written for multiple inputs?
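As a workaround sketch, the shapes can also be overridden per input from Python, assuming a recent onnx-simplifier whose `simplify()` accepts an `overwrite_input_shapes` dict (older versions used a differently named parameter):

```python
import onnx
from onnxsim import simplify

# Override both input shapes via the Python API instead of the CLI.
model = onnx.load("input_model.onnx")
model_simp, check = simplify(
    model,
    overwrite_input_shapes={"input_ids": [1, 128], "attention_mask": [1, 128]},
)
assert check, "simplified model failed the correctness check"
onnx.save(model_simp, "output_model.onnx")
```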

Only call `tl.dot` in a block with M = 16, N = 1, K = 128, PN = 16: since `tl.dot` requires M, N, K >= 16, padding the N...
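A minimal sketch of the padding idea (my own illustration, not the original test script): treat the length-1 N dimension as a zero-padded 16-wide tile so that `tl.dot` sees M = 16, N = 16, K-blocks >= 16, then keep only column 0 of the result.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def matvec_kernel(a_ptr, x_ptr, out_ptr, M, K,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # out = A @ x with x of shape (K,); N = 1 is zero-padded to BLOCK_N = 16.
    pid = tl.program_id(0)
    offs_m = pid * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        offs_k = k + tl.arange(0, BLOCK_K)
        a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :],
                    mask=(offs_m[:, None] < M) & (offs_k[None, :] < K), other=0.0)
        x1d = tl.load(x_ptr + offs_k, mask=offs_k < K, other=0.0)
        # Place the real vector in column 0, zeros elsewhere -> (BLOCK_K, BLOCK_N).
        x = tl.where(offs_n[None, :] == 0, x1d[:, None], 0.0)
        acc += tl.dot(a, x)
    # Only column 0 holds real data; reduce it out and store.
    out = tl.sum(tl.where(offs_n[None, :] == 0, acc, 0.0), axis=1)
    tl.store(out_ptr + offs_m, out, mask=offs_m < M)

M, K = 16, 128
A = torch.randn(M, K, device="cuda")
x = torch.randn(K, device="cuda")
out = torch.empty(M, device="cuda")
matvec_kernel[(triton.cdiv(M, 16),)](A, x, out, M, K, BLOCK_M=16, BLOCK_N=16, BLOCK_K=32)
torch.testing.assert_close(out, A @ x, rtol=1e-3, atol=1e-3)
```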

TestScripts
```python
import torch
import triton
import triton.language as tl
import math
import sys
import matplotlib.pyplot as plt
import csv
import functools
import time

torch.manual_seed(42)


def median(lst):
    sorted_lst...
```

**Hugging Face model card**: WisdomShell/CodeShell-7B-Chat **Model Description**: CodeShell is a multilingual code LLM developed by the [Knowledge Computing Lab](http://se.pku.edu.cn/kcl/) of Peking University. CodeShell has 7 billion parameters and was trained...

In the given axolotl examples [examples/medusa](https://github.com/ctlllll/axolotl/tree/main/examples/medusa), I followed `vicuna_7b_qlora_stage1.yml` and `vicuna_7b_qlora_stage2.yml` to write my llama2 training config. However, I didn't get such a great performance improvement; below is my test...

I would like to know what features Triton will develop and support in the future, but I can't find any information on the homepage.