Jiarui Fang(方佳瑞)


I wrote a test script to understand PyTorch dynamic quantization. PyTorch version 1.5.0.

```python
import torch
from torch import nn

torch.set_grad_enabled(False)
LinearLayer = torch.nn.Linear(20, 30)
model = nn.Sequential(LinearLayer)
# for...
```
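Since the script above is truncated, here is a minimal self-contained sketch of dynamic quantization on the same layer; the random input and the error printout are my additions for illustration:

```python
import torch
from torch import nn

torch.set_grad_enabled(False)

# Float reference model: one Linear layer.
model = nn.Sequential(nn.Linear(20, 30))

# Dynamically quantize Linear layers: weights are stored as int8,
# activations are quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(4, 20)
print((model(x) - qmodel(x)).abs().max())  # quantization error
```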

> > Motivation: we use FBGEMM in order to have accuracy consistent with PyTorch dynamic quantization.
>
> As TurboTransformers' optimizations are focused on non-GEMM operations, we can reuse PyTorch...

1. Convert your huggingface/tensorflow model to `*.npz`:

   ```
   python tools/convert_huggingface_bert_tf_to_npz.py bert-base-uncased bert_tf.npz
   ```

2. Update the corresponding line in cpu_example.py:

   ```python
   tt_model = turbo_transformers.BertModelWithPooler.from_npz(
       '/workspace/bert_tf.npz', cfg)
   ```
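To sanity-check step 1's output, you can inspect the resulting archive with plain NumPy (the key names depend on the converter, so this only verifies that tensors were written):

```python
import numpy as np

# List a few arrays in the converted archive.
arrays = np.load('/workspace/bert_tf.npz')
for name in list(arrays.keys())[:5]:
    print(name, arrays[name].shape)
```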

You can understand it that way. However, we have also converted the official TensorFlow model to the npz format; it only takes minor changes to the conversion script.

BTW: on my machine, a batch of 9 is split into two minibatches, [0:8] and [8:9].
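A minimal sketch of that splitting behavior, assuming a maximum minibatch size of 8 (the cap is my inference from the observed split, not a documented constant):

```python
def split_batch(batch_size, max_minibatch=8):
    """Yield [start, end) slices covering the batch in chunks of at
    most max_minibatch samples, e.g. 9 -> (0, 8), (8, 9)."""
    for start in range(0, batch_size, max_minibatch):
        yield start, min(start + max_minibatch, batch_size)

print(list(split_batch(9)))  # [(0, 8), (8, 9)]
```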

Ran 500 steps with TencentPretrain's run_patrickstar.sh on a GeForce RTX 2060 and compared the logs.

**PatrickStar**

```
Worker is training ...
| 100/ 500 steps| 6164.26 tokens/s| loss 7.15| acc: 0.045
| 200/ 500 steps| 6226.79 tokens/s| loss 6.30| acc: 0.060
| 300/...
```

The CPU embedding implementation currently on develop has a problem: on TPT it fails to converge. With `"use_cpu_embedding": True` it converges correctly; with `False` the results are as follows:

```
| 100/ 500 steps| 65949.17 tokens/s| loss 7.15| acc: 0.053
| 200/ 500 steps| 67712.70 tokens/s| loss 6.40| acc: 0.043
| 300/ 500 steps|...
```

"tie_weights": true不支持 如果用use_cpu_embedding会报错 ![image](https://user-images.githubusercontent.com/5706969/131980651-429bcd31-2042-4abb-8b27-4803f0c4e8aa.png) 如果不用则存在一个参数被复用的情况,触发已知的异常 File "/home/jiaruifang/codes/HybridPS/patrickstar/core/hook.py", line 179, in pre_sub_module_backward_function assert param.ps_attr.bwd_cnt == 0, f"Backward Propagation updates the gradient of a parameter twice. This is not allowed when using...

An annoying problem: someone may write code like this, and PatrickStar has no way to tell that a weight tensor is shared by two params.
https://git.woa.com/TencentNLP/TencentPretrain/blob/master/tencentpretrain/models/model.py#L21

For tie weight, i.e. the first-layer embedding weight and the last-layer linear weight sharing parameters, the current problems are:

1. `use_cpu_embedding` conflicts with tie weight: the embedding weight is treated as a torch param in the first layer and computed on CPU by `nn.Embedding`, but in the last layer it has to be computed on GPU, and `pre_forward_hook` currently cannot handle this correctly.
2. When PreprocessCtx constructs the model, the chunk-tensor-index contains a useless tensor (left over from the shared tensor that should have been removed).
3. With `use_cpu_embedding=False`, convergence is incorrect. I am not sure the backward propagation of shared parameters is implemented correctly yet.

Bad case reproduction: https://git.woa.com/jiaruifang/TencentPretrain/merge_requests/1
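A minimal sketch of the weight-tying pattern in question (layer sizes are illustrative), showing why the sharing is only visible through identity or storage comparison:

```python
import torch
from torch import nn

emb = nn.Embedding(1000, 64)
lm_head = nn.Linear(64, 1000, bias=False)
lm_head.weight = emb.weight  # the tie-weight pattern in question

# Two module attributes, one underlying Parameter. A runtime that
# walks each module's parameters independently registers the tensor
# twice; the sharing only shows up via identity / storage checks.
print(emb.weight is lm_head.weight)                         # True
print(emb.weight.data_ptr() == lm_head.weight.data_ptr())  # True
```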

Need to look into how to set up GPU CI; it seems we would have to use our own server.