Jiarui Fang(方佳瑞)


I wrote a test script to understand PyTorch dynamic quantization. PyTorch version 1.5.0.

```python
import torch
from torch import nn

torch.set_grad_enabled(False)
LinearLayer = torch.nn.Linear(20, 30)
model = nn.Sequential(LinearLayer)
# for...
```
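Since the script above is truncated, here is a minimal self-contained sketch of dynamic quantization on the same layer; the random input and the error printout are my additions for illustration:

```python
import torch
from torch import nn

torch.set_grad_enabled(False)

# Float reference model: one Linear layer.
model = nn.Sequential(nn.Linear(20, 30))

# Dynamically quantize Linear layers: weights are stored as int8,
# activations are quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(4, 20)
print((model(x) - qmodel(x)).abs().max())  # quantization error
```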

> > Motivation: we use FBGEMM in order to have accuracy consistent with PyTorch dynamic quantization.
>
> As TurboTransformers' optimizations are focused on non-GEMM operations, we can reuse PyTorch...

1. Convert your huggingface/tensorflow model to `*.npz`:

   ```
   python tools/convert_huggingface_bert_tf_to_npz.py bert-base-uncased bert_tf.npz
   ```

2. Update the corresponding line in cpu_example.py:

   ```python
   tt_model = turbo_transformers.BertModelWithPooler.from_npz(
       '/workspace/bert_tf.npz', cfg)
   ```
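To sanity-check step 1's output, you can inspect the resulting archive with plain NumPy (the key names depend on the converter, so this only verifies that tensors were written):

```python
import numpy as np

# List a few arrays in the converted archive.
arrays = np.load('/workspace/bert_tf.npz')
for name in list(arrays.keys())[:5]:
    print(name, arrays[name].shape)
```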

You can understand it that way. However, we have also converted the official TensorFlow model to the npz format; it only takes minor changes to the conversion script.

BTW: on my machine, a batch of 9 is split into two minibatches, [0:8] and [8:9].
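A minimal sketch of that splitting behavior, assuming a maximum minibatch size of 8 (the cap is my inference from the observed split, not a documented constant):

```python
def split_batch(batch_size, max_minibatch=8):
    """Yield [start, end) slices covering the batch in chunks of at
    most max_minibatch samples, e.g. 9 -> (0, 8), (8, 9)."""
    for start in range(0, batch_size, max_minibatch):
        yield start, min(start + max_minibatch, batch_size)

print(list(split_batch(9)))  # [(0, 8), (8, 9)]
```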

Ran 500 steps with TencentPretrain's run_patrickstar.sh on a GeForce RTX 2060 and compared the logs.

**PatrickStar**

```
Worker is training ...
| 100/ 500 steps| 6164.26 tokens/s| loss 7.15| acc: 0.045
| 200/ 500 steps| 6226.79 tokens/s| loss 6.30| acc: 0.060
| 300/...
```

The CPU embedding implementation currently on develop has a problem: on TPT it fails to converge. With `"use_cpu_embedding": True` it converges correctly; with `False` the results are as follows:

```
| 100/ 500 steps| 65949.17 tokens/s| loss 7.15| acc: 0.053
| 200/ 500 steps| 67712.70 tokens/s| loss 6.40| acc: 0.043
| 300/ 500 steps|...
```

"tie_weights": true不支持 如果用use_cpu_embedding会报错 ![image](https://user-images.githubusercontent.com/5706969/131980651-429bcd31-2042-4abb-8b27-4803f0c4e8aa.png) 如果不用则存在一个参数被复用的情况,触发已知的异常 File "/home/jiaruifang/codes/HybridPS/patrickstar/core/hook.py", line 179, in pre_sub_module_backward_function assert param.ps_attr.bwd_cnt == 0, f"Backward Propagation updates the gradient of a parameter twice. This is not allowed when using...

An annoying problem: someone may write code like this, and PatrickStar has no way to tell that a weight tensor is shared by two params.
https://git.woa.com/TencentNLP/TencentPretrain/blob/master/tencentpretrain/models/model.py#L21

For tie weight, i.e. the first-layer embedding weight and the last-layer linear weight sharing parameters, the current problems are:

1. `use_cpu_embedding` conflicts with tie weight: the embedding weight is treated as a torch param in the first layer and computed on CPU by `nn.Embedding`, but in the last layer it has to be computed on GPU, and `pre_forward_hook` currently cannot handle this correctly.
2. When PreprocessCtx constructs the model, the chunk-tensor-index contains a useless tensor (left over from the shared tensor that should have been removed).
3. With `use_cpu_embedding=False`, convergence is incorrect. I am not sure the backward propagation of shared parameters is implemented correctly yet.

Bad case reproduction: https://git.woa.com/jiaruifang/TencentPretrain/merge_requests/1
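A minimal sketch of the weight-tying pattern in question (layer sizes are illustrative), showing why the sharing is only visible through identity or storage comparison:

```python
import torch
from torch import nn

emb = nn.Embedding(1000, 64)
lm_head = nn.Linear(64, 1000, bias=False)
lm_head.weight = emb.weight  # the tie-weight pattern in question

# Two module attributes, one underlying Parameter. A runtime that
# walks each module's parameters independently registers the tensor
# twice; the sharing only shows up via identity / storage checks.
print(emb.weight is lm_head.weight)                         # True
print(emb.weight.data_ptr() == lm_head.weight.data_ptr())  # True
```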

Need to look into how to set up GPU CI; it seems we would have to use our own server.