Results: 8 comments of Anderson Meng

I've run into a similar problem; it also looks like an overflow issue.

```python
import paddle; print(paddle.__version__)  # 2.6.0
x = paddle.to_tensor([[0, 0], [-2**63, 0]], dtype=paddle.int64)
print(x)
x == 2**63-1  # __eq__
# Tensor(shape=[2, 2], dtype=bool, place=Place(gpu:0), stop_gradient=True,
#        [[False, False],
#         [True...
```

After syncing with Baidu, Kai Wang said this work is currently not scheduled, and nothing is blocking on the NV side for now, so closing this for the time being. As a side note, other APIs operating on `int64` may have the same precision-overflow problem, for example equality (`__eq__`), which likewise causes overly large/small numbers to be treated as equal:

```python
import paddle; print(paddle.__version__)  # 2.6.0
x = paddle.to_tensor([[0, 0], [-2**63, 0]], dtype=paddle.int64)
print(x)
x == 2**63-1  # __eq__
# Tensor(shape=[2, 2], dtype=bool, place=Place(gpu:0), stop_gradient=True,
#        [[False,...
```
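For reference, a minimal NumPy sketch of the expected `int64` equality behaviour (NumPy is used here only as a baseline for comparison; it is not part of the original report):

```python
import numpy as np

# Baseline check: with NumPy's int64, the extreme values stay exact,
# so -2**63 is NOT reported equal to 2**63 - 1.
x = np.array([[0, 0], [-2**63, 0]], dtype=np.int64)
print(x == np.int64(2**63 - 1))
# [[False False]
#  [False False]]
```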

Understood. So is the `--model_name_or_path gpt3-1.3B-en` style of usage no longer supported now?

@wawltor Thanks for your reply; unfortunately, that does not appear to be the root cause and the problem remains. Following your steps, I created **gpt3-1.3B-en.json**:

```json
{
  "model_name_or_path": "gpt3-1.3B-en",
  "tokenizer_name_or_path": "gpt3-1.3B-en",
  "input_dir": "/workspace/dataset",
  "output_dir": "output/paddlenlp_gpt3/debug/model_output",
  "bf16": true,
  "sequence_parallel": true,
  "tensor_parallel_degree": 8,
  "sharding_parallel_degree": 1,
  "sharding": "stage2",
  "pipeline_parallel_degree": 1,
  "virtual_pp_degree": 1,
  "pipeline_parallel_config": ...
```
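As a quick aside (not part of the original comment), a minimal hypothetical sanity check that the JSON config parses and carries the intended parallelism settings before launching a full run:

```python
# Hypothetical helper: load the config and echo the parallelism-related fields,
# so a syntax error or typo in gpt3-1.3B-en.json is caught before a full launch.
import json

with open("gpt3-1.3B-en.json") as f:
    cfg = json.load(f)

for key in ("tensor_parallel_degree", "sharding_parallel_degree",
            "pipeline_parallel_degree", "virtual_pp_degree", "sharding"):
    print(f"{key} = {cfg.get(key)}")
```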

The commit on this fork will fix this. https://github.com/denera/TransformerEngine/commit/7a9522bdbbe28d2682567ea450f10d87cc68d03a

@asi1024 thanks for helping so promptly; I still believe this is worth digging into further. Note that if I **don't fuse the elementwise `> 0` op**, the grid it launches...
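To illustrate what "fusing the elementwise `> 0` op" into the reduction means, here is a minimal hand-written CuPy `ReductionKernel` sketch; it is only a stand-in for the fused kernel under discussion, not the actual kernel from the issue:

```python
import cupy as cp

# Stand-in for the fused kernel under discussion: the elementwise `x > 0` test
# is evaluated inside the reduction itself instead of in a separate kernel.
count_positive = cp.ReductionKernel(
    'T x',              # input
    'int64 y',          # output
    'x > 0 ? 1 : 0',    # map: the elementwise `> 0` folded into the reduction
    'a + b',            # reduce
    'y = a',            # post-reduction assignment
    '0',                # identity
    'count_positive',
)

x = cp.random.standard_normal(1 << 24, dtype=cp.float32)
print(count_positive(x))    # fused: one kernel launch
print((x > 0).sum())        # unfused: separate comparison and reduction kernels
```

Running the two paths under a profiler should make any difference in the launched grid sizes visible.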

Also, the small-grid issue behaves like https://github.com/cupy/cupy/issues/9005, although [#9005](https://github.com/cupy/cupy/issues/9005) is not a reduction. When it happens, `grid*block` is only **2048** in both issues.