Results: 8 comments of Anderson Meng

I've run into a similar problem; it also looks like an overflow issue.

```python
import paddle; print(paddle.__version__)  # 2.6.0
x = paddle.to_tensor([[0, 0], [-2**63, 0]], dtype=paddle.int64)
print(x)
x == 2**63-1  # __eq__
# Tensor(shape=[2, 2], dtype=bool, place=Place(gpu:0), stop_gradient=True,
#        [[False, False],
#         [True...
```

After syncing with Baidu, Kai Wang said this work is currently not scheduled, and nothing is blocking on the NV side for now, so closing this for the time being. As a side note, other APIs operating on `int64` may have the same precision-overflow problem, for example equality (`__eq__`), which likewise causes overly large/small numbers to be treated as equal:

```python
import paddle; print(paddle.__version__)  # 2.6.0
x = paddle.to_tensor([[0, 0], [-2**63, 0]], dtype=paddle.int64)
print(x)
x == 2**63-1  # __eq__
# Tensor(shape=[2, 2], dtype=bool, place=Place(gpu:0), stop_gradient=True,
#        [[False,...
```
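For reference, a minimal NumPy sketch of the expected `int64` equality behaviour (NumPy is used here only as a baseline for comparison; it is not part of the original report):

```python
import numpy as np

# Baseline check: with NumPy's int64, the extreme values stay exact,
# so -2**63 is NOT reported equal to 2**63 - 1.
x = np.array([[0, 0], [-2**63, 0]], dtype=np.int64)
print(x == np.int64(2**63 - 1))
# [[False False]
#  [False False]]
```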

Understood. So is the `--model_name_or_path gpt3-1.3B-en` style of usage no longer supported now?

@wawltor Thanks for your reply; unfortunately, that does not appear to be the root cause and the problem remains. Following your steps, I created **gpt3-1.3B-en.json**:

```json
{
  "model_name_or_path": "gpt3-1.3B-en",
  "tokenizer_name_or_path": "gpt3-1.3B-en",
  "input_dir": "/workspace/dataset",
  "output_dir": "output/paddlenlp_gpt3/debug/model_output",
  "bf16": true,
  "sequence_parallel": true,
  "tensor_parallel_degree": 8,
  "sharding_parallel_degree": 1,
  "sharding": "stage2",
  "pipeline_parallel_degree": 1,
  "virtual_pp_degree": 1,
  "pipeline_parallel_config": ...
```
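As a quick aside (not part of the original comment), a minimal hypothetical sanity check that the JSON config parses and carries the intended parallelism settings before launching a full run:

```python
# Hypothetical helper: load the config and echo the parallelism-related fields,
# so a syntax error or typo in gpt3-1.3B-en.json is caught before a full launch.
import json

with open("gpt3-1.3B-en.json") as f:
    cfg = json.load(f)

for key in ("tensor_parallel_degree", "sharding_parallel_degree",
            "pipeline_parallel_degree", "virtual_pp_degree", "sharding"):
    print(f"{key} = {cfg.get(key)}")
```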

The commit on this fork will fix this. https://github.com/denera/TransformerEngine/commit/7a9522bdbbe28d2682567ea450f10d87cc68d03a

@asi1024 thanks for helping so promptly; I still believe this is worth digging into further. Note that if I **don't fuse the elementwise `> 0` op**, the grid it launches...
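To illustrate what "fusing the elementwise `> 0` op" into the reduction means, here is a minimal hand-written CuPy `ReductionKernel` sketch; it is only a stand-in for the fused kernel under discussion, not the actual kernel from the issue:

```python
import cupy as cp

# Stand-in for the fused kernel under discussion: the elementwise `x > 0` test
# is evaluated inside the reduction itself instead of in a separate kernel.
count_positive = cp.ReductionKernel(
    'T x',              # input
    'int64 y',          # output
    'x > 0 ? 1 : 0',    # map: the elementwise `> 0` folded into the reduction
    'a + b',            # reduce
    'y = a',            # post-reduction assignment
    '0',                # identity
    'count_positive',
)

x = cp.random.standard_normal(1 << 24, dtype=cp.float32)
print(count_positive(x))    # fused: one kernel launch
print((x > 0).sum())        # unfused: separate comparison and reduction kernels
```

Running the two paths under a profiler should make any difference in the launched grid sizes visible.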

Also, the small-grid issue behaves like https://github.com/cupy/cupy/issues/9005, although [#9005](https://github.com/cupy/cupy/issues/9005) is not a reduction. When it happens, `grid*block` is only **2048** in both issues.