Anderson Meng issues

Results: 5 issues of Anderson Meng

`masked_fill_` handles int64 incorrectly: filling with paddle.iinfo(paddle.int64).max is treated as min

### Bug description (Describe the Bug) In the simple script below, passing in the int64 **maximum** value turns it into the **minimum** value ```python import paddle; print(paddle.__version__) #2.6.0 t = paddle.zeros((2,2), dtype=paddle.int64) print(paddle.iinfo(paddle.int64).max == 2**63-1) #True t.masked_fill_(paddle.to_tensor([[0,0],[0,1]]), 2**63-1) #

NVIDIA

status/new-issue

type/bug-report
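
The excerpt of the `masked_fill_` report above does not state a root cause. As a hedged illustration only, the sketch below shows one way an int64 maximum can turn into the minimum: a round trip through a 64-bit float. This is an assumption, not something the issue confirms about Paddle's implementation. The helper `wrap_to_int64` is hypothetical, and it models the out-of-range conversion as two's-complement wraparound, which real hardware and compilers do not guarantee.

```python
# Hypothetical illustration: NOT confirmed as the cause of the Paddle issue.
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def wrap_to_int64(value: int) -> int:
    """Reduce an arbitrary Python int to the two's-complement int64 range."""
    return (value + 2**63) % 2**64 - 2**63


# float64 has a 53-bit significand, so 2**63 - 1 is not representable
# exactly and rounds up to 2**63.
as_float = float(INT64_MAX)

# Converting back yields 2**63, which is one past INT64_MAX; wrapping it
# into the int64 range (an assumed model of the overflow) gives INT64_MIN.
round_tripped = wrap_to_int64(int(as_float))

print(as_float == 2.0**63)         # True
print(round_tripped == INT64_MIN)  # True
```

If the fill value in the report passes through a float at any point, this kind of rounding would be enough to explain the symptom; checking whether `2**63 - 1024` (the largest float64 value below `2**63`) survives the call would be one way to test that guess.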

create_communicator_grouped2 may trigger uninit value memory issue(randomly crash) when you train more iterations.

## Container: ``` nvcr.io/nvidia/pytorch:24.05-py3 ``` ## Machine: x86 CPU with A100 node ## Reproduce: ```bash python -m torch.distributed.run --nproc-per-node=2 examples/pytorch/comm_gemm_overlap/ln_mlp_with_overlap.py --num-iters=1000 ``` It may or may not crash because at [line...

[Bug]: Got Exception during training 'GPT3-1.3B' - TypeError: object of type 'NoneType' has no len()

### Software environment ```Markdown - paddlepaddle: N/A - paddlepaddle-gpu: 2.6.1 - paddlenlp: develop ``` ### Duplicate issues - [X] I have searched the existing issues ### Error description When trying to train "GPT3-1.3B" with llm/run_pretrain.py, an error occurs during model initialization: ```python init_class = architectures.pop()...

bug

stale

NVRTC_ERROR_COMPILATION when fusing cp.sum with an epilogue of logical op

### Description The CuPy codegen generates code that cannot be compiled. It looks like a variable naming conflict. You will observe compilation errors like this ```cpp 7e5a3f4331213c751b3bf407f0a29fd894750a8e.cubin.cu(30): error:...

cat:bug

prio:medium

cupy.fuse slow down due to small grid launched

### Description I've tried many patterns and can summarize the results as follows: * Fusing `cp.sum` by itself works perfectly. * Fusing `cp.sum` with an elementwise op like `>` or `+` may cause a slowdown because...

cat:performance