Results 5 issues of Anderson Meng

### Describe the Bug

In the simple script below, passing the int64 **maximum value** yields the **minimum value** instead.

```python
import paddle
print(paddle.__version__)  # 2.6.0

t = paddle.zeros((2, 2), dtype=paddle.int64)
print(paddle.iinfo(paddle.int64).max == 2**63 - 1)  # True
t.masked_fill_(paddle.to_tensor([[0, 0], [0, 1]]), 2**63 - 1)  #
```
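
One plausible explanation for a max-to-min flip like this, offered only as an assumption and not confirmed anywhere in the issue, is that the fill value passes through a 64-bit float on its way into the tensor. The pure-Python sketch below shows the arithmetic: `2**63 - 1` is not representable in float64, so it rounds up to `2**63`, which lies just outside the int64 range. The helper `wrap_to_int64` is hypothetical and only models two's-complement wraparound. What a real out-of-range float-to-int cast does depends on the platform.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def wrap_to_int64(value: int) -> int:
    """Hypothetical helper: reduce a Python int to the signed 64-bit range
    using two's-complement wraparound."""
    return (value + 2**63) % 2**64 - 2**63


# float64 has a 53-bit significand, so 2**63 - 1 rounds up to 2**63.
as_float = float(INT64_MAX)
rounded = int(as_float)

print(rounded == 2**63)                     # True
print(wrap_to_int64(rounded) == INT64_MIN)  # True
```

If this is the mechanism, any fill value above `2**53` would be at risk of losing precision, not just the maximum.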

NVIDIA
status/new-issue
type/bug-report

## Container:

```
nvcr.io/nvidia/pytorch:24.05-py3
```

## Machine:

x86 CPU with A100 node

## Reproduce:

```bash
python -m torch.distributed.run --nproc-per-node=2 examples/pytorch/comm_gemm_overlap/ln_mlp_with_overlap.py --num-iters=1000
```

It may or may not crash, because at [line...

### Software environment

```Markdown
- paddlepaddle: N/A
- paddlepaddle-gpu: 2.6.1
- paddlenlp: develop
```

### Duplicate issues

- [X] I have searched the existing issues

### Error description

When trying to train "GPT3-1.3B" with llm/run_pretrain.py, an error occurs during the model initialization stage:

```python
init_class = architectures.pop()...
```

bug
stale

### Description

The CuPy codegen generates code that cannot be compiled. It looks like a variable naming conflict. You will observe compile errors like this:

```cpp
7e5a3f4331213c751b3bf407f0a29fd894750a8e.cubin.cu(30): error:...
```

cat:bug
prio:medium

### Description

I've tried many patterns and can summarize the results as follows:

* Fusing `cp.sum` by itself works perfectly.
* Fusing `cp.sum` with an elementwise op like `>` or `+` may cause a slowdown because...

cat:performance