
【Hackathon 7th PPSCI No.12】Support amsgrad in the Adam and AdamW optimizers -part

Open megemini opened this issue 1 year ago • 46 comments

PR Category

User Experience

PR Types

New features

Description

【Hackathon 7th No.12】Support amsgrad in the Adam and AdamW optimizers
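For reference, the AMSGrad variant (Reddi et al., "On the Convergence of Adam and Beyond") keeps a running element-wise maximum of the second-moment estimate and uses it in the denominator of the update (bias corrections omitted for brevity):

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
v^{\max}_t &= \max\left(v^{\max}_{t-1},\, v_t\right) \\
\theta_t &= \theta_{t-1} - \frac{\eta\, m_t}{\sqrt{v^{\max}_t} + \epsilon}
\end{aligned}
$$

The extra state $v^{\max}$ is the `moment2_max` / `mom2_max` buffer discussed later in this thread.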

Related:

Comparing against PyTorch locally, the results of the two match:

Comparison code
import numpy as np

import torch
import paddle


# Objective whose gradient is large every 101st step and small/negative otherwise,
# similar in spirit to the counterexample from the AMSGrad paper, so that Adam
# and AMSGrad can be told apart.
def func(t, x):
    if t % 101 == 1:
        return 1010 * x
    else:
        return -10 * x


np.random.seed(2024)
data = np.array(0).astype("float64")
epoch = 500
lr = 0.1

for amsgrad in [True, False]:
    for opt_name, opt_torch, opt_paddle in [
        ["Adam", torch.optim.Adam, paddle.optimizer.Adam],
        ["AdamW", torch.optim.AdamW, paddle.optimizer.AdamW],
    ]:
        for torch_device, paddle_device in [["cpu", "cpu"], ["cuda", "gpu"]]:
            print(f"------ optimizer is : {opt_name} ; compare : {paddle_device}------")
            print(f"------ pytorch ------")
            x = torch.tensor(data, device=torch.device(torch_device))
            x.requires_grad = True

            optimizer = opt_torch([x], lr=lr, amsgrad=amsgrad)
            for i in range(epoch):
                y = func(i, x)
                optimizer.zero_grad()
                y.backward()
                optimizer.step()

            if torch_device == "cuda":
                x_torch = x.cpu().detach().numpy()
                y_torch = y.cpu().detach().numpy()
            else:
                x_torch = x.detach().numpy()
                y_torch = y.detach().numpy()

            print(f"------ paddle ------")
            paddle.set_device(paddle_device)
            x = paddle.to_tensor(data)
            x.stop_gradient = False

            optimizer = opt_paddle(parameters=[x], learning_rate=lr, amsgrad=amsgrad)
            for i in range(epoch):
                y = func(i, x)
                optimizer.clear_grad()
                y.backward()
                optimizer.step()

            x_paddle = x.numpy()
            y_paddle = y.numpy()

            np.testing.assert_allclose(x_torch, x_paddle, atol=1e-06, rtol=1e-06)
            print(x_torch, x_paddle)
            print(y_torch, y_paddle)
            print(f"------- compare finish ---------")

Output:

------ optimizer is : Adam ; compare : cpu------
------ pytorch ------
------ paddle ------
0.382819332566745 0.3828193325667452
-3.7319234136114865 -3.7319234136114887
------- compare finish ---------
------ optimizer is : Adam ; compare : gpu------
------ pytorch ------
------ paddle ------
0.3828193325667449 0.38281933256674533
-3.7319234136114856 -3.73192341361149
------- compare finish ---------
------ optimizer is : AdamW ; compare : cpu------
------ pytorch ------
------ paddle ------
0.38940724227589385 0.389407242265435
-3.801604114817793 -3.8016041146280424
------- compare finish ---------
------ optimizer is : AdamW ; compare : gpu------
------ pytorch ------
------ paddle ------
0.38940724227589385 0.3894072422654346
-3.801604114817793 -3.801604114628038
------- compare finish ---------
------ optimizer is : Adam ; compare : cpu------
------ pytorch ------
------ paddle ------
0.47233193956960806 0.47233193956960845
-4.62253146676283 -4.622531466762833
------- compare finish ---------
------ optimizer is : Adam ; compare : gpu------
------ pytorch ------
------ paddle ------
0.472331939569608 0.4723319395696082
-4.62253146676283 -4.6225314667628306
------- compare finish ---------
------ optimizer is : AdamW ; compare : cpu------
------ pytorch ------
------ paddle ------
0.462192080569021 0.46219208087997216
-4.525658535292251 -4.525658538303618
------- compare finish ---------
------ optimizer is : AdamW ; compare : gpu------
------ pytorch ------
------ paddle ------
0.46219208056902106 0.46219208087997266
-4.525658535292251 -4.525658538303623
------- compare finish ---------

Update 20240908

  • The following tests have been completed and pass locally:

    • test_adam_op.py
    • test_adamw_op.py
    • test_merged_adam_op.py
    • test_fused_adam_op.py

  • The distributed test targets still need to be verified in the CI environment.

  • The remaining test targets also need to be verified in the CI environment.

In addition, since the underlying XPU interface does not yet support the amsgrad variant, for XPU only the related input/output parameter lists are modified here.

megemini avatar Sep 08 '24 09:09 megemini

Your PR has been submitted. Thanks for your contribution! Please wait for the result of CI firstly. See the Paddle CI Manual for details.

paddle-bot[bot] avatar Sep 08 '24 09:09 paddle-bot[bot]

Does the added ams_grad change the original execution logic and memory footprint? From the PR code it looks like a mom2_max buffer is allocated, and some extra variables are created, regardless of whether ams_grad is enabled, compared with the original code without amsgrad.

I considered this before. Since amsgrad currently touches so many places, I wanted to postpone the optimization work a bit ~

Let me try changing it now ~

megemini avatar Sep 09 '24 13:09 megemini

Does the added ams_grad change the original execution logic and memory footprint? From the PR code it looks like a mom2_max buffer is allocated, and some extra variables are created, regardless of whether ams_grad is enabled, compared with the original code without amsgrad.

I considered this before. Since amsgrad currently touches so many places, I wanted to postpone the optimization work a bit ~

Let me try changing it now ~

  1. This has a fairly large impact. Optimizers usually track parameter state element-wise, so every optimizer statistic needs as many entries as there are model parameters, and momentum-based optimizers such as Adam(W) need even more. The top three consumers of GPU memory during training are intermediate state, optimizer state, and model parameters; without this optimization, CV/NLP models that used to fit on 16 GB could easily OOM, to say nothing of billion-parameter large models.

  2. The computation logic itself looks largely fine, and the current unoptimized version is useful for quickly verifying correctness, but the final version must include this basic yet necessary optimization (see the sketch after this list).
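A minimal NumPy sketch of the point being made here: the extra `moment2_max` state is only needed when amsgrad is enabled, so its allocation can be skipped entirely otherwise. This is illustrative only (the class name `AdamState` is made up), not the PR's actual kernel code.

import numpy as np

class AdamState:
    """Per-parameter Adam state; moment2_max is allocated only when amsgrad=True."""

    def __init__(self, param, amsgrad=False):
        self.m = np.zeros_like(param)  # first moment
        self.v = np.zeros_like(param)  # second moment
        # Only AMSGrad needs the running max of the second moment,
        # so skip the allocation when amsgrad is off.
        self.v_max = np.zeros_like(param) if amsgrad else None
        self.amsgrad = amsgrad

    def step(self, param, grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, t=1):
        self.m = beta1 * self.m + (1 - beta1) * grad
        self.v = beta2 * self.v + (1 - beta2) * grad * grad
        m_hat = self.m / (1 - beta1**t)
        if self.amsgrad:
            # AMSGrad: use the element-wise max of the second moment in the denominator.
            self.v_max = np.maximum(self.v_max, self.v)
            v_hat = self.v_max / (1 - beta2**t)
        else:
            v_hat = self.v / (1 - beta2**t)
        return param - lr * m_hat / (np.sqrt(v_hat) + eps)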

HydrogenSulfate avatar Sep 09 '24 14:09 HydrogenSulfate

Also, once the change is done, you could run a comparison with ResNet50 (or another model) on fake data to confirm that GPU memory is unchanged when amsgrad is off, and that the increase when it is on roughly matches the parameter count.

HydrogenSulfate avatar Sep 09 '24 14:09 HydrogenSulfate

Update 20240911

The following tests were run:

  • Compare the new code (with the amsgrad option) against the old code (without the amsgrad option)
  • Compare amsgrad enabled against amsgrad disabled

Test environment:

  • New code: tested on my local machine
  • Old code: tested on AIStudio (with the latest develop build installed)
Test code
import argparse
import numpy as np
import paddle
import paddle.nn as nn
import paddle.vision.models as models
from paddle.io import DataLoader


def main(amsgrad=False, use_gpu=False, model="resnet50"):
    print("-" * 30)
    print("amsgrad is:", amsgrad)

    paddle.set_device("gpu" if use_gpu else "cpu")
    paddle.seed(2024)

    class RandomDataset(paddle.io.Dataset):
        def __init__(self, size, image_size, num_classes=1000):
            self.size = size
            self.image_size = image_size
            self.num_classes = num_classes

        def __getitem__(self, idx):
            image = np.random.random((3, self.image_size, self.image_size)).astype(
                "float32"
            )
            label = np.random.randint(0, self.num_classes, (1,)).astype("int64")
            return image, label

        def __len__(self):
            return self.size

    random_dataset = RandomDataset(size=4, image_size=32)

    _model = getattr(models, model)
    model = _model(pretrained=False, num_classes=1000)

    optimizer = paddle.optimizer.Adam(
        parameters=model.parameters(), learning_rate=0.001, amsgrad=amsgrad
    )
    loss_fn = nn.CrossEntropyLoss()
    train_loader = DataLoader(dataset=random_dataset, batch_size=1, shuffle=True)

    def train(model, train_loader, loss_fn, optimizer, epochs):
        model.train()
        for epoch in range(epochs):
            for batch_id, (data, label) in enumerate(train_loader):
                pred = model(data)
                loss = loss_fn(pred, label)
                loss.backward()
                optimizer.step()
                optimizer.clear_grad()

                mem_usage = paddle.device.cuda.max_memory_allocated()
                print(
                    f"Epoch [{epoch+1}/{epochs}], Batch [{batch_id}], Loss: {loss.numpy()}, Memory Usage: {mem_usage / (1024 ** 2):.2f} MB"
                )

    train(model, train_loader, loss_fn, optimizer, 2)


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--amsgrad", action="store_true")
    # NOTE: store_false means use_gpu defaults to True; passing --gpu turns the GPU off.
    parser.add_argument("--gpu", action="store_false")
    parser.add_argument(
        "--model", type=str, default="resnet50"  # resnet50, resnet101, resnet152
    )
    args = parser.parse_args()
    return args


if __name__ == "__main__":
    args = parse_args()
    main(args.amsgrad, args.gpu, args.model)

Test commands:

  • resnet50, amsgrad disabled: python -m xxx --model=resnet50
  • resnet50, amsgrad enabled: python -m xxx --model=resnet50 --amsgrad

Test results:

  1. New code (with the amsgrad option) vs. old code (without the amsgrad option)

New code, compiled locally with amsgrad support, amsgrad disabled:

> python -m test_amsgrad_memory --model=resnet50
Epoch [1/2], Batch [0], Loss: 9.837594985961914, Memory Usage: 390.19 MB
Epoch [1/2], Batch [1], Loss: 50.349605560302734, Memory Usage: 425.99 MB
Epoch [1/2], Batch [2], Loss: 39.871864318847656, Memory Usage: 425.99 MB
Epoch [1/2], Batch [3], Loss: 38.12334060668945, Memory Usage: 425.99 MB
Epoch [2/2], Batch [0], Loss: 26.828706741333008, Memory Usage: 425.99 MB
Epoch [2/2], Batch [1], Loss: 15.12156867980957, Memory Usage: 425.99 MB
Epoch [2/2], Batch [2], Loss: 15.311683654785156, Memory Usage: 425.99 MB
Epoch [2/2], Batch [3], Loss: 19.22699546813965, Memory Usage: 425.99 MB

> python -m test_amsgrad_memory --model=resnet152
Epoch [1/2], Batch [0], Loss: 16.58477020263672, Memory Usage: 919.06 MB
Epoch [1/2], Batch [1], Loss: 142.8964385986328, Memory Usage: 957.03 MB
Epoch [1/2], Batch [2], Loss: 102.4380874633789, Memory Usage: 957.03 MB
Epoch [1/2], Batch [3], Loss: 70.08514404296875, Memory Usage: 957.03 MB
Epoch [2/2], Batch [0], Loss: 75.25007629394531, Memory Usage: 957.03 MB
Epoch [2/2], Batch [1], Loss: 56.891502380371094, Memory Usage: 957.03 MB
Epoch [2/2], Batch [2], Loss: 45.01842498779297, Memory Usage: 957.03 MB
Epoch [2/2], Batch [3], Loss: 75.26055145263672, Memory Usage: 957.03 MB

Old code, compiled without amsgrad support:

> python -m test_amsgrad --model=resnet50
Epoch [1/2], Batch [0], Loss: 12.602409362792969, Memory Usage: 390.19 MB
Epoch [1/2], Batch [1], Loss: 60.275970458984375, Memory Usage: 416.99 MB
Epoch [1/2], Batch [2], Loss: 40.492130279541016, Memory Usage: 416.99 MB
Epoch [1/2], Batch [3], Loss: 36.41865539550781, Memory Usage: 416.99 MB
Epoch [2/2], Batch [0], Loss: 16.618688583374023, Memory Usage: 416.99 MB
Epoch [2/2], Batch [1], Loss: 17.91885757446289, Memory Usage: 416.99 MB
Epoch [2/2], Batch [2], Loss: 23.867103576660156, Memory Usage: 416.99 MB
Epoch [2/2], Batch [3], Loss: 21.98102569580078, Memory Usage: 416.99 MB

> python -m test_amsgrad --model=resnet152
Epoch [1/2], Batch [0], Loss: 33.26136016845703, Memory Usage: 919.06 MB
Epoch [1/2], Batch [1], Loss: 142.78451538085938, Memory Usage: 948.02 MB
Epoch [1/2], Batch [2], Loss: 79.83582305908203, Memory Usage: 948.02 MB
Epoch [1/2], Batch [3], Loss: 72.10752868652344, Memory Usage: 948.02 MB
Epoch [2/2], Batch [0], Loss: 62.952781677246094, Memory Usage: 948.02 MB
Epoch [2/2], Batch [1], Loss: 79.0680923461914, Memory Usage: 948.02 MB
Epoch [2/2], Batch [2], Loss: 45.06647491455078, Memory Usage: 948.02 MB
Epoch [2/2], Batch [3], Loss: 42.24040603637695, Memory Usage: 948.02 MB

Conclusions:

  • Both builds use the same GPU memory on the first batch (as expected)
  • The second batch uses more memory than the first (presumably unrelated to amsgrad)
  • Memory does not keep growing across the two epochs (as expected)
  • The build compiled with amsgrad support uses about 9 MB more, identically for resnet50 and resnet152 (extra overhead of the amsgrad code)
  2. amsgrad enabled vs. amsgrad disabled

New code, compiled locally with amsgrad support, amsgrad enabled:

> python -m test_amsgrad_memory --model=resnet50 --amsgrad
Epoch [1/2], Batch [0], Loss: 8.899900436401367, Memory Usage: 487.68 MB
Epoch [1/2], Batch [1], Loss: 52.00892639160156, Memory Usage: 523.49 MB
Epoch [1/2], Batch [2], Loss: 33.351043701171875, Memory Usage: 523.49 MB
Epoch [1/2], Batch [3], Loss: 30.31154441833496, Memory Usage: 523.49 MB
Epoch [2/2], Batch [0], Loss: 32.98215866088867, Memory Usage: 523.49 MB
Epoch [2/2], Batch [1], Loss: 29.391761779785156, Memory Usage: 523.49 MB
Epoch [2/2], Batch [2], Loss: 17.727962493896484, Memory Usage: 523.49 MB
Epoch [2/2], Batch [3], Loss: 20.467811584472656, Memory Usage: 523.49 MB

> python -m test_amsgrad_memory --model=resnet152 --amsgrad
Epoch [1/2], Batch [0], Loss: 17.665447235107422, Memory Usage: 1148.68 MB
Epoch [1/2], Batch [1], Loss: 109.74750518798828, Memory Usage: 1186.64 MB
Epoch [1/2], Batch [2], Loss: 86.66461944580078, Memory Usage: 1186.64 MB
Epoch [1/2], Batch [3], Loss: 40.293758392333984, Memory Usage: 1186.64 MB
Epoch [2/2], Batch [0], Loss: 45.22898864746094, Memory Usage: 1186.64 MB
Epoch [2/2], Batch [1], Loss: 70.95489501953125, Memory Usage: 1186.64 MB
Epoch [2/2], Batch [2], Loss: 33.824127197265625, Memory Usage: 1186.64 MB
Epoch [2/2], Batch [3], Loss: 41.39281463623047, Memory Usage: 1186.64 MB

Conclusions:

Compared with amsgrad disabled:

  • With amsgrad enabled, resnet50 uses 97.5 MB more and resnet152 uses 229.62 MB more

The increase is proportional to model size.

megemini avatar Sep 11 '24 08:09 megemini

(Quoting the "Update 20240911" memory-comparison results from the previous comment.)

  1. It is expected that memory peaks at the second batch.
  2. The extra 97.5 MB for resnet50 and 229.62 MB for resnet152 with amsgrad enabled is also expected, since their parameter counts converted to MB are exactly 97.69 and 230.19, computed as follows:
import paddle
import numpy as np

x = paddle.vision.models.resnet50()

num_params = 0
for k, v in x.named_parameters():
    num_params += np.product(list(v.shape))
print(num_params * 4 / (1<<20))

x = paddle.vision.models.resnet152()

num_params = 0
for k, v in x.named_parameters():
    num_params += np.product(list(v.shape))
print(num_params * 4 / (1<<20))
  3. Could you check what causes the extra 9 MB?

HydrogenSulfate avatar Sep 11 '24 08:09 HydrogenSulfate

There is currently a problem: in paddle/phi/infermeta/multiary.h and paddle/phi/infermeta/multiary.cc, moment2_max cannot be passed as paddle::optional. For example, if

void AdamInferMeta(const MetaTensor& param,
                   const MetaTensor& grad,
                   const MetaTensor& learning_rate,
                   const MetaTensor& moment1,
                   const MetaTensor& moment2,
                   const MetaTensor& moment2_max,
                   const MetaTensor& beta1_pow,
                   const MetaTensor& beta2_pow,
                   const MetaTensor& master_param,
                   const MetaTensor& skip_update,
                   const Scalar& beta1,
                   const Scalar& beta2,
                   const Scalar& epsilon,
                   bool lazy_mode,
                   int64_t min_row_size_to_use_multithread,
                   bool multi_precision,
                   bool use_global_beta_pow,
                   bool amsgrad,
                   MetaTensor* param_out,
                   MetaTensor* moment1_out,
                   MetaTensor* moment2_out,
                   MetaTensor* moment2_max_out,
                   MetaTensor* beta1_pow_out,
                   MetaTensor* beta2_pow_out,
                   MetaTensor* master_param_outs)

is changed to

void AdamInferMeta(const MetaTensor& param,
                   const MetaTensor& grad,
                   const MetaTensor& learning_rate,
                   const MetaTensor& moment1,
                   const MetaTensor& moment2,
                   const paddle::optional<MetaTensor>& moment2_max,
                   const MetaTensor& beta1_pow,
                   const MetaTensor& beta2_pow,
                   const MetaTensor& master_param,
                   const MetaTensor& skip_update,
                   const Scalar& beta1,
                   const Scalar& beta2,
                   const Scalar& epsilon,
                   bool lazy_mode,
                   int64_t min_row_size_to_use_multithread,
                   bool multi_precision,
                   bool use_global_beta_pow,
                   bool amsgrad,
                   MetaTensor* param_out,
                   MetaTensor* moment1_out,
                   MetaTensor* moment2_out,
                   MetaTensor* moment2_max_out,
                   MetaTensor* beta1_pow_out,
                   MetaTensor* beta2_pow_out,
                   MetaTensor* master_param_outs)

the build fails with the following error:

    pir/dialect/CMakeFiles/op_dialect.dir/operator/ir/pd_op2.cc.o
    [ 70%] Building CXX object paddle/fluid/pir/dialect/CMakeFiles/op_dialect.dir/operator/ir/pd_op3.cc.o
    [ 70%] Building CXX object paddle/fluid/pir/dialect/CMakeFiles/op_dialect.dir/operator/ir/pd_op4.cc.o
    In file included from /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/fluid/pir/dialect/operator/interface/infermeta.h:17,
                    from /home/shun/Documents/Projects/paddle/megemini/Paddle/build/paddle/fluid/pir/dialect/operator/ir/pd_op.h:9,
                    from /home/shun/Documents/Projects/paddle/megemini/Paddle/build/paddle/fluid/pir/dialect/operator/ir/pd_op1.cc:2:
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h: In instantiation of ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::InferMetaFnCallHelper<const phi::MetaTensor&, Tail ...>::Call(phi::InferMetaContext*, PreviousArgs& ...) [with int in_idx = 4; int attr_idx = 0; int out_idx = 0; PreviousArgs = {const phi::MetaTensor, const phi::MetaTensor, const phi::MetaTensor, const phi::MetaTensor}; Tail = {const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int>}; Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamInferMeta]’:
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:203:65:   recursively required from ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::InferMetaFnCallHelper<const phi::MetaTensor&, Tail ...>::Call(phi::InferMetaContext*, PreviousArgs& ...) [with int in_idx = 1; int attr_idx = 0; int out_idx = 0; PreviousArgs = {const phi::MetaTensor}; Tail = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int>}; Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamInferMeta]’
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:203:65:   required from ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::InferMetaFnCallHelper<const phi::MetaTensor&, Tail ...>::Call(phi::InferMetaContext*, PreviousArgs& ...) [with int in_idx = 0; int attr_idx = 0; int out_idx = 0; PreviousArgs = {}; Tail = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int>}; Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamInferMeta]’
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:185:73:   required from ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::Call(phi::InferMetaContext*) [with Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamInferMeta]’
    /home/shun/Documents/Projects/paddle/megemini/Paddle/build/paddle/fluid/pir/dialect/operator/ir/pd_op1.cc:1976:13:   required from here
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:203:65: error: incomplete type ‘phi::InferMetaFnImpl<void (*)(const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*), phi::AdamInferMeta>::InferMetaFnCallHelper<const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int> >’ used in nested name specifier
    202 |       InferMetaFnCallHelper<
        |       ~~~~~~~~~~~~~~~~~~~~~~                                     
    203 |           Tail...>::template Call<in_idx + 1, attr_idx, out_idx>(ctx,
        |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
    204 |                                                                  pargs...,
        |                                                                  ~~~~~~~~~
    205 |                                                                  arg);
        |                                                                  ~~~~
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h: In instantiation of ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::InferMetaFnCallHelper<const phi::MetaTensor&, Tail ...>::Call(phi::InferMetaContext*, PreviousArgs& ...) [with int in_idx = 4; int attr_idx = 0; int out_idx = 0; PreviousArgs = {const phi::MetaTensor, const phi::MetaTensor, const phi::MetaTensor, const phi::MetaTensor}; Tail = {const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int>}; Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamwInferMeta]’:
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:203:65:   recursively required from ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::InferMetaFnCallHelper<const phi::MetaTensor&, Tail ...>::Call(phi::InferMetaContext*, PreviousArgs& ...) [with int in_idx = 1; int attr_idx = 0; int out_idx = 0; PreviousArgs = {const phi::MetaTensor}; Tail = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int>}; Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamwInferMeta]’
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:203:65:   required from ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::InferMetaFnCallHelper<const phi::MetaTensor&, Tail ...>::Call(phi::InferMetaContext*, PreviousArgs& ...) [with int in_idx = 0; int attr_idx = 0; int out_idx = 0; PreviousArgs = {}; Tail = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int>}; Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamwInferMeta]’
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:185:73:   required from ‘static void phi::InferMetaFnImpl<Return (*)(Args ...), infer_meta_fn>::Call(phi::InferMetaContext*) [with Return = void; Args = {const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*}; Return (* infer_meta_fn)(Args ...) = phi::AdamwInferMeta]’
    /home/shun/Documents/Projects/paddle/megemini/Paddle/build/paddle/fluid/pir/dialect/operator/ir/pd_op1.cc:3484:13:   required from here
    /home/shun/Documents/Projects/paddle/megemini/Paddle/paddle/phi/core/infermeta_utils.h:203:65: error: incomplete type ‘phi::InferMetaFnImpl<void (*)(const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*), phi::AdamwInferMeta>::InferMetaFnCallHelper<const paddle::optional<phi::MetaTensor>&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const phi::MetaTensor&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, const paddle::experimental::ScalarBase<phi::DenseTensor>&, float, float, bool, bool, long int, bool, bool, bool, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::MetaTensor*, phi::InferMetaTypeTag<int> >’ used in nested name specifier
    make[2]: *** [paddle/fluid/pir/dialect/CMakeFiles/op_dialect.dir/build.make:552: paddle/fluid/pir/dialect/CMakeFiles/op_dialect.dir/operator/ir/pd_op1.cc.o] Error 1
    make[2]: *** Waiting for unfinished jobs....
    make[1]: *** [CMakeFiles/Makefile2:41340: paddle/fluid/pir/dialect/CMakeFiles/op_dialect.dir/all] Error 2
    make: *** [Makefile:136: all] Error 2

Is this a circular dependency?

Could you check what causes the extra 9 MB?

Sure, I'll look into it over the next couple of days ~

Also, CI has been having problems these days and I can't see the reasons for the failures on my side. Locally, these four tests:

  • test_adam_op.py
  • test_adamw_op.py
  • test_merged_adam_op.py
  • test_fused_adam_op.py

have all passed. Let's see where else something might be broken.

megemini avatar Sep 11 '24 09:09 megemini

It is expected that memory peaks at the second batch

Could I ask what exactly is involved in this? Thanks!!!

Right now the first batch is identical between the two builds, and it is the second batch where the extra 9 MB shows up ...

I found an article: https://pytorch.org/blog/understanding-gpu-memory-1/?hss_channel=lcp-78618366
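One way to narrow down where those extra 9 MB come from would be to log both allocated and reserved memory around each optimizer step; "reserved" also counts blocks cached by the allocator, which is often where step-to-step jumps originate. A hedged sketch (the log_mem helper below is made up and was not actually run in this thread):

import paddle

def log_mem(tag):
    # "allocated" is memory currently held by live tensors; "reserved" also
    # includes blocks cached by Paddle's allocator.
    to_mb = lambda b: b / (1024 ** 2)
    print(
        f"{tag}: allocated={to_mb(paddle.device.cuda.memory_allocated()):.2f} MB "
        f"(max {to_mb(paddle.device.cuda.max_memory_allocated()):.2f} MB), "
        f"reserved={to_mb(paddle.device.cuda.memory_reserved()):.2f} MB "
        f"(max {to_mb(paddle.device.cuda.max_memory_reserved()):.2f} MB)"
    )

# For example, call log_mem("before step") / log_mem("after step") around
# optimizer.step() in the training loop above.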

megemini avatar Sep 11 '24 09:09 megemini

(Quoting the previous question about the second-batch memory peak and the linked article.)

I forget where I saw that conclusion, but the article you found should also be a useful reference.

HydrogenSulfate avatar Sep 11 '24 11:09 HydrogenSulfate

@HydrogenSulfate The extra 9 MB is solved: it should be an environment difference, i.e. a different GPU or different CUDA/cuDNN versions ~

I installed the latest develop build on my local machine as well, and its memory usage matches the build I compiled with amsgrad:

> python -m test_amsgrad_memory --model=resnet50
------------------------------
amsgrad is: False
W0911 21:42:50.726809 121043 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0911 21:42:50.726830 121043 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.7
W0911 21:42:50.727402 121043 gpu_resources.cc:164] device: 0, cuDNN Version: 8.5.
W0911 21:42:50.858465 121043 gpu_resources.cc:299] WARNING: device: 0. The installed Paddle is compiled with CUDNN 8.6, but CUDNN version in your machine is 8.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
/home/shun/venv38dev/lib/python3.8/site-packages/paddle/nn/layer/norm.py:788: UserWarning: When training, we now always track global mean and variance.
  warnings.warn(
Epoch [1/2], Batch [0], Loss: 0.0, Memory Usage: 390.19 MB
Epoch [1/2], Batch [1], Loss: 0.0, Memory Usage: 425.99 MB
Epoch [1/2], Batch [2], Loss: 0.0, Memory Usage: 425.99 MB
Epoch [1/2], Batch [3], Loss: 0.0, Memory Usage: 425.99 MB
Epoch [2/2], Batch [0], Loss: 0.0, Memory Usage: 425.99 MB
Epoch [2/2], Batch [1], Loss: 0.0, Memory Usage: 425.99 MB
Epoch [2/2], Batch [2], Loss: 0.0, Memory Usage: 425.99 MB
Epoch [2/2], Batch [3], Loss: 0.0, Memory Usage: 425.99 MB

The memory usage is the same: 390.19 MB for the first batch and 425.99 MB afterwards ~

My environment is:

Paddle version: 0.0.0
Paddle With CUDA: True

OS: ubuntu 22.04
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.28.1
Libc version: glibc 2.35
Python version: 3.8.17

CUDA version: 11.7.99 Build cuda_11.7.r11.7/compiler.31442593_0
cuDNN version: 8.5.0
Nvidia driver version: 535.54.03
Nvidia driver List: GPU 0: NVIDIA P106-100

That said, Adam in this develop build seems to have a problem: why is the loss always 0? Could it be caused by the version-incompatibility warning above??? ~~~ 🤣🤣🤣

I installed it with: python -m pip install --force-reinstall paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

megemini avatar Sep 11 '24 13:09 megemini

(Quoting the previous comment about the extra 9 MB, the local environment, and the all-zero loss.)

  1. This could indeed be CUDA-related, so when testing, try to keep the hardware/software environment identical and control the variables; let's set the 9 MB question aside for now.
  2. The all-zero output bug has appeared before: with 2.6.1 plus a certain environment, not just adam but any API would return all-zero tensors. You could give that a try.

HydrogenSulfate avatar Sep 11 '24 14:09 HydrogenSulfate

The all-zero output bug has appeared before: with 2.6.1 plus a certain environment, not just adam but any API would return all-zero tensors. You could give that a try.

In [1]: import paddle

In [2]: a = paddle.to_tensor(123)

In [3]: a
Out[3]: 
Tensor(shape=[], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       123)

In [4]: b = paddle.to_tensor(33)

In [5]: a + b
W0911 22:43:57.961004 130469 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0911 22:43:57.961030 130469 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.7
W0911 22:43:57.961791 130469 gpu_resources.cc:164] device: 0, cuDNN Version: 8.5.
Out[5]: 
Tensor(shape=[], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       0)

🫠🫠🫠

megemini avatar Sep 11 '24 14:09 megemini

(Quoting the a + b all-zero reproduction above.)

"The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website." So an architecture incompatibility?

HydrogenSulfate avatar Sep 11 '24 14:09 HydrogenSulfate

"The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website." So an architecture incompatibility?

Yeah, possibly ~ Previously I always compiled and installed locally and had no problems ~ The build environment of the dev wheel from the official site may differ from mine ~~~
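To double-check the mismatch, something like the following could be used to compare the local GPU's compute capability against the toolkit versions the wheel was built with (a hedged suggestion; this was not actually run in this thread):

import paddle

# Versions the installed wheel was built with.
print("paddle:", paddle.__version__)
print("built with CUDA:", paddle.version.cuda(), "cuDNN:", paddle.version.cudnn())

# Compute capability of the local GPU; the warning above complains that
# Pascal (6.1) is not in the wheel's arch list (70 75 80 86).
major, minor = paddle.device.cuda.get_device_capability()
print("GPU compute capability:", f"{major}.{minor}")

# run_check() runs a small end-to-end sanity check and reports problems.
paddle.utils.run_check()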

megemini avatar Sep 11 '24 15:09 megemini

@megemini Some CI jobs failed, including the unit tests for the .cc files and codestyle. Please take a look.

HydrogenSulfate avatar Sep 18 '24 04:09 HydrogenSulfate

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 18 '24 10:09 CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Sep 18 '24 10:09 CLAassistant

@megemini Some CI jobs failed, including the unit tests for the .cc files and codestyle. Please take a look.

Building with tests enabled keeps failing on my machine; it is unrelated to this PR and is probably an environment problem.

https://github.com/PaddlePaddle/Paddle/issues/66683 https://github.com/PaddlePaddle/Paddle/issues/65250

The symptoms are similar to the ones described there ~

I'll commit the changes first and see what CI says ~

megemini avatar Sep 18 '24 10:09 megemini

(Quoting the previous comment about the local build problems, #66683 and #65250.)

A few of the CI failures may be machine problems, but some of them, as far as I can see, are caused by unit tests that were not updated accordingly.

HydrogenSulfate avatar Sep 18 '24 10:09 HydrogenSulfate

A few of the CI failures may be machine problems, but some of them, as far as I can see, are caused by unit tests that were not updated accordingly.

Yes. I focused on PR-CI-Py3: most of the unit tests there pass on my side, and the corresponding unit tests in PR-CI-Windows are also in PASS state ~

A few of them report errors like

E       AssertionError: In PaddlePaddle 2.x, we turn on dynamic graph mode by default, and 'data()' is only supported in static graph mode. So if you want to use this api, please call 'paddle.enable_static()' before this api to enter static graph mode.

This has nothing to do with this PR; I'll add paddle.enable_static() to the corresponding files for now ~

I'll keep locating the failures reported in test.cc ~

megemini avatar Sep 18 '24 13:09 megemini

(Quoting the previous comment about the enable_static() error and the test.cc failures.)

For switching between dynamic and static graph mode in unit tests, it is recommended to use an xx_guard context, which guarantees the test is neither affected by nor interferes with other unit tests. https://github.com/PaddlePaddle/Paddle/pull/67927/files#diff-44e6332e36d910b34f32a4156199974fd81120211e5ee4460ae7e106f966913aR17-R36
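For reference, a minimal sketch of such a guard, assuming a simple enable/restore wrapper (the actual helper used in the linked PR may differ):

import contextlib
import paddle

@contextlib.contextmanager
def static_guard():
    # Enter static-graph mode for the duration of the block and restore the
    # previous mode afterwards, so other unit tests are not affected.
    was_dynamic = paddle.in_dynamic_mode()
    paddle.enable_static()
    try:
        yield
    finally:
        if was_dynamic:
            paddle.disable_static()

# Usage inside a unit test:
# with static_guard():
#     x = paddle.static.data(name="x", shape=[None, 4], dtype="float32")
#     ...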

HydrogenSulfate avatar Sep 18 '24 13:09 HydrogenSulfate

@HydrogenSulfate Today I recompiled inside docker with tests enabled. The main tests are fine, e.g. jit_kernel_test, test_adam_op, test_adamw_op, test_merged_adam_op, and the test.cc cases in them are fine as well. Test results below:

Test results

➜  build git:(hack7_amsgrad) ctest -R jit_kernel_test -V
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
Test project /paddle/Paddle/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 8
    Start 8: jit_kernel_test

8: Test command: /paddle/Paddle/build/test/cpp/jit_kernel_test
8: Environment variables: 
8:  FLAGS_init_allocated_mem=true
8:  FLAGS_cudnn_deterministic=true
8:  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/paddle/Paddle/build/python/paddle/libs:/paddle/Paddle/build/python/paddle/base
8: Test timeout computed to be: 10000000
8: [==========] Running 45 tests from 4 test cases.
8: [----------] Global test environment set-up.
8: [----------] 4 tests from JITKernel_pool
8: [ RUN      ] JITKernel_pool.jitcreator
8: [       OK ] JITKernel_pool.jitcreator (0 ms)
8: [ RUN      ] JITKernel_pool.jitpool
8: [       OK ] JITKernel_pool.jitpool (0 ms)
8: [ RUN      ] JITKernel_pool.more
8: [       OK ] JITKernel_pool.more (0 ms)
8: [ RUN      ] JITKernel_pool.refer
8: [       OK ] JITKernel_pool.refer (0 ms)
8: [----------] 4 tests from JITKernel_pool (0 ms total)
8: 
8: [----------] 6 tests from JITKernel_helper
8: [ RUN      ] JITKernel_helper.GetAllCandidateKernels
8: [       OK ] JITKernel_helper.GetAllCandidateKernels (0 ms)
8: [ RUN      ] JITKernel_helper.GetAllCandidateFuncsWithTypes
8: [       OK ] JITKernel_helper.GetAllCandidateFuncsWithTypes (0 ms)
8: [ RUN      ] JITKernel_helper.KernelFuncs
8: [       OK ] JITKernel_helper.KernelFuncs (0 ms)
8: [ RUN      ] JITKernel_helper.GetAllCandidateFuncs
8: [       OK ] JITKernel_helper.GetAllCandidateFuncs (4 ms)
8: [ RUN      ] JITKernel_helper.pack_weights
8: [       OK ] JITKernel_helper.pack_weights (0 ms)
8: [ RUN      ] JITKernel_helper.attr
8: [       OK ] JITKernel_helper.attr (0 ms)
8: [----------] 6 tests from JITKernel_helper (4 ms total)
8: 
8: [----------] 8 tests from JITKernel_key
8: [ RUN      ] JITKernel_key.int
8: [       OK ] JITKernel_key.int (0 ms)
8: [ RUN      ] JITKernel_key.gru
8: [       OK ] JITKernel_key.gru (0 ms)
8: [ RUN      ] JITKernel_key.lstm
8: [       OK ] JITKernel_key.lstm (0 ms)
8: [ RUN      ] JITKernel_key.seq_pool
8: [       OK ] JITKernel_key.seq_pool (0 ms)
8: [ RUN      ] JITKernel_key.matmul
8: [       OK ] JITKernel_key.matmul (0 ms)
8: [ RUN      ] JITKernel_key.emb_seq_pool
8: [       OK ] JITKernel_key.emb_seq_pool (0 ms)
8: [ RUN      ] JITKernel_key.adam
8: [       OK ] JITKernel_key.adam (0 ms)
8: [ RUN      ] JITKernel_key.sgd
8: [       OK ] JITKernel_key.sgd (0 ms)
8: [----------] 8 tests from JITKernel_key (0 ms total)
8: 
8: [----------] 27 tests from JITKernel
8: [ RUN      ] JITKernel.VMul
8: [       OK ] JITKernel.VMul (3 ms)
8: [ RUN      ] JITKernel.VAdd
8: [       OK ] JITKernel.VAdd (2 ms)
8: [ RUN      ] JITKernel.VAddRelu
8: [       OK ] JITKernel.VAddRelu (1 ms)
8: [ RUN      ] JITKernel.VSub
8: [       OK ] JITKernel.VSub (1 ms)
8: [ RUN      ] JITKernel.VScal
8: [       OK ] JITKernel.VScal (1 ms)
8: [ RUN      ] JITKernel.VAddBias
8: [       OK ] JITKernel.VAddBias (0 ms)
8: [ RUN      ] JITKernel.VRelu
8: [       OK ] JITKernel.VRelu (2 ms)
8: [ RUN      ] JITKernel.VIdentity
8: [       OK ] JITKernel.VIdentity (1 ms)
8: [ RUN      ] JITKernel.VSquare
8: [       OK ] JITKernel.VSquare (1 ms)
8: [ RUN      ] JITKernel.VExp
8: [       OK ] JITKernel.VExp (1 ms)
8: [ RUN      ] JITKernel.VSigmoid
8: [       OK ] JITKernel.VSigmoid (2 ms)
8: [ RUN      ] JITKernel.VTanh
8: [       OK ] JITKernel.VTanh (3 ms)
8: [ RUN      ] JITKernel.VCopy
8: [       OK ] JITKernel.VCopy (1 ms)
8: [ RUN      ] JITKernel.LSTMCtHt
8: [       OK ] JITKernel.LSTMCtHt (301 ms)
8: [ RUN      ] JITKernel.LSTMC1H1
8: [       OK ] JITKernel.LSTMC1H1 (259 ms)
8: [ RUN      ] JITKernel.GRUH1
8: [       OK ] JITKernel.GRUH1 (17 ms)
8: [ RUN      ] JITKernel.GRUHtPart1
8: [       OK ] JITKernel.GRUHtPart1 (14 ms)
8: [ RUN      ] JITKernel.GRUHtPart2
8: [       OK ] JITKernel.GRUHtPart2 (17 ms)
8: [ RUN      ] JITKernel.LayerNorm
8: [       OK ] JITKernel.LayerNorm (193 ms)
8: [ RUN      ] JITKernel.CRFDecoding
8: [       OK ] JITKernel.CRFDecoding (436 ms)
8: [ RUN      ] JITKernel.SeqPool
8: [       OK ] JITKernel.SeqPool (390 ms)
8: [ RUN      ] JITKernel.EmbSeqPool
8: [       OK ] JITKernel.EmbSeqPool (382 ms)
8: [ RUN      ] JITKernel.MatMul
8: [       OK ] JITKernel.MatMul (9 ms)
8: [ RUN      ] JITKernel.Adam
8: [       OK ] JITKernel.Adam (0 ms)
8: [ RUN      ] JITKernel.AdamW
8: [       OK ] JITKernel.AdamW (0 ms)
8: [ RUN      ] JITKernel.Sgd
8: [       OK ] JITKernel.Sgd (56 ms)
8: [ RUN      ] JITKernel.VBroadcast
8: [       OK ] JITKernel.VBroadcast (5 ms)
8: [----------] 27 tests from JITKernel (2098 ms total)
8: 
8: [----------] Global test environment tear-down
8: [==========] 45 tests from 4 test cases ran. (2102 ms total)
8: [  PASSED  ] 45 tests.
1/1 Test #8: jit_kernel_test ..................   Passed    2.24 sec

The following tests passed:
	jit_kernel_test

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   2.35 sec

➜  build git:(hack7_amsgrad) ctest -R test_adam_op -V   
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
Test project /paddle/Paddle/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 956
    Start  956: test_adam_op

956: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_adam_op"
956: Test timeout computed to be: 10000000
956: I0919 09:39:58.425536 86223 program_interpreter.cc:243] New Executor is Running.
956: W0919 09:39:58.425767 86223 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.8
956: W0919 09:39:58.426263 86223 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
956: I0919 09:39:58.447311 86223 pir_interpreter.cc:1454] New Executor is Running ...
956: I0919 09:39:58.447784 86223 pir_interpreter.cc:1480] pir interpreter is running by multi-thread mode ...
956: E0919 09:39:58.452252 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.500211 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.540009 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.581257 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.619745 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.660209 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.702069 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.744038 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.786046 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.826704 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.849367 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.872129 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.895104 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.918649 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.945003 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.968045 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:58.990718 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.013746 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.034998 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.075783 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.096769 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.120254 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.142637 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.164650 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.188405 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.211796 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.237112 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.263793 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.286584 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: [2024-09-19 09:39:59,515] [ WARNING] backward_utils.py:647 - input provided by inputs has no use
956: E0919 09:39:59.632916 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.674681 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.716568 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: E0919 09:39:59.757982 86223 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
956: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
956:   warnings.warn(
956: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
956:   warnings.warn(
956: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
956:   warnings.warn(
956: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
956:   warnings.warn(
1/4 Test  #956: test_adam_op .....................   Passed    2.72 sec
test 957
    Start  957: test_adam_optimizer_fp32_fp64

957: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_adam_optimizer_fp32_fp64"
957: Test timeout computed to be: 120
957: /paddle/Paddle/build/test/legacy_test/test_adam_optimizer_fp32_fp64.py:56: VisibleDeprecationWarning: 
957: Warning:
957: API "paddle.dataset.uci_housing.train" is deprecated since 2.0.0, and will be removed in future versions. Please use "paddle.text.datasets.UCIHousing" instead.
957:     Reason: Please use new dataset API which supports paddle.io.DataLoader 
957:   paddle.dataset.uci_housing.train(), batch_size=1
957: Cache file /root/.cache/paddle/dataset/uci_housing/housing.data not found, downloading http://paddlemodels.bj.bcebos.com/uci_housing/housing.data 
957: Begin to download
item 12/12 [==========================>...] - ETA: 0s - 3ms/item  
957: Download finished
957: I0919 09:40:01.298322 86843 program_interpreter.cc:243] New Executor is Running.
957: W0919 09:40:01.298630 86843 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.8
957: W0919 09:40:01.299270 86843 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
957: I0919 09:40:01.393777 86843 interpreter_util.cc:647] Standalone Executor is Used.
2/4 Test  #957: test_adam_optimizer_fp32_fp64 ....   Passed    1.27 sec
test 2004
    Start 2004: test_adam_op_multi_thread

2004: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "FLAGS_inner_op_parallelism=4" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_adam_op"
2004: Test timeout computed to be: 10000000
2004: I0919 09:40:02.374812 86876 program_interpreter.cc:243] New Executor is Running.
2004: W0919 09:40:02.375027 86876 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.8
2004: W0919 09:40:02.375511 86876 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
2004: I0919 09:40:02.394155 86876 pir_interpreter.cc:1454] New Executor is Running ...
2004: I0919 09:40:02.394641 86876 pir_interpreter.cc:1480] pir interpreter is running by multi-thread mode ...
2004: E0919 09:40:02.399149 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.446254 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.486897 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.525270 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.563928 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.603085 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.642763 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.682237 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.722621 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.760040 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.781384 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.804320 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.825884 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.850793 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.872227 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.895184 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.917002 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.938360 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.959641 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:02.998135 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.021133 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.044593 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.066599 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.089313 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.112676 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.136301 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.158834 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.182197 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.209053 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: [2024-09-19 09:40:03,297] [ WARNING] backward_utils.py:647 - input provided by inputs has no use
2004: E0919 09:40:03.385520 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.454360 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.495410 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: E0919 09:40:03.537671 86876 pir.cc:2522] The op: pd_op.adam_ does not implement InferSymbolicShapeInterface.
2004: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
2004:   warnings.warn(
2004: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
2004:   warnings.warn(
2004: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
2004:   warnings.warn(
2004: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
2004:   warnings.warn(
3/4 Test #2004: test_adam_op_multi_thread ........   Passed    2.39 sec
test 2124
    Start 2124: test_adam_op_deprecated

2124: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "FLAGS_enable_pir_api=0" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_adam_op_deprecated"
2124: Test timeout computed to be: 10000000
2124: I0919 09:40:04.725281 87496 program_interpreter.cc:243] New Executor is Running.
2124: I0919 09:40:04.735670 87496 interpreter_util.cc:647] Standalone Executor is Used.
4/4 Test #2124: test_adam_op_deprecated ..........   Passed    0.77 sec

The following tests passed:
	test_adam_op
	test_adam_optimizer_fp32_fp64
	test_adam_op_multi_thread
	test_adam_op_deprecated

100% tests passed, 0 tests failed out of 4

Total Test time (real) =   7.22 sec


➜  build git:(hack7_amsgrad) ctest -R test_adamw_op -V
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
Test project /paddle/Paddle/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 960
    Start  960: test_adamw_op

960: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_adamw_op"
960: Environment variables: 
960:  FLAGS_PIR_OPTEST_RELAX_CHECK=True
960: Test timeout computed to be: 10000000
960: I0919 09:41:03.279420 87554 program_interpreter.cc:243] New Executor is Running.
960: W0919 09:41:03.279640 87554 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.8
960: W0919 09:41:03.280115 87554 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
960: I0919 09:41:03.299233 87554 pir_interpreter.cc:1454] New Executor is Running ...
960: I0919 09:41:03.299702 87554 pir_interpreter.cc:1480] pir interpreter is running by multi-thread mode ...
960: E0919 09:41:03.304289 87554 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
960: E0919 09:41:03.349586 87554 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
960: E0919 09:41:03.385483 87554 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
960: E0919 09:41:03.424723 87554 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
960: I0919 09:41:03.470851 87554 interpreter_util.cc:647] Standalone Executor is Used.
960: [2024-09-19 09:41:03,547] [ WARNING] backward_utils.py:647 - input provided by inputs has no use
960: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
960:   warnings.warn(
1/2 Test  #960: test_adamw_op ....................   Passed    1.84 sec
test 1959
    Start 1959: test_adamw_op_static_build

1959: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "FLAGS_new_executor_static_build=true" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_adamw_op"
1959: Environment variables: 
1959:  FLAGS_PIR_OPTEST_RELAX_CHECK=True
1959: Test timeout computed to be: 10000000
1959: I0919 09:41:05.103775 87766 program_interpreter.cc:243] New Executor is Running.
1959: W0919 09:41:05.104004 87766 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.8
1959: W0919 09:41:05.104579 87766 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
1959: I0919 09:41:05.125249 87766 pir_interpreter.cc:1454] New Executor is Running ...
1959: I0919 09:41:05.125720 87766 pir_interpreter.cc:1480] pir interpreter is running by multi-thread mode ...
1959: E0919 09:41:05.130336 87766 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
1959: E0919 09:41:05.177286 87766 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
1959: E0919 09:41:05.218026 87766 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
1959: E0919 09:41:05.259145 87766 pir.cc:2522] The op: pd_op.adamw_ does not implement InferSymbolicShapeInterface.
1959: I0919 09:41:05.302271 87766 interpreter_util.cc:647] Standalone Executor is Used.
1959: [2024-09-19 09:41:05,377] [ WARNING] backward_utils.py:647 - input provided by inputs has no use
1959: /paddle/Paddle/build/python/paddle/amp/auto_cast.py:631: UserWarning: For float16, amp only support NVIDIA GPU with Compute Capability 7.0 or higher, current GPU is: NVIDIA P106-100, with Compute Capability: 6.1.
1959:   warnings.warn(
2/2 Test #1959: test_adamw_op_static_build .......   Passed    1.81 sec

The following tests passed:
	test_adamw_op
	test_adamw_op_static_build

100% tests passed, 0 tests failed out of 2

Total Test time (real) =   3.71 sec

➜  build git:(hack7_amsgrad) ctest -R test_merged_adam_op -V
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
Test project /paddle/Paddle/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1564
    Start 1564: test_merged_adam_op

1564: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_merged_adam_op"
1564: Test timeout computed to be: 10000000
1564: W0919 09:41:34.062079 88134 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.8
1564: W0919 09:41:34.062589 88134 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
1/2 Test #1564: test_merged_adam_op ................   Passed    0.84 sec
test 1988
    Start 1988: test_merged_adam_op_static_build

1988: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/paddle/Paddle/build/python" "FLAGS_new_executor_static_build=true" "/usr/bin/python" "/paddle/Paddle/tools/test_runner.py" "test_merged_adam_op"
1988: Test timeout computed to be: 10000000
1988: W0919 09:41:34.891319 88158 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.8
1988: W0919 09:41:34.891844 88158 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
2/2 Test #1988: test_merged_adam_op_static_build ...   Passed    0.83 sec

The following tests passed:
	test_merged_adam_op
	test_merged_adam_op_static_build

100% tests passed, 0 tests failed out of 2

Total Test time (real) =   1.73 sec


Judging from the current CI results, mac and win both look fine. Could the issue be with the CI environments related to PR-CI-Py3 (such as PR-CI-Static-Check and PR-CI-Coverage)?

Also, the XPU-related tests in the PR-CI-Kunlun-R200 CI are failing. Could that be related to the underlying interface? The other Kunlun job, PR-CI-Kunlun-bxcheck, is fine ~ I have no way to verify XPU on my side, so please help take a look ~ Thanks!!! 🙏🙏🙏

megemini avatar Sep 19 '24 09:09 megemini

@HydrogenSulfate I recompiled inside docker today with the tests enabled, and the main ones, such as jit_kernel_test, test_adam_op, test_adamw_op, and test_merged_adam_op, all pass; the test.cc cases there are fine as well. The test results are as follows:

Test results: Judging from the current CI results, mac and win both look fine. Could the issue be with the CI environments related to PR-CI-Py3 (such as PR-CI-Static-Check and PR-CI-Coverage)?

Also, the XPU-related tests in the PR-CI-Kunlun-R200 CI are failing. Could that be related to the underlying interface? The other Kunlun job, PR-CI-Kunlun-bxcheck, is fine ~ I have no way to verify XPU on my side, so please help take a look ~ Thanks!!! 🙏🙏🙏

  1. Looking at the CI, the xpu, PY3, and PY3PIR jobs seem to be hitting segmentation faults that cause a large number of unit tests to fail. Based on my experience, it's worth checking whether any data pointer you added in your code is accessed before its memory has been allocated (see the sketch after this list).
  2. Could you compile and test in an aistudio environment to check? image
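A minimal, self-contained C++ sketch (not the PR's actual kernel code; all names here are illustrative assumptions) of the failure mode described in point 1: the new AMSGrad state buffer is written through a pointer that was never allocated, which can segfault, while allocating it up front, or guarding on amsgrad, avoids it.

// toy_amsgrad_alloc.cc -- toy illustration of "allocate before you touch it"
#include <algorithm>
#include <cstdio>
#include <vector>

struct AdamState {
  std::vector<float> mom2;      // second moment
  std::vector<float> mom2_max;  // only meaningful when amsgrad == true
};

void adam_step(AdamState* s, const std::vector<float>& grad, bool amsgrad) {
  // If this allocation is skipped, mom2_max is an empty buffer and the writes
  // below run out of bounds -- one typical way to end up with the kind of
  // "Segmentation fault" reported above.
  if (amsgrad && s->mom2_max.size() != grad.size()) {
    s->mom2_max.assign(grad.size(), 0.0f);
  }
  for (size_t i = 0; i < grad.size(); ++i) {
    s->mom2[i] = 0.999f * s->mom2[i] + 0.001f * grad[i] * grad[i];
    if (amsgrad) {
      // element-wise running maximum of the second moment
      s->mom2_max[i] = std::max(s->mom2[i], s->mom2_max[i]);
    }
  }
}

int main() {
  AdamState s;
  std::vector<float> grad = {0.1f, -0.2f, 0.3f};
  s.mom2.assign(grad.size(), 0.0f);
  adam_step(&s, grad, /*amsgrad=*/true);
  std::printf("mom2_max[0] = %f\n", s.mom2_max[0]);
  return 0;
}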

HydrogenSulfate avatar Sep 19 '24 11:09 HydrogenSulfate

  1. Looking at the CI, the xpu, PY3, and PY3PIR jobs seem to be hitting segmentation faults that cause a large number of unit tests to fail. Based on my experience, it's worth checking whether any data pointer you added in your code is accessed before its memory has been allocated.

Hmm, there are a lot of similar errors in there, but I don't feel they were introduced by this PR. For example, test_imperative_qat_fuse has very little in it and has nothing to do with adam, yet it fails with a similar error:

2024-09-19 00:39:49 49/75 Test #1839: test_imperative_qat_fuse ...........................***Failed    1.93 sec
2024-09-19 00:39:49 Hint: Your machine support AVX, but the installed paddlepaddle doesn't have avx core. Hence, no-avx core with worse performance will be imported.
2024-09-19 00:39:49 If you like, you could reinstall paddlepaddle by 'python -m pip install --force-reinstall paddlepaddle-gpu[==version]' to get better performance.
2024-09-19 00:39:49 /workspace/Paddle/build/test/quantization/test_imperative_qat.py:109: *******: [93m
2024-09-19 00:39:49 Warning:
2024-09-19 00:39:49 API "paddle.dataset.mnist.train" is deprecated since 2.0.0, and will be removed in future versions. Please use "paddle.vision.datasets.MNIST" instead.
2024-09-19 00:39:49     Reason: Please use new dataset API which supports paddle.io.DataLoader [0m
2024-09-19 00:39:49   paddle.dataset.mnist.train(), batch_size=32, drop_last=True
2024-09-19 00:39:49 /workspace/Paddle/build/test/quantization/test_imperative_qat.py:112: *******: [93m
2024-09-19 00:39:49 Warning:
2024-09-19 00:39:49 API "paddle.dataset.mnist.test" is deprecated since 2.0.0, and will be removed in future versions. Please use "paddle.vision.datasets.MNIST" instead.
2024-09-19 00:39:49     Reason: Please use new dataset API which supports paddle.io.DataLoader [0m
2024-09-19 00:39:49   paddle.dataset.mnist.test(), batch_size=32
2024-09-19 00:39:49 /usr/local/lib/python3.9/dist-packages/paddle/nn/layer/norm.py:818: UserWarning: When training, we now always track global mean and variance.
2024-09-19 00:39:49   warnings.warn(
2024-09-19 00:39:49 
2024-09-19 00:39:49 
2024-09-19 00:39:49 --------------------------------------
2024-09-19 00:39:49 C++ Traceback (most recent call last):
2024-09-19 00:39:49 --------------------------------------
2024-09-19 00:39:49 No stack trace in paddle, may be caused by external reasons.
2024-09-19 00:39:49 
2024-09-19 00:39:49 ----------------------
2024-09-19 00:39:49 Error Message Summary:
2024-09-19 00:39:49 ----------------------
2024-09-19 00:39:49 FatalError: `Segmentation fault` is detected by the operating system.
2024-09-19 00:39:49   [TimeInfo: *** Aborted at 1726677589 (unix time) try "date -d @1726677589" if you are using GNU date ***]
2024-09-19 00:39:49   [SignalInfo: *** SIGSEGV (@0x0) received by PID 15150 (TID 0x7f72f17d5740) from PID 0 ***]
2024-09-19 00:39:49 
2024-09-19 00:39:49 Segmentation fault

It looks like there is a problem with the built package itself ... ...

Of course, it's also possible that this part of the log doesn't belong to this unit test ~

EDIT: the test that test_imperative_qat_fuse inherits from is indeed related to adam ... ...

  1. Could you compile and test in an aistudio environment to check?

It seems an aistudio build environment has to be applied for ~ can I apply for one?

megemini avatar Sep 19 '24 12:09 megemini

Could we start from the problem showing up in PR-CI-Kunlun-R200? The XPU backend does not support amsgrad, so there I only changed the functions' parameter lists, yet a segmentation fault still shows up:

2024-09-19 14:33:25 1/1 Test #2332: test_merged_adam_op_xpu ..........***Failed    3.69 sec
2024-09-19 14:33:25 XPURT /paddle/build/python/paddle/base/../libs/libxpurt.so.1 loaded
2024-09-19 14:33:25 XCCL /paddle/build/python/paddle/base/../libs/libbkcl.so loaded
2024-09-19 14:33:25 [14:33:22][bddwd-isa-ai-chip5.bddwd.baidu.c][22575][WARN][BKCL][globals.cpp:177] xccl version: 51f983b [rdma] [BKCL-420 tracehang optimization] build data: Sep  2 2024 06:43:09
2024-09-19 14:33:25 Hint: Your machine support AVX, but the installed paddlepaddle doesn't have avx core. Hence, no-avx core with worse performance will be imported.
2024-09-19 14:33:25 If you like, you could reinstall paddlepaddle by 'python -m pip install --force-reinstall paddlepaddle-gpu[==version]' to get better performance.
2024-09-19 14:33:25 W0919 14:33:22.960089 22575 xpu_context.cc:176] Please NOTE: xpu device: 0
2024-09-19 14:33:25 --------------------------------------
2024-09-19 14:33:25 C++ Traceback (most recent call last):
2024-09-19 14:33:25 --------------------------------------
2024-09-19 14:33:25 No stack trace in paddle, may be caused by external reasons.
2024-09-19 14:33:25 ----------------------
2024-09-19 14:33:25 Error Message Summary:
2024-09-19 14:33:25 ----------------------
2024-09-19 14:33:25 FatalError: `Segmentation fault` is detected by the operating system.
2024-09-19 14:33:25   [TimeInfo: *** Aborted at 1726727602 (unix time) try "date -d @1726727602" if you are using GNU date ***]
2024-09-19 14:33:25   [SignalInfo: *** SIGSEGV (@0x0) received by PID 22575 (TID 0x7efdbbc71740) from PID 0 ***]
2024-09-19 14:33:25 Segmentation fault
2024-09-19 14:33:25 0% tests passed, 1 tests failed out of 1

The error is reported by merged_adam, and that operator lives alongside adam ... ...

megemini avatar Sep 19 '24 12:09 megemini

Some individual CI failures may be machine issues, but some of them, as far as I can tell, are caused by certain unit tests not having been updated accordingly.

Yes, I took a closer look at PR-CI-Py3: most of its unit tests pass on my side, and the corresponding unit tests in PR-CI-Windows are also in the PASS state ~

A few of them do report errors:

E       AssertionError: In PaddlePaddle 2.x, we turn on dynamic graph mode by default, and 'data()' is only supported in static graph mode. So if you want to use this api, please call 'paddle.enable_static()' before this api to enter static graph mode.

That has nothing to do with this PR; I'll add paddle.enable_static() to the corresponding files first ~

I'll keep digging into the errors reported by test.cc ~

That error is most likely because many unit tests don't use a dygraph/static guard and instead manually switch into dynamic/static graph mode at the top of the test. When one test crashes, the dynamic/static graph state is not rolled back properly, which then affects all subsequent tests and finally makes many of them fail with the error above.

So a good approach is to find the first failing unit test in the CI log, since it cannot have been affected by the others, fix it, re-run to check, and repeat from there.

HydrogenSulfate avatar Sep 19 '24 14:09 HydrogenSulfate

For example, the first test that failed for you is this one: image image image You can see that both JITKernel.Adam and JITKernel.AdamW failed; these check the parameters produced during the optimization step, so this should be related to your PR's changes. You can comment out the other tests in test.cc to reproduce the failing scenario, then first add some printf calls around each EXPECT_EQ in test.cc to confirm which line reports the error, and then add some printf statements in refer.h to see exactly why the computed results change. image
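A minimal, hedged sketch of that instrumentation (this is not the real test.cc, which compares full Adam outputs; the reference routine below is a stand-in): print the operands right before each expectation, so the CI log already contains the diverging index and both values even when the check fails. With gtest, linked against gtest_main, it could look like this:

// jit_adam_debug_test.cc -- illustrative only, not Paddle's jit_kernel_test
#include <gtest/gtest.h>
#include <cstdio>
#include <vector>

// Stand-in for the refer.h routine under test: square the gradient as a
// placeholder for updating the AMSGrad maximum of the second moment.
static std::vector<float> ReferenceMom2Max(const std::vector<float>& grad) {
  std::vector<float> out(grad.size(), 0.0f);
  for (size_t i = 0; i < grad.size(); ++i) out[i] = grad[i] * grad[i];
  return out;
}

TEST(JITKernelDebug, AdamAmsgrad) {
  std::vector<float> grad = {0.1f, -0.2f, 0.3f};
  std::vector<float> expect = {0.01f, 0.04f, 0.09f};
  std::vector<float> actual = ReferenceMom2Max(grad);
  ASSERT_EQ(expect.size(), actual.size());
  for (size_t i = 0; i < expect.size(); ++i) {
    // printf before the expectation: the log shows exactly which element
    // diverges and by how much, even if the test then aborts.
    std::printf("[debug] i=%zu expect=%f actual=%f\n", i, expect[i], actual[i]);
    EXPECT_NEAR(expect[i], actual[i], 1e-5f);
  }
}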

HydrogenSulfate avatar Sep 19 '24 14:09 HydrogenSulfate

For example, the first test that failed for you is this one

👍️👍️👍️ Nice ~

The problem now is that I can't reproduce the error on my side 🤣 I recompiled and ran the tests under docker again today and everything passed ... ... How about applying for an aistudio build environment to give it a try?

Here are the earlier test results


➜  build git:(hack7_amsgrad) ctest -R jit_kernel_test -V
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/paddle/Paddle/build/DartConfiguration.tcl
Test project /paddle/Paddle/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 8
    Start 8: jit_kernel_test

8: Test command: /paddle/Paddle/build/test/cpp/jit_kernel_test
8: Environment variables: 
8:  FLAGS_init_allocated_mem=true
8:  FLAGS_cudnn_deterministic=true
8:  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/paddle/Paddle/build/python/paddle/libs:/paddle/Paddle/build/python/paddle/base
8: Test timeout computed to be: 10000000
8: [==========] Running 45 tests from 4 test cases.
8: [----------] Global test environment set-up.
8: [----------] 4 tests from JITKernel_pool
8: [ RUN      ] JITKernel_pool.jitcreator
8: [       OK ] JITKernel_pool.jitcreator (0 ms)
8: [ RUN      ] JITKernel_pool.jitpool
8: [       OK ] JITKernel_pool.jitpool (0 ms)
8: [ RUN      ] JITKernel_pool.more
8: [       OK ] JITKernel_pool.more (0 ms)
8: [ RUN      ] JITKernel_pool.refer
8: [       OK ] JITKernel_pool.refer (0 ms)
8: [----------] 4 tests from JITKernel_pool (0 ms total)
8: 
8: [----------] 6 tests from JITKernel_helper
8: [ RUN      ] JITKernel_helper.GetAllCandidateKernels
8: [       OK ] JITKernel_helper.GetAllCandidateKernels (0 ms)
8: [ RUN      ] JITKernel_helper.GetAllCandidateFuncsWithTypes
8: [       OK ] JITKernel_helper.GetAllCandidateFuncsWithTypes (0 ms)
8: [ RUN      ] JITKernel_helper.KernelFuncs
8: [       OK ] JITKernel_helper.KernelFuncs (0 ms)
8: [ RUN      ] JITKernel_helper.GetAllCandidateFuncs
8: [       OK ] JITKernel_helper.GetAllCandidateFuncs (4 ms)
8: [ RUN      ] JITKernel_helper.pack_weights
8: [       OK ] JITKernel_helper.pack_weights (0 ms)
8: [ RUN      ] JITKernel_helper.attr
8: [       OK ] JITKernel_helper.attr (0 ms)
8: [----------] 6 tests from JITKernel_helper (4 ms total)
8: 
8: [----------] 8 tests from JITKernel_key
8: [ RUN      ] JITKernel_key.int
8: [       OK ] JITKernel_key.int (0 ms)
8: [ RUN      ] JITKernel_key.gru
8: [       OK ] JITKernel_key.gru (0 ms)
8: [ RUN      ] JITKernel_key.lstm
8: [       OK ] JITKernel_key.lstm (0 ms)
8: [ RUN      ] JITKernel_key.seq_pool
8: [       OK ] JITKernel_key.seq_pool (0 ms)
8: [ RUN      ] JITKernel_key.matmul
8: [       OK ] JITKernel_key.matmul (0 ms)
8: [ RUN      ] JITKernel_key.emb_seq_pool
8: [       OK ] JITKernel_key.emb_seq_pool (0 ms)
8: [ RUN      ] JITKernel_key.adam
8: [       OK ] JITKernel_key.adam (0 ms)
8: [ RUN      ] JITKernel_key.sgd
8: [       OK ] JITKernel_key.sgd (0 ms)
8: [----------] 8 tests from JITKernel_key (0 ms total)
8: 
8: [----------] 27 tests from JITKernel
8: [ RUN      ] JITKernel.VMul
8: [       OK ] JITKernel.VMul (3 ms)
8: [ RUN      ] JITKernel.VAdd
8: [       OK ] JITKernel.VAdd (2 ms)
8: [ RUN      ] JITKernel.VAddRelu
8: [       OK ] JITKernel.VAddRelu (1 ms)
8: [ RUN      ] JITKernel.VSub
8: [       OK ] JITKernel.VSub (1 ms)
8: [ RUN      ] JITKernel.VScal
8: [       OK ] JITKernel.VScal (1 ms)
8: [ RUN      ] JITKernel.VAddBias
8: [       OK ] JITKernel.VAddBias (0 ms)
8: [ RUN      ] JITKernel.VRelu
8: [       OK ] JITKernel.VRelu (2 ms)
8: [ RUN      ] JITKernel.VIdentity
8: [       OK ] JITKernel.VIdentity (1 ms)
8: [ RUN      ] JITKernel.VSquare
8: [       OK ] JITKernel.VSquare (1 ms)
8: [ RUN      ] JITKernel.VExp
8: [       OK ] JITKernel.VExp (1 ms)
8: [ RUN      ] JITKernel.VSigmoid
8: [       OK ] JITKernel.VSigmoid (2 ms)
8: [ RUN      ] JITKernel.VTanh
8: [       OK ] JITKernel.VTanh (3 ms)
8: [ RUN      ] JITKernel.VCopy
8: [       OK ] JITKernel.VCopy (1 ms)
8: [ RUN      ] JITKernel.LSTMCtHt
8: [       OK ] JITKernel.LSTMCtHt (301 ms)
8: [ RUN      ] JITKernel.LSTMC1H1
8: [       OK ] JITKernel.LSTMC1H1 (259 ms)
8: [ RUN      ] JITKernel.GRUH1
8: [       OK ] JITKernel.GRUH1 (17 ms)
8: [ RUN      ] JITKernel.GRUHtPart1
8: [       OK ] JITKernel.GRUHtPart1 (14 ms)
8: [ RUN      ] JITKernel.GRUHtPart2
8: [       OK ] JITKernel.GRUHtPart2 (17 ms)
8: [ RUN      ] JITKernel.LayerNorm
8: [       OK ] JITKernel.LayerNorm (193 ms)
8: [ RUN      ] JITKernel.CRFDecoding
8: [       OK ] JITKernel.CRFDecoding (436 ms)
8: [ RUN      ] JITKernel.SeqPool
8: [       OK ] JITKernel.SeqPool (390 ms)
8: [ RUN      ] JITKernel.EmbSeqPool
8: [       OK ] JITKernel.EmbSeqPool (382 ms)
8: [ RUN      ] JITKernel.MatMul
8: [       OK ] JITKernel.MatMul (9 ms)
8: [ RUN      ] JITKernel.Adam
8: [       OK ] JITKernel.Adam (0 ms)
8: [ RUN      ] JITKernel.AdamW
8: [       OK ] JITKernel.AdamW (0 ms)
8: [ RUN      ] JITKernel.Sgd
8: [       OK ] JITKernel.Sgd (56 ms)
8: [ RUN      ] JITKernel.VBroadcast
8: [       OK ] JITKernel.VBroadcast (5 ms)
8: [----------] 27 tests from JITKernel (2098 ms total)
8: 
8: [----------] Global test environment tear-down
8: [==========] 45 tests from 4 test cases ran. (2102 ms total)
8: [  PASSED  ] 45 tests.
1/1 Test #8: jit_kernel_test ..................   Passed    2.24 sec

The following tests passed:
	jit_kernel_test

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   2.35 sec


This is the docker image I used

image

megemini avatar Sep 19 '24 14:09 megemini


Is the paddle commit you run locally confirmed to match the commit in this PR? I've hit cases before where things passed locally because of uncommitted local changes, and the PR then failed CI.

HydrogenSulfate avatar Sep 19 '24 14:09 HydrogenSulfate

Also, is the machine you are using win, mac, or linux? If you don't have a linux machine, I'll apply for an aistudio environment for you tomorrow.

HydrogenSulfate avatar Sep 19 '24 14:09 HydrogenSulfate