
about the NWPU dataset

Open BurstLink666 opened this issue 9 months ago • 20 comments

Thanks for your excellent work! How did you handle images that contain no person? I did not find the relevant code in either SHA.py or preprocess_label.py. I saw in another issue that you mentioned this situation may have a negative effect on the final result, so how do you deal with such training images? Do you directly delete them from the training set?

BurstLink666 avatar Apr 07 '25 04:04 BurstLink666

Currently, I compute the GT density by passing a clone of the input points to the compute_density() method, and for empty images I set the distance to 999, just as in the other case you handle. The training result is very bad: I got MAE = 406.

BurstLink666 avatar Apr 07 '25 04:04 BurstLink666

> Thanks for your excellent work! How did you handle images that contain no person? I did not find the relevant code in either SHA.py or preprocess_label.py. I saw in another issue that you mentioned this situation may have a negative effect on the final result, so how do you deal with such training images? Do you directly delete them from the training set?

Regarding images with no person, there are three ways to process them during training:

  1. If the number of empty images is relatively small, you may treat these images as normal images and crop patches to train the model. In this case, patches with no person function as negative samples, which is helpful to alleviate over-estimation.

  2. If the number of empty images is quite large, you may adopt a sampling strategy during training, e.g., sampling empty images with a certain probability.

  3. If you do not care about over-estimation, you may simply ignore empty images.

For the NWPU-Crowd dataset, the second way could be a good choice, given that there are many empty images in this dataset; a sketch of such a sampling strategy is given below.
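
A minimal sketch of the second strategy, assuming per-image point counts are available before training; the function and parameter names (sample_training_list, keep_prob) are hypothetical illustrations, not code from this repository:

import random

def sample_training_list(img_list, num_points, keep_prob=0.2):
    # Keep every image that contains people; keep each empty image
    # only with probability keep_prob. Call once per epoch to resample.
    sampled = []
    for img, n in zip(img_list, num_points):
        if n > 0 or random.random() < keep_prob:
            sampled.append(img)
    return sampled

Resampling once per epoch lets every empty image contribute occasionally without letting negative samples dominate the training batches.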

cxliu0 avatar Apr 07 '25 09:04 cxliu0

> Currently, I compute the GT density by passing a clone of the input points to the compute_density() method, and for empty images I set the distance to 999, just as in the other case you handle. The training result is very bad: I got MAE = 406.

Did you correctly process the annotations? An MAE of 406 is abnormal. Additionally, it would be helpful if you could share your training setting.

cxliu0 avatar Apr 07 '25 09:04 cxliu0

Here is the training setting I used:

CUDA_VISIBLE_DEVICES='0' \
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=10001 \
    --use_env main.py \
    --lr=0.00001 \
    --backbone="vgg16_bn" \
    --ce_loss_coef=1.0 \
    --point_loss_coef=5.0 \
    --eos_coef=0.5 \
    --dec_layers=2 \
    --hidden_dim=256 \
    --dim_feedforward=512 \
    --nheads=8 \
    --dropout=0.0 \
    --epochs=1500 \
    --dataset_file="NWPU" \
    --eval_freq=5 \
    --batch_size=8 \
    --output_dir='pet_model'

BurstLink666 avatar Apr 08 '25 02:04 BurstLink666

I processed the dataset using the code you provided in preprocess_dataset.py, and customized SHA.py to adapt it to the NWPU dataset. Specifically, in the compute_density() method I added some code to deal with empty images, as follows:

import os
import random

import numpy as np
import torch
from torch.utils.data import Dataset

# load_data and random_crop are provided by the repository's dataset utilities

class NWPU(Dataset):
    def __init__(self, data_root, transform=None, train=False, flip=False):
        self.root_path = data_root

        prefix = "train" if train else "val"
        file_list = f"{data_root}/{prefix}.txt"
        with open(file_list, "r") as f:
            f_list = f.readlines()
        name_list = []
        for i in range(len(f_list)):
            fname = f_list[i].split(' ')[0] + '.jpg'
            name_list.append(fname)

        self.prefix = prefix
        self.img_list = os.listdir(f"{data_root}/images/")

        # get image and ground-truth list
        self.gt_list = {}
        for img_name in self.img_list:
            if img_name not in name_list:
                continue
            img_path = f"{data_root}/images/{img_name}"
            gt_path = f"{data_root}/jsons/{img_name}"
            self.gt_list[img_path] = gt_path.replace("jpg", "json")
        self.img_list = sorted(list(self.gt_list.keys()))
        self.nSamples = len(self.img_list)

        self.transform = transform
        self.train = train
        self.flip = flip
        self.patch_size = 256

    def compute_density(self, points):
        """
        Compute crowd density:
            - defined as the average nearest distance between ground-truth points
        """
        points_tensor = torch.from_numpy(points.copy())

        # empty image: no points, so assign a large sentinel density
        if points_tensor.shape[0] == 0:
            density = torch.tensor(999.0).reshape(-1)
            return density

        dist = torch.cdist(points_tensor, points_tensor, p=2)
        if points_tensor.shape[0] > 1:
            # column 0 of the sorted distances is the self-distance (zero),
            # so column 1 is each point's nearest neighbor
            density = dist.sort(dim=1)[0][:, 1].mean().reshape(-1)
        else:
            # a single point has no neighbor; use the same sentinel value
            density = torch.tensor(999.0).reshape(-1)
        return density

    def __len__(self):
        return self.nSamples

    def __getitem__(self, index):
        assert index < len(self), 'index range error'

        # load image and gt points
        img_path = self.img_list[index]
        gt_path = self.gt_list[img_path]
        img, points = load_data((img_path, gt_path), self.train)
        points = np.array(points).astype(float)

        # image transform
        if self.transform is not None:
            img = self.transform(img)
        img = torch.Tensor(img)

        # random scale
        if self.train:
            scale_range = [0.8, 1.2]
            min_size = min(img.shape[1:])
            scale = random.uniform(*scale_range)

            # interpolation (only rescale if the result still covers a full patch)
            if scale * min_size > self.patch_size:
                img = torch.nn.functional.interpolate(
                    img.unsqueeze(0), scale_factor=scale,
                    mode='bilinear', align_corners=True).squeeze(0)
                points *= scale

        # random crop patch
        if self.train:
            img, points = random_crop(img, points, patch_size=self.patch_size)

        # random flip (points are (y, x), so column 1 is the x-coordinate)
        if random.random() > 0.5 and self.train and self.flip:
            img = torch.flip(img, dims=[2])
            if len(points) != 0:
                points[:, 1] = self.patch_size - points[:, 1]

        # target
        target = {}
        target['points'] = torch.Tensor(points)
        target['labels'] = torch.ones([points.shape[0]]).long()

        if self.train:
            density = self.compute_density(points)
            target['density'] = density

        if not self.train:
            target['image_path'] = img_path

        return img, target

BurstLink666 avatar Apr 08 '25 02:04 BurstLink666

> Did you correctly process the annotations? An MAE of 406 is abnormal. Additionally, it would be helpful if you could share your training setting.

Also, I found that the self-attention layer may sometimes output results that contain NaN.
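
For anyone hitting the same problem, a minimal sketch for locating the first module whose output contains NaN; add_nan_hooks is a hypothetical helper, not part of this repository:

import torch

def add_nan_hooks(model):
    # Register forward hooks that raise as soon as any module's
    # output contains NaN, naming the offending module.
    for name, module in model.named_modules():
        def hook(mod, inputs, output, name=name):
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for t in outputs:
                if torch.is_tensor(t) and torch.isnan(t).any():
                    raise RuntimeError(f"NaN in output of module: {name}")
        module.register_forward_hook(hook)

torch.autograd.set_detect_anomaly(True) can likewise trace NaNs produced in the backward pass, at a noticeable speed cost.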

BurstLink666 avatar Apr 08 '25 06:04 BurstLink666

The training setting seems fine. Could you confirm that the format of point annotations is (y, x) instead of (x, y)? Wrong annotation format will lead to erroneous model outputs.

cxliu0 avatar Apr 09 '25 04:04 cxliu0

> The training setting seems fine. Could you confirm that the format of point annotations is (y, x) instead of (x, y)? Wrong annotation format will lead to erroneous model outputs.

I directly load the annotations from the original .json files of the official NWPU dataset, without any changes.

BurstLink666 avatar Apr 09 '25 05:04 BurstLink666

> I directly load the annotations from the original .json files of the official NWPU dataset, without any changes.

What about the load_data function in the NWPU class? There is a flip operation there to ensure that the data format is (y, x). Perhaps you did not follow this format, which would lead to abnormal model outputs.

You can visualize the outputs of your trained model and check whether the predictions are reasonable.
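
For reference, a minimal sketch of such a load_data function; it assumes the official NWPU .json files store a 'points' field in (x, y) order, which is flipped here to the (y, x) order the codebase expects:

import json
import numpy as np
from PIL import Image

def load_data(img_gt_path, train):
    img_path, gt_path = img_gt_path
    img = Image.open(img_path).convert('RGB')
    with open(gt_path) as f:
        points = np.array(json.load(f)['points']).reshape(-1, 2)
    # official annotations are (x, y); flip to (y, x)
    points = points[:, ::-1]
    return img, points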

cxliu0 avatar Apr 09 '25 08:04 cxliu0

> What about the load_data function in the NWPU class? There is a flip operation there to ensure that the data format is (y, x). Perhaps you did not follow this format, which would lead to abnormal model outputs. You can visualize the outputs of your trained model and check whether the predictions are reasonable.

That was it. The MSE now reaches 88.702 at epoch 224. Thank you very much.

BurstLink666 avatar Apr 11 '25 06:04 BurstLink666

> That was it. The MSE now reaches 88.702 at epoch 224. Thank you very much.

I am glad to see that the issue has been resolved.

cxliu0 avatar Apr 11 '25 08:04 cxliu0

> I am glad to see that the issue has been resolved.

May I ask at which epoch you got the best MAE on the NWPU dataset?

BurstLink666 avatar Apr 11 '25 08:04 BurstLink666

> May I ask at which epoch you got the best MAE on the NWPU dataset?

I do not recall the precise epoch with the best MAE, but the model should be okay for testing if the validation MAE is around 50.

cxliu0 avatar Apr 12 '25 08:04 cxliu0

> I do not recall the precise epoch with the best MAE, but the model should be okay for testing if the validation MAE is around 50.

I got an MSE of around 80, so it seems there are still some problems. Maybe something is wrong with the data loading process. It would be helpful if you could provide your NWPU class.

BurstLink666 avatar Apr 14 '25 07:04 BurstLink666

> That was it. The MSE now reaches 88.702 at epoch 224. Thank you very much.

May I ask what result you got on the Shanghai_A dataset?

lp-094 avatar Apr 14 '25 13:04 lp-094

> May I ask what result you got on the Shanghai_A dataset?

MAE = 49.901 on Part A and MAE = 6.639 on Part B.

BurstLink666 avatar Apr 15 '25 05:04 BurstLink666

Did you tune the parameters? My recent runs keep landing at around 52 to 53, and I am not sure why.

lp-094 avatar Apr 15 '25 06:04 lp-094

> Did you tune the parameters? My recent runs keep landing at around 52 to 53, and I am not sure why.

I used the same training settings as the author's. Maybe you can increase the total number of training epochs from 1500 to 3000.

BurstLink666 avatar Apr 16 '25 05:04 BurstLink666

Thank you. May I also ask which Python version, PyTorch version, and GPU you used for your runs?


lp-094 avatar Apr 16 '25 06:04 lp-094

> Thank you. May I also ask which Python version, PyTorch version, and GPU you used for your runs?

One 3090 GPU, Python 3.9.5 with torch 2.6.0+cu124.

BurstLink666 avatar Apr 16 '25 08:04 BurstLink666