
Questions about testing results

weleen opened this issue 2 years ago • 8 comments

Thank you for your great work! I have tried to reproduce the results and encountered some issues.

Following the instructions, I evaluated the provided checkpoint downloaded from Hugging Face.

I ran the following commands:

python -m test --cfg configs/config_h3d_stage3.yaml --task t2m
python -m test --cfg configs/config_h3d_stage3.yaml --task m2t

The evaluation results are not consistent with the results reported in the paper. The attachments are the log and metrics.

t2m results: [attached log: log_2023-10-04-19-56-23_test.log, plus metrics screenshot]

Would you happen to have any idea about what's wrong with the configuration?

weleen avatar Oct 04 '23 13:10 weleen

For the m2t task, the testing process gets stuck at the 4th replication because of a SIGTERM signal. [screenshot attached]

Similar to t2m, the testing results fall behind those reported in the paper, especially Bleu@4 and CIDEr, which are only around 6 and 7. [screenshot attached]

I would appreciate it if you have time to help fix my issue.😄

weleen avatar Oct 05 '23 01:10 weleen

@weleen hi! Has this issue been resolved? We encountered the same issue.

LinghaoChan avatar Oct 27 '23 03:10 LinghaoChan

@weleen hi! Has this issue been resolved? We encountered the same issue.

@LinghaoChan I think there are some mistakes in get_motion_embeddings.

In m2t.py https://github.com/OpenMotionLab/MotionGPT/blob/0499f16df4ddde44dfd72a7cbd7bd615af1b1a94/mGPT/metrics/m2t.py#L325-L329

In t2m.py https://github.com/OpenMotionLab/MotionGPT/blob/0499f16df4ddde44dfd72a7cbd7bd615af1b1a94/mGPT/metrics/t2m.py#L251-L254

m_lens is divided twice.
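
To illustrate the effect, here is a minimal numeric sketch (not the repository's code; it assumes cfg.DATASET.HUMANML3D.UNIT_LEN and self.unit_length are both 4, the usual HumanML3D setting):

import torch

UNIT_LEN = 4                       # assumed value of both UNIT_LEN and unit_length
m_lens = torch.tensor([196, 120])  # motion lengths in frames

m_lens = torch.div(m_lens, UNIT_LEN, rounding_mode="floor")  # intended division -> tensor([49, 30])
m_lens = m_lens // UNIT_LEN                                  # redundant second division -> tensor([12, 7])

# The evaluator therefore receives lengths of 12 and 7 instead of 49 and 30,
# which distorts the motion embeddings used for the metrics.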

However, even if I fix these errors, the results are still different. Have you solved this issue?

weleen avatar Nov 11 '23 13:11 weleen

same issue

Spark001 avatar Dec 14 '23 03:12 Spark001

@weleen hi! Has this issue been resolved? We encountered the same issue.

hi, me too.

GuangtaoLyu avatar Jul 02 '24 13:07 GuangtaoLyu

Me too. Furthermore, I can't reproduce the reported m2t scores from either the MotionGPT paper or the MotionGPT-2 paper. In those papers, R-Precision and MM Dist. are as follows:

[image: R-Precision and MM Dist. values reported in the papers]

However, when I run test.py in this repository, R-Precision and MM Dist. deviate by 0.2 and 0.07 points, respectively. [screenshot of my results attached]

I consider this to be a very problematic deviation.

shin-wn avatar Feb 19 '25 17:02 shin-wn

@weleen hi! Has this issue been resolved? We encountered the same issue.

@LinghaoChan I think there are some mistakes in get_motion_embeddings.

In m2t.py

MotionGPT/mGPT/metrics/m2t.py

Lines 325 to 329 in 0499f16

m_lens = torch.div(m_lens,
                   self.cfg.DATASET.HUMANML3D.UNIT_LEN,
                   rounding_mode="floor")
ref_mov = self.t2m_moveencoder(feats_ref[..., :-4]).detach()
m_lens = m_lens // self.unit_length

In t2m.py

MotionGPT/mGPT/metrics/t2m.py

Lines 251 to 254 in 0499f16

m_lens = torch.div(m_lens,
                   self.cfg.DATASET.HUMANML3D.UNIT_LEN,
                   rounding_mode="floor")
m_lens = m_lens // self.cfg.DATASET.HUMANML3D.UNIT_LEN

m_lens is divided twice.

However, even if I fix these errors, the results are still different. Have you solved this issue?

It looks like the variable 'm_lens' is not actually used:

def forward(self, inputs, m_lens):

        num_samples = inputs.shape[0]

        input_embs = self.input_emb(inputs)
        hidden = self.hidden.repeat(1, num_samples, 1)

        cap_lens = m_lens.data.tolist()
        
        # emb = pack_padded_sequence(input=input_embs, lengths=cap_lens, batch_first=True)
        emb = input_embs

        gru_seq, gru_last = self.gru(emb, hidden)

        gru_last = torch.cat([gru_last[0], gru_last[1]], dim=-1)

        return self.output_net(gru_last)
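
For reference, here is a minimal sketch of how m_lens could actually be taken into account by re-enabling the commented-out pack_padded_sequence call. It assumes the same module attributes as above (self.input_emb, self.gru, self.hidden, self.output_net) and a single-layer bidirectional GRU, which is what the concatenation of gru_last[0] and gru_last[1] suggests; it is only an illustration, not the repository's implementation:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

def forward(self, inputs, m_lens):

        num_samples = inputs.shape[0]

        input_embs = self.input_emb(inputs)
        hidden = self.hidden.repeat(1, num_samples, 1)

        # pack_padded_sequence expects lengths on the CPU (a Python list works)
        cap_lens = m_lens.data.tolist()

        # Pack so the GRU stops at each sequence's true length instead of
        # running over padded frames; enforce_sorted=False keeps the original
        # batch order so the returned hidden states line up with the inputs.
        emb = pack_padded_sequence(input=input_embs,
                                   lengths=cap_lens,
                                   batch_first=True,
                                   enforce_sorted=False)

        gru_seq, gru_last = self.gru(emb, hidden)

        # concatenate the final forward and backward hidden states
        gru_last = torch.cat([gru_last[0], gru_last[1]], dim=-1)

        return self.output_net(gru_last)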

feifeifeiliu avatar Mar 19 '25 04:03 feifeifeiliu

I am also facing the same issue.

Until now, I have tried the following modifications:

  • I commented out the redundant division mentioned by @weleen;
  • I tried to use emb = pack_padded_sequence(input=input_embs, lengths=cap_lens, batch_first=True) so that m_lens is actually used, according to the comments by @feifeifeiliu.

By doing so, the T2M results of the provided checkpoint (from huggingface) are as follows:

[image: T2M results after my modifications]

Compared with the results posted by @weleen, my modifications improve Matching_score and R_precision, but the other metrics get even worse.

Most importantly, even after these changes there is still a clear gap between my results and those reported in the paper, even when comparing against MotionGPT (Pre-trained).

Really hope the authors or someone who has successfully reproduced this work could provide some hints on this issue🙏

Lyman-Smoker avatar Mar 20 '25 06:03 Lyman-Smoker