Missing part and confusing place

Open massyzs opened this issue 1 year ago • 7 comments

Hi. The training script uses InterGen (train.py, def build_models: ...), but the eval script uses InterCLIP (evaluator.py, def build_models: .... ), which contains "motion_encoder". This causes mismatched parameters. Can you help explain this?

massyzs avatar Dec 02 '24 07:12 massyzs

After working on this project since June 2024, I feel like they have manipulated the InterCLIP checkpoints. I cannot achieve the same evaluation results as those reported in the paper.

There is something you can check in your dataset. You will find a lot of duplicates, which they CLAIM:

"Although the textual annotations are similar (since the semantic category of these motions is 'dance'), each motion captured is unique. This does not limit but rather enhances diversity. For example, the diffusion model inherently has the capability to model such diversity effectively. Hence, similar annotations in this context are not a problem but an opportunity to refine the model's ability to generate nuanced variations of similar actions."
[Source](https://github.com/tr3e/InterGen/issues/45)

Dancing : 50 sequences

5088 - two people are dancing together.
5320 - two people practice dancing together.
5326 - two individuals are dancing together.
5382 - two individuals are dancing together.
5416 - two people are dancing together.
5504 - two people are dancing together.
5559 - two people are dancing together.
5587 - two people are dancing together.
5716 - the two persons are dancing together.
5782 - two people are dancing together.
5825 - two individuals are dancing together.
5917 - two people are breakdancing.
5926 - two people are dancing together.
5941 - two individuals are dancing together gracefully.
5997 - two individuals are dancing together.
6011 - two persons are dancing.
6035 - the two individuals are dancing together.
6043 - two people are dancing in pairs.
6077 - two individuals are dancing separately.
6096 - two people are dancing together.
6145 - two individuals are dancing together.
6159 - two individuals are dancing together.
6232 - two people are dancing together.
6237 - two individuals are dancing together.
6247 - two people are dancing together.
6286 - the two individuals are dancing together.
6299 - two persons are dancing together.
6311 - the two people are dancing together.
6401 - two people are dancing together.
6409 - they are dancing together.
6420 - the two persons are dancing together.
6436 - two people are dancing together.
6466 - two people are dancing together.
6478 - two individuals are dancing together.
6495 - two people are dancing together.
6506 - two persons are dancing together.
6533 - the two individuals are dancing together.
6544 - two people are dancing together.
6568 - the two individuals are dancing together.
6587 - the two individuals are dancing together.
6596 - two individuals are dancing together.
6619 - two individuals are dancing together.
6629 - the two persons are dancing.
6671 - two people are dancing gracefully.
6739 - two persons are dancing together.
6744 - two persons are dancing together.
6867 - the two people are dancing together.
6870 - two people are dancing together.
6877 - two people are dancing.
6939 - two people are dancing a ballroom dance together.
6944 - the two individuals are dancing together.

taichi : 17 sequences

2851 - two individuals are practicing tai chi together.
2855 - two individuals are practicing tai chi.
2863 - two people are practicing tai chi.
2867 - two individuals are practicing tai chi.
2913 - two individuals are practicing tai chi.
2918 - two people practicing tai chi.
2922 - two people are practicing tai chi.
2929 - two people are practicing tai chi.
2956 - two persons are practicing tai chi.
2963 - two people are practicing tai chi.
2967 - two individuals are practicing tai chi.
2986 - two individuals are practicing tai chi.
3683 - two people are practicing tai chi together.
3771 - two people are practicing tai chi.
4479 - two people are practicing tai chi.
4952 - two individuals are practicing tai chi.
7059 - two people practicing tai chi together.

sparring : 28 sequences

562 - two people are sparring in taekwondo, exchanging kicks with one another.
635 - the two are sparring in taekwondo.
1399 - the two are sparring in taekwondo, exchanging kicks and strikes.
1716 - two performers are sparring in the ring, throwing punches at one another.
3017 - two persons are sparring using fists.
3030 - two individuals are sparring with each other.
3055 - two persons are sparring with each other.
3057 - two individuals are sparring with each other.
3059 - two individuals are sparring with each other.
3137 - the two people are sparring with martial arts techniques.
3246 - two individuals are sparring with each other.
3249 - two individuals are sparring against each other.
3253 - two individuals are sparring with each other.
3256 - two individuals are sparring with each other.
3258 - two individuals are sparring with each other.
3260 - two individuals are sparring with each other.
3591 - two individuals are sparring with each other.
3593 - two people are sparring against each other.
3595 - two persons are sparring with each other.
3597 - the two people are sparring in martial arts.
3673 - two people are sparring with each other.
3675 - two individuals are sparring with each other.
3677 - two individuals are sparring each other.
3679 - two people are sparring against each other.
3681 - two people are sparring against each other.
3855 - two individuals are sparring with each other.
3857 - the two people are sparring in martial arts.
3859 - two individuals are sparring with each other.

rock-paper-scissors : 4 sequences

2753 - two individuals are playing a game of rock-paper-scissors.
2756 - two individuals are playing a game of rock-paper-scissors.
2759 - two people are playing a game of rock-paper-scissors.
3381 - the two people are playing rock-paper-scissors.
  • Some sequences are entirely blank (7 sequences), such as the following examples:
    2258 - no modification made.
    4193 - transition 
    4385 - transition  
    4434 - transition  
    6028 - transition  
    6940 - transition  
    7220 - pass  
    7221 - pass  
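
A minimal sketch of how duplicates and placeholder captions like the ones listed above could be found programmatically. It assumes the annotations have already been read into a dict mapping sequence id to caption text; the sample dict, the `PLACEHOLDERS` set, and all function names are illustrative and not part of the InterGen code.

```python
import re
from collections import defaultdict

# Hypothetical sample: sequence id -> caption, copied from the lists above.
annotations = {
    "5088": "two people are dancing together.",
    "5416": "two people are dancing together.",
    "5326": "two individuals are dancing together.",
    "2753": "two individuals are playing a game of rock-paper-scissors.",
    "4193": "transition ",
    "7220": "pass  ",
}

# Assumed placeholder captions, taken from the blank examples above.
PLACEHOLDERS = {"transition", "pass", "no modification made"}


def normalize(text):
    """Lowercase, drop punctuation (keeping hyphens), collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return " ".join(text.split())


def group_duplicates(annotations):
    """Return normalized captions shared by two or more sequences."""
    groups = defaultdict(list)
    for seq_id, text in annotations.items():
        groups[normalize(text)].append(seq_id)
    return {cap: ids for cap, ids in groups.items() if len(ids) > 1}


def find_placeholders(annotations):
    """Return ids whose caption is only a placeholder word or phrase."""
    return sorted(
        seq_id
        for seq_id, text in annotations.items()
        if normalize(text) in PLACEHOLDERS
    )


print(group_duplicates(annotations))
print(find_placeholders(annotations))
```

This only catches captions that are identical after normalization. Near-duplicates such as "two people" versus "two individuals" would need a looser comparison.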
    

I trained for the same number of epochs they mention, but the results are totally different.

Dear author, could you clarify these duplicates again? How can 10 different motions share essentially the same sentence? If that is really the case, it means the model can generate 10 different dancing motions from one description, right? Then how can we verify accuracy the way you did?

bring-nirachornkul avatar Dec 02 '24 07:12 bring-nirachornkul

The rendered video on my side is completely white. Do you know how to fix this?

massyzs avatar Dec 02 '24 07:12 massyzs

Hi, thanks for your interest in our work! InterCLIP is the evaluation model, which includes a motion encoder and is used to evaluate the InterGen model, while InterGen is the main generative model for motion generation. They are two different models and therefore have different parameters :)
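
A minimal sketch, not taken from the repository, of how to see which parameter names differ between a model and a checkpoint. It works on plain collections of key names; with PyTorch one would typically pass `model.state_dict().keys()` and the keys of the loaded checkpoint dict (assuming the weights are not nested under another key). Apart from the `motion_encoder` prefix mentioned above, the parameter names below are hypothetical.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Report parameter names present on one side but not the other."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        # expected by the model, absent from the checkpoint
        "missing": sorted(model_keys - checkpoint_keys),
        # present in the checkpoint, unknown to the model
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical names for illustration only.
generator_keys = ["decoder.net.weight", "decoder.net.bias"]
evaluator_ckpt_keys = ["motion_encoder.fc.weight", "motion_encoder.fc.bias"]

report = compare_keys(generator_keys, evaluator_ckpt_keys)
for kind, names in report.items():
    print(kind, names)
```

If every key shows up as missing or unexpected, the checkpoint most likely belongs to the other model (generator versus evaluator) rather than being corrupted.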

tr3e avatar Dec 02 '24 07:12 tr3e

Hi, thanks for your reply. I looked through GitHub and found that there is an InterCLIP; may I ask why you trained it again yourselves?

In addition, the rendered videos are all white. Do you have any idea why?

massyzs avatar Dec 02 '24 07:12 massyzs

  1. We train InterCLIP to extract interaction features, which include not only single-person motion features but also the spatial relations between the two people. We trained it ourselves since there is no existing evaluation model for two-person motions.
  2. Could you kindly follow the README step by step? The white videos are probably because your InterGen checkpoint is not loaded correctly.

tr3e avatar Dec 02 '24 08:12 tr3e

Hi, thanks a lot, it works now.

One more question:

Could you share the dataset visualization code? Or would you mind sharing a link to another project that can directly visualize your dataset? The dataset is really impressive and useful.

massyzs avatar Dec 03 '24 08:12 massyzs

You can use this visualization code :) https://github.com/davrempe/humor/tree/main/humor/viz

tr3e avatar Dec 04 '24 04:12 tr3e