Issues with motion editing (motion in-betweening) evaluation without a text condition
I noticed that the code doesn't include motion editing without text, and the default training doesn't use classifier-free guidance (CFG). How can I perform motion in-betweening without providing text? I tried setting the word embeddings to zero and leaving the text input empty, but the results were extremely poor in both cases.
We retrain the model without text condition for this evaluation.
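For context on why zeroing the text embedding fails: a model trained only with real text likely never sees an "empty" condition, so that input is out of distribution at inference. A common remedy (not what was done here, since the authors retrained) is to randomly drop the text condition during training, which then enables CFG at inference. The minimal sketch below assumes plain lists as embeddings/logits; `drop_text_condition`, `cfg_logits`, `p_uncond`, and `null_emb` are hypothetical names, not part of this repository.

```python
import random
from typing import List


def drop_text_condition(text_emb: List[float], null_emb: List[float],
                        p_uncond: float, rng: random.Random) -> List[float]:
    """Training-time condition dropout (hypothetical helper).

    With probability p_uncond, replace the text embedding with a null
    embedding so the model also learns an unconditional mode.
    """
    if rng.random() < p_uncond:
        return list(null_emb)
    return list(text_emb)


def cfg_logits(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond).

    scale = 1.0 returns the conditional logits unchanged.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


rng = random.Random(0)
print(drop_text_condition([0.3, 0.7], [0.0, 0.0], p_uncond=1.0, rng=rng))
print(cfg_logits([2.0, 0.0], [1.0, 0.0], scale=3.0))
```

Without such dropout, the unconditional branch is never trained, which is consistent with retraining a separate text-free model for this evaluation.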
I see. So for this task, the only conditions are the fixed keyframes (tokens) at the beginning and end?
Correct. Only the text condition changes (it is removed for both training and testing); everything else remains the same.
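The setup described above (fixed start and end keyframe tokens, no text) can be sketched as building a partially masked token sequence that the model then fills in. This is a minimal illustration, assuming motion is represented as a list of integer token ids and that masked positions use a placeholder id; `inbetween_input`, `MASK_ID`, `n_start`, and `n_end` are hypothetical and not taken from the repository.

```python
from typing import List

MASK_ID = -1  # hypothetical placeholder id for a masked motion token


def inbetween_input(tokens: List[int], n_start: int, n_end: int,
                    mask_id: int = MASK_ID) -> List[int]:
    """Keep the first n_start and last n_end tokens as fixed keyframes.

    Every token in between is replaced by mask_id; no text condition is
    involved, so the keyframes are the only conditioning signal.
    """
    if n_start < 0 or n_end < 0 or n_start + n_end > len(tokens):
        raise ValueError("invalid keyframe counts for this sequence length")
    n_mid = len(tokens) - n_start - n_end
    # tokens[len(tokens) - n_end:] is empty when n_end == 0
    return tokens[:n_start] + [mask_id] * n_mid + tokens[len(tokens) - n_end:]


print(inbetween_input([10, 11, 12, 13, 14, 15], n_start=2, n_end=2))
```

Here the printed sequence keeps tokens 10, 11 and 14, 15 and masks the two middle positions; a text-free model would be trained and evaluated on inputs of this form.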