flame
Still cannot reproduce the results using the released model
Similar to #5, I still cannot reproduce the results using the released model; the numbers I got were extremely poor.
{"r_precision": {"top-1": 0.06051829268292683, "top-2": 0.1298780487804878, "top-3": 0.19603658536585367}, "fid": 1481.7516534444785, "clip_score": {"clip_score": 0.14643903637110403}, "mid": -53.25080871582031}
I installed pytorch-lightning and transformers with the correct versions (1.8.6 and 4.19.2). I tested the released model on a GTX 1080 Ti using the command python test.py model=diffusion_hml3d.yaml datamodule=humanml3d.yaml ckpt_path=pretrained/flame_hml3d_bc.ckpt. My Python environment is shown below (a small version-check sketch follows the list):
Package Version
----------------------- ------------------
absl-py 1.3.0
aiohttp 3.8.3
aiosignal 1.3.1
alembic 1.9.0
antlr4-python3-runtime 4.9.3
asttokens 2.0.5
async-timeout 4.0.2
attrs 22.1.0
autopage 0.5.1
backcall 0.2.0
black 22.12.0
body-visualizer 1.1.0
brotlipy 0.7.0
cachetools 5.3.0
certifi 2022.12.7
cffi 1.15.0
cfgv 3.3.1
chardet 5.1.0
charset-normalizer 2.1.1
click 8.1.3
cliff 4.1.0
cmaes 0.9.0
cmd2 2.4.2
colorama 0.4.6
colorlog 6.7.0
commonmark 0.9.1
configer 1.4.1
configparser 5.3.0
contourpy 1.0.6
cryptography 37.0.2
cycler 0.11.0
decorator 5.1.1
distlib 0.3.6
dotmap 1.3.30
exceptiongroup 1.0.4
executing 0.8.3
fastjsonschema 2.16.2
filelock 3.9.0
flake8 6.0.0
fonttools 4.38.0
freetype-py 2.3.0
frozenlist 1.3.3
fsspec 2022.11.0
ftfy 6.1.1
fvcore 0.1.5.post20221221
google-auth 2.15.0
google-auth-oauthlib 0.4.6
greenlet 2.0.1
grpcio 1.51.1
huggingface-hub 0.11.1
human-body-prior 2.2.2.0
hydra-colorlog 1.2.0
hydra-core 1.3.1
hydra-optuna-sweeper 1.2.0
identify 2.5.11
idna 3.4
imageio 2.23.0
imageio-ffmpeg 0.4.7
importlib-metadata 5.2.0
importlib-resources 5.10.1
iniconfig 1.1.1
iopath 0.1.10
ipython 8.7.0
isort 5.11.3
jedi 0.18.2
jsonschema 4.17.3
jupyter_core 5.1.0
kiwisolver 1.4.4
lightning-utilities 0.4.2
loguru 0.6.0
Mako 1.2.4
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.6.2
matplotlib-inline 0.1.6
mccabe 0.7.0
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
moviepy 1.0.3
multidict 6.0.3
mypy-extensions 0.4.3
nbformat 5.7.1
nbstripout 0.6.1
networkx 2.8.8
nodeenv 1.7.0
numpy 1.23.4
oauthlib 3.2.2
olefile 0.46
omegaconf 2.3.0
opencv-python 4.6.0.66
optuna 2.10.1
packaging 23.0
pandas 1.5.2
parso 0.8.3
pathspec 0.10.3
pbr 5.11.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.3.0
pip 22.3.1
pkgutil_resolve_name 1.3.10
platformdirs 2.6.2
pluggy 1.0.0
portalocker 2.6.0
pre-commit 2.20.0
prettytable 3.5.0
proglog 0.1.10
prompt-toolkit 3.0.36
protobuf 3.20.1
psbody-mesh 0.4
psutil 5.9.4
ptyprocess 0.7.0
pudb 2022.1.3
pure-eval 0.2.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.10.0
pycparser 2.21
pyflakes 3.0.1
pygame 2.1.2
pyglet 2.0.2.1
Pygments 2.13.0
PyOpenGL 3.1.0
PyOpenGL-accelerate 3.1.5
pyOpenSSL 22.0.0
pyparsing 3.0.9
pyperclip 1.8.2
pyproject_api 1.5.0
pyrender 0.1.45
pyrsistent 0.19.2
PySocks 1.7.1
pytest 7.2.0
python-dateutil 2.8.2
python-dotenv 0.21.0
pytorch-lightning 1.8.6
pytorch3d 0.7.2
pytz 2022.7
PyYAML 6.0
pyzmq 24.0.1
regex 2022.10.31
requests 2.28.1
requests-oauthlib 1.3.1
rich 12.6.0
rsa 4.9
scipy 1.9.3
setuptools 65.5.0
sh 1.14.3
six 1.16.0
SQLAlchemy 1.4.45
stack-data 0.2.0
stevedore 4.1.1
tabulate 0.9.0
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.5.1
termcolor 2.1.1
tokenizers 0.12.1
toml 0.10.2
tomli 2.0.1
torch 1.12.0
torchaudio 0.12.0
torchmetrics 0.11.0
torchvision 0.13.0
tox 4.4.5
tqdm 4.64.1
traitlets 5.8.0
transformers 4.19.2
transforms3d 0.3.1
trimesh 3.17.1
typing_extensions 4.4.0
urllib3 1.26.13
urwid 2.1.2
urwid-readline 0.13
virtualenv 20.17.1
wcwidth 0.2.5
Werkzeug 2.2.2
wheel 0.37.1
yacs 0.1.8
yarl 1.8.2
zipp 3.11.0
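For completeness, here is a minimal sketch of how the pinned dependency versions can be verified programmatically. The package names come from the list above; the expected versions are taken from this thread, so adjust them if the repository requires something different.

```python
# Hedged sketch: confirm installed versions of the key pinned dependencies.
# Expected versions are assumptions based on this thread, not the repo's docs.
from importlib.metadata import version

expected = {
    "pytorch-lightning": "1.8.6",
    "transformers": "4.19.2",
    "torch": "1.12.0",
}

for name, want in expected.items():
    have = version(name)
    status = "OK" if have == want else f"MISMATCH (expected {want})"
    print(f"{name}: {have} {status}")
```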
Can you let me know the number of files in generated_samples under your testing environment? I reproduced numbers close to the benchmark in the paper, so there might be a difference in the testing files. All testing samples should be generated from test.py to reproduce the results.
The number of files in generated_samples is 6559 (hml3d). The testing samples were generated from test.py. There is no random operation in test.py, so I am confused about why there is a difference in the testing files.
@ToBeCodeCreater On my side, I have 6,557 test samples and generated results by running test.py. I know the testing stage takes some time, but can you try it again? Sorry for the inconvenience, but I have set up the repository from scratch in two clean containers and got numbers similar to those in the paper.
@jihoonerd I have tested the released model three times and got similar results. I'm wondering what is causing this. The size of the testing data in HumanML3D/processed/test_data is 6557 and the size of the training data in HumanML3D/processed/train_data is 34936.
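For reference, this is a minimal sketch of how the counts above can be checked, assuming one file per sample in each directory; the paths are taken from this thread, so adjust them to your layout.

```python
# Hedged sketch: count files in the directories discussed in this thread.
from pathlib import Path

for d in ["generated_samples",
          "HumanML3D/processed/test_data",
          "HumanML3D/processed/train_data"]:
    p = Path(d)
    if p.is_dir():
        n = sum(1 for f in p.iterdir() if f.is_file())
        print(f"{d}: {n} files")
    else:
        print(f"{d}: directory not found")
```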
I find that a model trained using this repository performs well on the test dataset (I got numbers similar to those in the paper), but the results of the released model are very poor. Very confusing.
I also got similar results when I used the pretrained weights (for HumanML3D). I would be grateful if the authors could check whether the pretrained weights have a problem.
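One way to rule out a corrupted or mismatched download is to inspect the checkpoint directly. Below is a minimal sketch, assuming the released file is a standard PyTorch Lightning .ckpt (a pickled dict with a 'state_dict' entry); the file name is the one used in the test command in this thread. The MD5 hash could be compared against the authors' copy.

```python
# Hedged sketch: basic sanity checks on the released checkpoint.
import hashlib

import torch

ckpt_path = "pretrained/flame_hml3d_bc.ckpt"  # path from this thread

# Hash of the file on disk, useful for comparing against the release.
with open(ckpt_path, "rb") as f:
    print("md5:", hashlib.md5(f.read()).hexdigest())

# Load on CPU and report what the checkpoint contains.
ckpt = torch.load(ckpt_path, map_location="cpu")
print("top-level keys:", list(ckpt.keys()))
print("epoch:", ckpt.get("epoch"), "global_step:", ckpt.get("global_step"))
print("state_dict tensors:", len(ckpt.get("state_dict", {})))
```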