CogVideo

How to start with 4090?

shuiyued opened this issue 1 year ago • 7 comments

Great open-source work! I'd like to suggest adding some instructions to the README. Specifically, instructions are missing for users with 24GB of VRAM and for those with more than 24GB on how to quickly set up the environment, download the models, and run inference. The current format may leave interested users unsure of where to start.

shuiyued avatar Aug 06 '24 15:08 shuiyued

Thanks for the reminder. We have added quick start instructions at the beginning of the README.
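
For reference, fetching the weights programmatically should look roughly like this (a minimal sketch assuming the models are hosted on the Hugging Face Hub under the THUDM organization; the README quick start is the authoritative source):

```python
# Illustrative sketch: download CogVideoX weights from the Hugging Face Hub.
# The repo id is an assumption; check the README for the exact model names.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="THUDM/CogVideoX-2b")
print(f"Model weights downloaded to {local_dir}")
```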

tengjiayan20 avatar Aug 06 '24 18:08 tengjiayan20

Hello, can you specify which version of Python you are using? I am encountering a conflict between Python 3.9's typing module and beartype version 0.18.5.

```
Traceback (most recent call last):
  File "/home/CogVideo/sat/sample_video.py", line 18, in <module>
    from diffusion_video import SATVideoDiffusionEngine
  File "/home/CogVideo/sat/diffusion_video.py", line 12, in <module>
    from sgm.modules import UNCONDITIONAL_CONFIG
  File "/home/CogVideo/sat/sgm/__init__.py", line 1, in <module>
    from .models import AutoencodingEngine
  File "/home/CogVideo/sat/sgm/models/__init__.py", line 1, in <module>
    from .autoencoder import AutoencodingEngine
  File "/home/CogVideo/sat/sgm/models/autoencoder.py", line 29, in <module>
    from ..modules.cp_enc_dec import _conv_split, _conv_gather
  File "/home/CogVideo/sat/sgm/modules/cp_enc_dec.py", line 8, in <module>
    from beartype import beartype
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/site-packages/beartype/__init__.py", line 58, in <module>
    from beartype._decor.decormain import (
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/site-packages/beartype/_decor/decormain.py", line 26, in <module>
    from beartype._conf.confcls import (
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/site-packages/beartype/_conf/confcls.py", line 46, in <module>
    from beartype._conf.confoverrides import (
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/site-packages/beartype/_conf/confoverrides.py", line 15, in <module>
    from beartype._data.hint.datahinttyping import (
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/site-packages/beartype/_data/hint/datahinttyping.py", line 290, in <module>
    BeartypeReturn = Union[BeartypeableT, BeartypeConfedDecorator]
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/typing.py", line 243, in inner
    return func(*args, **kwds)
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/typing.py", line 316, in __getitem__
    return self._getitem(self, parameters)
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/typing.py", line 421, in Union
    parameters = _remove_dups_flatten(parameters)
  File "/home/envs/miniconda3/envs/cogvideo/lib/python3.9/typing.py", line 215, in _remove_dups_flatten
    all_params = set(params)
TypeError: unhashable type: 'list'
```

shuiyued avatar Aug 07 '24 14:08 shuiyued

We use Python 3.11. Sorry for not specifying the Python version; we will clarify this in the README.

tengjiayan20 avatar Aug 07 '24 14:08 tengjiayan20

Thanks, I also found this issue reported in beartype, so I will use the correct Python version: https://github.com/beartype/beartype/issues/406#issuecomment-2211954903
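
For anyone else hitting this, a small guard at the top of sample_video.py turns the long traceback above into a clear error (a sketch; the 3.10 cutoff is an assumption, the maintainers use 3.11):

```python
# Illustrative guard: beartype 0.18.5 crashes at import time under
# Python 3.9's typing module, so fail fast with a readable message.
import sys

if sys.version_info < (3, 10):  # assumption: 3.10+ sidesteps the typing bug
    sys.exit(
        "CogVideo's SAT code trips a beartype/typing bug on this interpreter; "
        f"the maintainers use Python 3.11, found {sys.version.split()[0]}."
    )
```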

shuiyued avatar Aug 07 '24 14:08 shuiyued

The diffusers framework is now supported, so it runs on a 4090.
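
A minimal sketch of that path (the prompt and sampler settings here are illustrative; check the README for the recommended values):

```python
# Minimal text-to-video sketch with the diffusers CogVideoX pipeline.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # offload idle submodules so 24 GB of VRAM suffices

frames = pipe(
    prompt="A panda playing guitar in a bamboo forest, cinematic lighting.",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=8)
```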

zRzRzRzRzRzRzR avatar Aug 07 '24 15:08 zRzRzRzRzRzRzR

Hello, will the results with diffusers be better than those with SAT? Also, I find that generated videos of rigid objects look good, but quality drops when there are people or multiple objects. If I want to generate videos with consistent geometry, what kinds of prompts work better in your experience? I suspect it is related to the training data. Thanks!

shuiyued avatar Aug 08 '24 04:08 shuiyued

  1. In theory the two are the same, but in practice I think SAT may be slightly better: the diffusers version is converted from the SAT version, so small differences between them are inevitable.
  2. You can use the prompt optimization method mentioned in the quick start; replacing glm-4 with gpt-4o may produce better results. A sketch of that rewriting step follows below.
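
For illustration, a minimal version of that prompt-rewriting step using the OpenAI Python client (the system prompt and parameters are made up for this sketch, not taken from the repo's script):

```python
# Illustrative prompt upsampling: rewrite a terse prompt into the long,
# detailed style video diffusion models respond to, using gpt-4o.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "Rewrite the user's short video idea as one detailed paragraph. "
    "Describe the subject, motion, camera, lighting, and background "
    "concretely, and keep object counts and geometry consistent."
)

def optimize_prompt(short_prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": short_prompt},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

print(optimize_prompt("a dog surfing a wave at sunset"))
```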

tengjiayan20 avatar Aug 08 '24 05:08 tengjiayan20