SVD_Xtend issues

pretrained_model_name_or_path error

1

Is this svd_xt_image_decoder.safetensors? Why do I get an error after adding the path? I set pretrained_model_name_or_path="/SVD_Xtend-main/svd_xt_image_decoder.safetensors", and then it fails with: raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.") OSError: It looks like the config file at '/SVD_Xtend-main/svd_xt_image_decoder.safetensors' is not a valid JSON...

1585511010
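A note on the error above: the message suggests that a single `.safetensors` file was passed where a model directory (or a Hub repo id) is expected. The sketch below is one hedged way to check a local path before loading. It assumes a diffusers-style pipeline folder that contains `model_index.json`; the helper name `check_model_dir` is hypothetical and is not part of SVD_Xtend or diffusers.

```python
from pathlib import Path


def check_model_dir(path_str):
    """Return a list of problems found with a local model path (empty if none).

    Assumption: a usable local model is a directory holding model_index.json,
    as in diffusers-style pipeline folders. This helper is illustrative only.
    """
    path = Path(path_str)
    problems = []
    if path.is_file():
        # A lone weights file cannot be parsed as a JSON config.
        problems.append(f"{path} is a single file; expected a directory or a Hub repo id")
    elif not path.is_dir():
        problems.append(f"{path} does not exist locally (it may still be a Hub repo id)")
    elif not (path / "model_index.json").is_file():
        problems.append(f"{path} has no model_index.json")
    return problems
```

Calling `check_model_dir("/SVD_Xtend-main/svd_xt_image_decoder.safetensors")` on a machine where that file exists would report the "single file" problem, which matches the "not a valid JSON file" symptom in the report.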

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 6 but got size 5 for tensor number 1 in the list.

5

When it runs model_pred = unet(inp_noisy_latents, timesteps, encoder_hidden_states, added_time_ids=added_time_ids).sample, the error is "RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 6 but got size 5...

TIANTEA

Could you provide the download link for pretrained_model_name_or_path?

3

zhanghaobucunzai

[CUDA out of memory] training in 1024 × 576 resolution in the A100 80G

5

Hi, thanks for any suggestions. The largest resolution I could train at is 512 × 512, which uses ~76 GB of memory. I set enable_xformers_memory_efficient_attention to True but nothing...

CallMeFrozenBanana

How do you run with BF16? I get an error when I try; please take a look

Traceback (most recent call last):
  File "train_svd.py", line 1262, in <module>
    main()
  File "train_svd.py", line 1089, in main
    model_pred = unet(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File...

zhanghaobucunzai

video_dataset_load

add VideoDummyDataset to load video directly

SpringtoString

How to support batch_size > 1?

1

Thanks for your amazing job! At line 1017 of train_svd.py, the batch size is fixed at 1 (noise_aug_strength = cond_sigmas[0] # TODO: support batch > 1). What should I do to support...

liiiiiiiiil

[WIP] Support Text2Video generation.

2

WIP. Not recommended for use, as there are many unresolved issues.

pixeli99

Question about the forward pass

1

Hi, while exploring diffusion models, I noticed the standard forward pass often uses the formula $\alpha \cdot x + \sigma \cdot \epsilon$. However, in your video diffusion model code, I...

KKN18
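For readers following the question above, a small numeric sketch may help. It assumes an EDM-style formulation in which the noisy sample is $x + \sigma \cdot \epsilon$ and the network input is then rescaled by $1/\sqrt{\sigma^2 + 1}$; whether train_svd.py does exactly this should be checked against the code itself. Under that assumption, the rescaled sample has the familiar $\alpha \cdot x + \sigma' \cdot \epsilon$ form with $\alpha^2 + \sigma'^2 = 1$. All function and variable names below are illustrative.

```python
import math


def edm_scaled(x, eps, sigma):
    """Add noise as x + sigma * eps, then rescale by 1 / sqrt(sigma^2 + 1)."""
    return (x + sigma * eps) / math.sqrt(sigma ** 2 + 1)


def variance_preserving(x, eps, sigma):
    """Same sample written as alpha * x + sigma_vp * eps, with alpha^2 + sigma_vp^2 = 1."""
    alpha = 1.0 / math.sqrt(sigma ** 2 + 1)
    sigma_vp = sigma * alpha
    return alpha * x + sigma_vp * eps


# Scalar stand-ins for a latent value, a noise draw, and a noise level.
x, eps, sigma = 0.5, -1.2, 3.0
print(edm_scaled(x, eps, sigma), variance_preserving(x, eps, sigma))
```

The two printed values agree up to floating-point rounding, so the two notations describe the same noisy input once the rescaling is taken into account.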

Losses barely dropped with BDDx datasets??

![image](https://github.com/pixeli99/SVD_Xtend/assets/23293902/7563b4dd-d1c3-4a1c-9c48-b4c252cd91a8)

qiuhaining
