LWM
Generated video: only the first frame contains an image, the other frames are random pixels
I ran bash scripts/run_sample_video.sh with the LWM-Chat-1M-JAX model. The sh file is:
...
python3 -u -m lwm.vision_generation \
--prompt='A long big pig is walking across the street' \
--output_file='fireworks.mp4' \
--temperature_image=1.0 \
--temperature_video=1.0 \
--top_k_image=8192 \
--top_k_video=1000 \
--cfg_scale_image=5.0 \
--cfg_scale_video=1.0 \
--vqgan_checkpoint="$vqgan_checkpoint" \
--n_frames=8 \
--mesh_dim='!1,1,2,1' \
--dtype='bf16' \
--load_llama_config='7b' \
--update_llama_config="dict(sample_mode='vision',theta=50000000,max_sequence_length=32768,use_flash_attention=True,scan_attention=False,scan_query_chunk_size=256,scan_key_chunk_size=256,scan_mlp=False,scan_mlp_chunk_size=8192,scan_layers=True)" \
--load_checkpoint="params::$lwm_checkpoint" \
--tokenizer.vocab_file="$llama_tokenizer_path"
read
After generation, only the first frame of the output video is meaningful; all the other frames are random pixels.
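To make the symptom easier to report, here is a minimal sketch of one way to measure which frames look like noise. It is not part of LWM. The function names, the roughness metric (mean absolute difference between horizontally adjacent pixels), and the 0.2 threshold are all my own assumptions. It expects frames as uint8 arrays of shape (height, width, 3), and uses synthetic frames in place of the real video.

```python
import numpy as np


def frame_roughness(frame: np.ndarray) -> float:
    """Mean absolute difference between horizontally adjacent pixels, in [0, 1].

    Natural images are locally smooth, so this stays small; uniformly random
    uint8 pixels give a value of roughly 1/3.
    """
    pixels = frame.astype(np.float32) / 255.0
    return float(np.abs(np.diff(pixels, axis=1)).mean())


def flag_noisy_frames(frames, threshold=0.2):
    """Return the indices of frames whose roughness exceeds the threshold.

    The threshold is an assumed value, not a calibrated one.
    """
    return [i for i, f in enumerate(frames) if frame_roughness(f) > threshold]


def demo_frames(n_frames=8, size=64, seed=0):
    """Synthetic stand-in for the output video: one smooth frame, then noise."""
    rng = np.random.default_rng(seed)
    ramp = np.linspace(0, 255, size, dtype=np.float32)
    smooth = np.broadcast_to(ramp[None, :, None], (size, size, 3)).astype(np.uint8)
    noise = [
        rng.integers(0, 256, (size, size, 3), dtype=np.uint8)
        for _ in range(n_frames - 1)
    ]
    return [smooth] + noise


if __name__ == "__main__":
    frames = demo_frames()
    print("noisy frame indices:", flag_noisy_frames(frames))
```

To run this on the real output, the frames would first have to be decoded from the mp4 with whatever video reader is available, then passed to flag_noisy_frames as a list of arrays.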