vision chat error
Hi,
I'm trying to run run_vision_chat.sh but getting the following error:
```
(lwm) minyoung@claw2:~/Projects/LWM$ bash scripts/run_vision_chat.sh
I0215 18:19:20.605390 140230836105600 xla_bridge.py:689] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0215 18:19:20.607900 140230836105600 xla_bridge.py:689] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
2024-02-15 18:19:29.755994: W external/xla/xla/service/gpu/nvptx_compiler.cc:744] The NVIDIA driver's CUDA version is 12.1 which is older than the ptxas CUDA version (12.3.107). Because the driver is older than the ptxas version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
Traceback (most recent call last):
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 254, in <module>
    run(main)
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 249, in main
    sampler = Sampler()
  File "/home/minyoung/Projects/LWM/lwm/vision_chat.py", line 42, in __init__
    self.mesh = VideoLLaMAConfig.get_jax_mesh(FLAGS.mesh_dim)
  File "/home/minyoung/Projects/LWM/lwm/llama.py", line 260, in get_jax_mesh
    return get_jax_mesh(axis_dims, ('dp', 'fsdp', 'tp', 'sp'))
  File "/home/minyoung/anaconda3/envs/lwm/lib/python3.10/site-packages/tux/distributed.py", line 140, in get_jax_mesh
    mesh_shape = np.arange(jax.device_count()).reshape(dims).shape
ValueError: cannot reshape array of size 1 into shape (1,newaxis,32,1)
```
These are the model configs I used.
```bash
export llama_tokenizer_path="./LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="./LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="./LWM-Chat-1M-Jax/params"
export input_file="./traj0.mp4"
```
FYI, here is what works for me:
```bash
#!/bin/bash
export SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
export PROJECT_DIR="$( cd -- "$( dirname -- "$SCRIPT_DIR" )" &> /dev/null && pwd )"
cd "$PROJECT_DIR"
export PYTHONPATH="$PYTHONPATH:$PROJECT_DIR"

export llama_tokenizer_path="LWM-Chat-1M-Jax/tokenizer.model"
export vqgan_checkpoint="LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="LWM-Chat-1M-Jax/params"
export input_file="taylor.jpg"

python3 -u -m lwm.vision_chat \
    --prompt="What is the image about?" \
    --input_file="$input_file" \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --max_n_frames=8 \
    --update_llama_config="dict(sample_mode='text',theta=50000000,max_sequence_length=131072,use_flash_attention=False,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,remat_attention='',scan_mlp=False,scan_mlp_chunk_size=2048,remat_mlp='',remat_block='',scan_layers=True)" \
    --load_checkpoint="params::$lwm_checkpoint" \
    --tokenizer.vocab_file="$llama_tokenizer_path" \
    2>&1 | tee ~/output.log
read
```
But I haven't gotten video to work yet; it probably doesn't accept mp4 input.
Also, the `--mesh_dim='!1,-1,32,1'` flag always seems off; it has to be adjusted for your hardware or removed.
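For example, on a single-GPU machine a mesh like the following might work (the `1,1,1,1` value is just my guess matching one visible device, untested beyond my setup):

```bash
# hypothetical single-device mesh: dp=1, fsdp=1, tp=1, sp=1
python3 -u -m lwm.vision_chat \
    --mesh_dim='1,1,1,1' \
    ...  # remaining flags as in the script above
```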
I wish the creators gave minimal running examples using the scripts.
Thanks for sharing, @pseudotensor ! I was also wondering if the .mp4 video file format is not supported.
Is the .avi video format supported?
I got the same problem; it cannot process the .mp4 file.
.mkv format works for me.
Would you mind sharing your script? I tried to use .mkv but still got the same error. Thank you for your help.
The `mesh_dim` argument depends on the number of devices you're using for inference. If you want to do tensor parallelism over 8 GPUs, then `mesh_dim` should be `1,1,8,1`. The default of 32 may be too high if your machine doesn't have 32 devices.
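To illustrate why the reshape fails, here is a minimal sketch of what tux's `get_jax_mesh` does internally, per the traceback above (`mesh_shape` is just a stand-in name for this example): the device IDs are reshaped into the `(dp, fsdp, tp, sp)` mesh, so the mesh axes must multiply out to the device count.

```python
import numpy as np

def mesh_shape(device_count, dims):
    # dims is the (dp, fsdp, tp, sp) tuple parsed from --mesh_dim;
    # -1 means "infer this axis from the device count".
    return np.arange(device_count).reshape(dims).shape

# 8 devices with tensor parallelism over all 8: works.
print(mesh_shape(8, (1, 1, 8, 1)))  # -> (1, 1, 8, 1)

# 1 visible device with tp=32: raises the ValueError from the traceback,
# "cannot reshape array of size 1 into shape (1,newaxis,32,1)".
try:
    mesh_shape(1, (1, -1, 32, 1))
except ValueError as e:
    print(e)
```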
Regarding supported video files, the code here:
https://github.com/LargeWorldModel/LWM/blob/0f441d39e46a607d64ea1e207eca7943306a1e3b/lwm/vision_chat.py#L84
just uses decord to read the video, so any video format that works for decord should work.