STC


> Which part of the code do you need most urgently: DeCo training, inference, ckpts or the interpretability tool R_GAE?

_Originally posted by @yaolinli in [#7](https://github.com/yaolinli/DeCo/issues/7#issuecomment-2608834930)_

I need the code...

### Feature

When `--deepspeed zero3.json` and `--pretrain_mm_mlp_adapter` are used at the same time, the current code does not support this combination. Because the weights have already been sharded, the `load_state_dict` call in the function `initialize_vision_modules`...
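A common way around this kind of failure is to gather the sharded parameters before loading and let one rank write them. The sketch below shows that pattern with DeepSpeed's `deepspeed.zero.GatheredParameters` context manager. It is not the repository's fix. The helper name `load_state_dict_zero3_safe` is hypothetical, and detecting ZeRO-3 partitioning through the `ds_id` attribute that DeepSpeed attaches to partitioned parameters is an assumption. The ZeRO-3 branch only runs under a DeepSpeed launch with an initialized process group. Otherwise the helper falls back to a plain `load_state_dict`.

```python
import torch
import torch.nn as nn


def load_state_dict_zero3_safe(module: nn.Module, state_dict: dict) -> None:
    """Hypothetical helper: load `state_dict` into `module`, gathering
    ZeRO-3 partitioned parameters first if needed.

    Assumption: under ZeRO-3, DeepSpeed marks each partitioned parameter
    with a `ds_id` attribute, so its presence signals sharding.
    """
    params = list(module.parameters())
    is_zero3 = any(hasattr(p, "ds_id") for p in params)

    if not is_zero3:
        # Unsharded module: the full tensors are local, so a normal load works.
        module.load_state_dict(state_dict)
        return

    # Imported lazily so the unsharded path does not require DeepSpeed.
    import deepspeed
    import torch.distributed as dist

    # Temporarily materialize the full parameters. With modifier_rank=0,
    # changes made on rank 0 are broadcast and re-partitioned on exit.
    with deepspeed.zero.GatheredParameters(params, modifier_rank=0):
        if dist.get_rank() == 0:
            module.load_state_dict(state_dict)
```

Gathering is scoped to the parameters of the single module being loaded (for example, a projector), so only that module's full weights are materialized at once rather than the whole model.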