Quan Sun
Hi @pandaupc, we are very happy to hear that; it's a really big improvement. We are wondering whether you plan to release your method.
@1338199 https://github.com/LAION-AI/CLIP_benchmark
Hi @yihong1120, Thank you for sharing your insights and demonstrating interest in Emu2's capabilities. Emu2, being a multimodal foundational model, indeed possesses the flexibility for fine-tuning with domain-specific knowledge. Your...
In my case, I have observed that a memory leak can occur when using ZeRO-3 and zero.Init in a distillation scenario.
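For context, a minimal sketch of the setup where I see this (the `TeacherModel`/`StudentModel` classes and the config path are illustrative stand-ins, not my actual code):

```python
import torch
import deepspeed

# Hypothetical toy modules standing in for the real teacher/student.
class TeacherModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        return self.proj(x)

class StudentModel(TeacherModel):
    pass

ds_config = "ds_zero3_config.json"  # hypothetical ZeRO-3 config file

# zero.Init partitions parameters across ranks as modules are constructed,
# so even the frozen teacher's weights are sharded under ZeRO-3.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    teacher = TeacherModel()
    student = StudentModel()

teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)  # distillation: the teacher is frozen

# Only the student is wrapped in an engine; optimizer settings come
# from the DeepSpeed config.
engine, _, _, _ = deepspeed.initialize(model=student, config=ds_config)

# Repeatedly gathering the frozen teacher's sharded parameters for
# forward passes is where I observe memory growing over time.
```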
Hi @tjruwase, I have opened an issue: [#3286](https://github.com/microsoft/DeepSpeed/issues/3286)
> new features I see here:
>
> * deepspeed support
> * eva vit model
> * load visual and text tower independently
> * report text and image...
@nahidalam It's not merged. Maybe you can check https://github.com/baaivision/EVA/tree/master/EVA-CLIP, which supports loading the visual and text towers independently.
@nahidalam In https://github.com/mlfoundations/open_clip/pull/255, you can specify `--pretrained-image` and `--pretrained-text` to load custom image and text encoders simultaneously.
Just a follow-up. Is anyone taking a look?
Hello Gabriel, thanks for your reply! This PR adds layer decay and separate learning rates for the text/visual encoders. Learning-rate layer decay is a common trick when we train a...
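For anyone unfamiliar with the trick, here is a minimal sketch of layer-wise learning-rate decay (the values and the `layer_id_of` naming scheme are illustrative, not the PR's actual code): each parameter group's learning rate is scaled down by a constant factor for every layer it sits below the top of the network.

```python
import torch

def layer_id_of(name: str, num_layers: int) -> int:
    # Hypothetical mapping: embeddings get id 0, transformer block i
    # gets i + 1, everything else (final norm, head) gets the top id.
    if name.startswith("embeddings"):
        return 0
    if name.startswith("blocks."):
        return int(name.split(".")[1]) + 1
    return num_layers + 1

def param_groups_with_layer_decay(model, base_lr=1e-4, decay=0.75, num_layers=12):
    groups = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        lid = layer_id_of(name, num_layers)
        # Layers near the output keep (close to) the full lr; earlier
        # layers are scaled by decay^(distance from the top).
        scale = decay ** (num_layers + 1 - lid)
        if lid not in groups:
            groups[lid] = {"params": [], "lr": base_lr * scale}
        groups[lid]["params"].append(param)
    return list(groups.values())

# Usage:
# optimizer = torch.optim.AdamW(param_groups_with_layer_decay(model))
```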