kabachuha

40 comments by kabachuha

Using a ready-made VAE from the diffusers library might be a better choice than reinventing the wheel (and it's compatible with a lot more VAEs). See the example here: https://github.com/Vchitect/Latte/blob/9ededbe590a5439b6e7013d00fbe30e6c9b674b8/sample/sample.py#L69
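For instance, loading one is a couple of lines (a rough sketch; the checkpoint name is just an example, any AutoencoderKL-compatible weights from the hub work the same way):

```python
import torch
from diffusers.models import AutoencoderKL

# Load a pretrained VAE from the hub (checkpoint name is just an example;
# any AutoencoderKL-compatible weights work the same way).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

# Encode a batch of frames (N, 3, H, W) into latents and decode them back.
frames = torch.randn(4, 3, 256, 256)
with torch.no_grad():
    latents = vae.encode(frames).latent_dist.sample() * vae.config.scaling_factor
    recon = vae.decode(latents / vae.config.scaling_factor).sample
```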

Additionally, here is my project for high-quality, long-form (recursive) video captioning using VideoLLaVA: https://github.com/kabachuha/video2scenario. It may be useful for you.

Well, we contribute to open source too, from time to time (PixArt-alpha/delta and the GradTTS family are from Noah's Ark too), especially when it's tangential to the main work and can't be...

Visual examples of the masked (not only temporal, but spatial!) training/inference process of Meta's V-JEPA: https://github.com/facebookresearch/jepa
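Roughly what that masking means in tensor terms (a toy sketch of spatiotemporal block masking, not V-JEPA's actual code; the grid and block sizes are arbitrary assumptions):

```python
import torch

# Toy spatiotemporal block masking in the spirit of V-JEPA (not their actual code):
# given a (T, H, W) grid of patch tokens, hide a contiguous 3D block so the model
# must predict it from the surrounding space-time context.
def random_tube_mask(t, h, w, block=(4, 6, 6)):
    mask = torch.ones(t, h, w, dtype=torch.bool)
    bt, bh, bw = block
    t0 = torch.randint(0, t - bt + 1, (1,)).item()
    h0 = torch.randint(0, h - bh + 1, (1,)).item()
    w0 = torch.randint(0, w - bw + 1, (1,)).item()
    mask[t0:t0 + bt, h0:h0 + bh, w0:w0 + bw] = False  # False = region to predict
    return mask  # True = visible to the context encoder

mask = random_tube_mask(16, 14, 14)
```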

Made a PR for alternative attentions in https://github.com/PKU-YuanGroup/Open-Sora-Plan/pull/63

Inpainting training code for SDXL (should be extended to 2+1D): https://github.com/huggingface/diffusers/pull/6592/files

RectifiedFlow diffusers implementation: https://github.com/huggingface/diffusers/blob/534f5d54faf24a72a0e2bff2f6e6379ea519c4ed/examples/community/instaflow_one_step.py#L55
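For context, the rectified-flow training objective itself is tiny (a hedged sketch of the standard straight-line formulation, not the linked pipeline's code; `model` and its signature are hypothetical):

```python
import torch

# Rectified flow in one training step (standard formulation, not the linked pipeline):
# interpolate on the straight line between data x0 (t=0) and noise x1 (t=1),
# and regress the constant velocity of that line.
def rectified_flow_loss(model, x0):
    x1 = torch.randn_like(x0)                       # noise endpoint
    t = torch.rand(x0.shape[0], 1, 1, 1, device=x0.device)
    xt = (1 - t) * x0 + t * x1                      # linear interpolation
    v_pred = model(xt, t.flatten())                 # model predicts velocity
    return torch.mean((v_pred - (x1 - x0)) ** 2)    # target velocity = x1 - x0
```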

Dude, you want to add the entire LLaVA codebase as the captioner? Why not use the Hugging Face hub or ready-made software?
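Pulling a LLaVA captioner straight from the hub is a few lines with transformers (a sketch; the model id and file name are just examples):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Pull a LLaVA checkpoint from the hub instead of vendoring the codebase
# (model id is just an example; other llava-hf checkpoints work the same way).
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "USER: <image>\nDescribe this frame in detail. ASSISTANT:"
image = Image.open("frame.png")  # example input frame
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```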

XXL weights might actually be better, because as research on story understanding shows, the larger the hidden dimension, the better the model understands the plot and the relationships between real-world objects...

Thanks! We more or less fixed it in Deforum by adding extra sanitization for the ffmpeg input. It can still break stuff for other plugin/script makers, but closing this.
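The gist of that kind of fix (a sketch, not Deforum's actual code): validate the user-supplied paths and pass ffmpeg its arguments as a list, so the shell never interprets user input.

```python
import subprocess
from pathlib import Path

# Sketch of the sanitization idea (not Deforum's actual code): validate the
# output path and pass arguments as a list with shell=False, so user-supplied
# strings are never interpreted by the shell.
def encode_frames(in_pattern: str, out_file: str, fps: int = 24):
    out = Path(out_file)
    if out.suffix.lower() not in {".mp4", ".webm"}:
        raise ValueError(f"unexpected output extension: {out.suffix}")
    cmd = [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", in_pattern,          # e.g. "frames/%05d.png"
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        str(out),
    ]
    subprocess.run(cmd, check=True, shell=False)
```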

You don't see it as the author? IIRC people reopened the issues just fine: ![image](https://user-images.githubusercontent.com/14872007/235364302-1b15e5ff-7a1b-4277-b9fa-38ed3a711e42.png)