tawsif
@endomorphosis @kevinintel https://huggingface.co/datasets/laion/Wikipedia-M3 Wikipedia M3 is done. In this dataset, I made embeddings of the abstracts for the 10 most widely spoken languages and languages with active research groups. These languages are: 1. English 2....
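For anyone who wants to poke at it, here is a minimal sketch of loading the dataset with the Hugging Face `datasets` library; the split name and streaming mode are my assumptions, so check the dataset card for the actual configs and columns:

```python
# Minimal sketch: stream one record from laion/Wikipedia-M3.
# Assumption: a "train" split exists; column names are whatever the
# dataset card lists (e.g. abstract text plus its embedding vector).
from datasets import load_dataset

ds = load_dataset("laion/Wikipedia-M3", split="train", streaming=True)
print(next(iter(ds)))  # inspect one row's fields
```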
Thanks @Plachtaa for the quick answer! Btw, if I want to increase the parameter count up to 1B, what changes should be made to the DiT architecture? Do you have any advice?
@Plachtaa I wanted to experiment and see how it behaves, since I had some spare compute. I was thinking of increasing the hidden dim of the DiT...
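For rough sizing, a back-of-the-envelope sketch of how hidden dim and depth drive parameter count in a generic DiT-style block stack; the 4/8/6·d² split below is a textbook transformer estimate with adaLN-Zero modulation, not this repo's exact layout:

```python
# Back-of-the-envelope DiT parameter estimate. Assumptions: per block,
# self-attention QKV + output proj ~ 4*d^2, a 4x-expansion MLP ~ 8*d^2,
# and adaLN-Zero modulation ~ 6*d^2 (conditioning dim == d). Embeddings
# and output heads are ignored, so treat these numbers as ballpark only.
def estimate_dit_params(hidden_dim: int, depth: int) -> int:
    per_block = (4 + 8 + 6) * hidden_dim ** 2
    return depth * per_block

# A few hidden_dim/depth combinations near the ~1B target:
for d, n in [(1024, 24), (1536, 28), (2048, 32)]:
    print(f"d={d}, depth={n}: ~{estimate_dit_params(d, n) / 1e9:.2f}B params")
```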
@Plachtaa Thank you for being so helpful. Another question: if I change the vocoder from "nvidia/bigvgan_v2_22khz_80band_256x" to "https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x", which params should I change in the config?
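For reference, the BigVGAN checkpoint names themselves encode the mel settings that would have to move together; here is a sketch with hypothetical config keys (the real key names depend on this repo's config schema):

```python
# The checkpoint names encode the audio front-end settings:
#   bigvgan_v2_22khz_80band_256x  -> 22.05 kHz, 80 mel bands, 256x upsampling (hop)
#   bigvgan_v2_44khz_128band_512x -> 44.1 kHz, 128 mel bands, 512x upsampling (hop)
# Key names below are hypothetical; map them onto whatever the repo's
# config actually calls them. n_fft/win_size typically scale with hop
# (e.g. 1024 -> 2048), but verify against the BigVGAN model card.
old_mel = dict(sampling_rate=22050, n_mels=80, hop_size=256, n_fft=1024)
new_mel = dict(sampling_rate=44100, n_mels=128, hop_size=512, n_fft=2048)
```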
@GUUser91 Definitely, you should gather more data. I trained on 2 hours and got awesome results!
@GUUser91 I just fine-tuned
@GUUser91 400 steps
@leminhnguyen Yes, I trained on a multi-speaker dataset with almost 4 hours of data. My data was very high quality; it's an 11labs dataset that I had developed and...
@Plachtaa I have fine-tuned a couple of models and was running inference when I noticed something weird. Even though my config file specified a 512-band nvidia BigVGAN, it was...
For reference, this is my notebook: https://colab.research.google.com/drive/1HeJgMIRpEMd87z5oAcfBfS8_YRLvrwr9?usp=sharing
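A quick way to sanity-check which vocoder actually got loaded is to pull the checkpoint directly and print the mel settings it was built with; this sketch assumes NVIDIA's BigVGAN codebase, where the loaded model exposes its hyperparameters on `model.h`:

```python
# Sketch, assuming NVIDIA's BigVGAN repo (github.com/NVIDIA/BigVGAN),
# whose BigVGAN class supports from_pretrained via huggingface_hub and
# keeps its hyperparameters on `model.h`. Compare these values against
# what the fine-tuning config claims to use.
import bigvgan

model = bigvgan.BigVGAN.from_pretrained("nvidia/bigvgan_v2_44khz_128band_512x")
print(model.h.sampling_rate, model.h.num_mels, model.h.hop_size)
```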