
Audio Feature Extraction for Audio-to-Video Generation

Open suimuc opened this issue 10 months ago • 0 comments

Hello, I’m currently exploring the audio-to-video script in this repository and would like to understand how the audio features are extracted. Specifically, the STFT features in the stft_pickle data have a shape of (90, 45, 17), while the corresponding video has 90 frames. Could you explain how the STFT (Short-Time Fourier Transform) features are computed?
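
To make the question concrete, the sketch below shows one way an array of shape (90, 45, 17) could arise. It is purely a guess for illustration, not the repository's actual preprocessing. It assumes the audio is cut into one segment per video frame and that each segment gets its own magnitude STFT. The values `n_fft=32` (which gives 32 / 2 + 1 = 17 frequency bins), `hop=8`, the 384-sample segment length, and the function names are all hypothetical.

```python
import cmath
import math


def stft_magnitude(samples, n_fft, hop):
    """Magnitude STFT of a 1-D sequence.

    Returns a list of time steps, each holding n_fft // 2 + 1 frequency bins.
    """
    # Hann window to reduce spectral leakage at the window edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    # Precomputed DFT basis for the non-negative frequencies only.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)]
        for k in range(n_bins)
    ]
    steps = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(n_fft)]
        steps.append([abs(sum(c * b for c, b in zip(chunk, row))) for row in basis])
    return steps


def per_video_frame_stft(samples, n_video_frames, n_fft, hop):
    """Split the audio into equal segments, one per video frame, and run an STFT on each."""
    seg_len = len(samples) // n_video_frames
    return [
        stft_magnitude(samples[i * seg_len:(i + 1) * seg_len], n_fft, hop)
        for i in range(n_video_frames)
    ]


# Hypothetical numbers chosen only so that the shape matches (90, 45, 17):
# 384 samples per segment with n_fft=32 and hop=8 gives (384 - 32) / 8 + 1 = 45 steps.
n_video_frames, seg_len, n_fft, hop = 90, 384, 32, 8
audio = [math.sin(0.1 * n) for n in range(n_video_frames * seg_len)]  # synthetic tone

features = per_video_frame_stft(audio, n_video_frames, n_fft, hop)
shape = (len(features), len(features[0]), len(features[0][0]))
print(shape)
```

Under these assumptions the first axis is the video frame index, the second is the STFT time step within that frame's audio segment, and the third is the frequency bin. The opposite reading (45 frequency bins and 17 time steps) is equally possible from the shape alone, which is why confirmation from the maintainers would help.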

suimuc · Apr 28 '25 06:04