Support word and punctuation timestamps
Some versions of other open TTS models, such as Kokoro, provide word-level timestamps. This can be a very useful feature for syncing visuals of the text with its readout.
I would be happy to help if I can.
@gad2103 Hi, thanks for your suggestion. Because VoxCPM is an autoregressive model, it's difficult to obtain an accurate timestamp for each word during generation. You might need an external timestamp alignment tool. If you complete the post-processing code for timestamp alignment, please feel free to submit a Pull Request.
Highly recommend stable-ts for this; it's the most accurate forced alignment tool that I've seen so far.
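
For anyone picking this up, here is a minimal sketch of what that post-processing step could look like. It assumes the generated speech has already been saved to an audio file and that the exact input text is available. `WordTiming`, `to_word_timings`, and `align_with_stable_ts` are hypothetical names made up for this example; they are not part of VoxCPM or stable-ts. The stable-ts calls (`load_model`, `align`, and the `segments` / `words` attributes) follow that project's README as I understand it, so check them against the version you install. How punctuation is split from words depends on the aligner's tokenization.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class WordTiming:
    """One aligned token (hypothetical container for this sketch)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def to_word_timings(raw: Iterable[Tuple[str, float, float]]) -> List[WordTiming]:
    """Clean up raw (text, start, end) triples from an aligner.

    Strips whitespace, drops empty tokens, and clamps times so that
    they never run backwards, which keeps a visual highlight stable.
    """
    timings: List[WordTiming] = []
    prev_end = 0.0
    for text, start, end in raw:
        text = text.strip()
        if not text:
            continue
        start = max(start, prev_end)  # never start before the previous token ended
        end = max(end, start)         # never end before starting
        timings.append(WordTiming(text, round(start, 3), round(end, 3)))
        prev_end = end
    return timings


def align_with_stable_ts(audio_path: str, transcript: str,
                         language: str = "en") -> List[WordTiming]:
    """Force-align a known transcript to audio using stable-ts.

    Assumption: `pip install stable-ts` is available and the "base"
    Whisper checkpoint is acceptable; neither is required by VoxCPM.
    """
    import stable_whisper  # imported lazily so the helper above works without it

    model = stable_whisper.load_model("base")
    result = model.align(audio_path, transcript, language=language)
    raw = [
        (word.word, word.start, word.end)
        for segment in result.segments
        for word in segment.words
    ]
    return to_word_timings(raw)
```

Usage would be something like `align_with_stable_ts("output.wav", "The text that was synthesized.")`, where `output.wav` is a placeholder for wherever the generated audio was written. Keeping the cleanup in `to_word_timings` separate from the aligner call means a different alignment tool could be swapped in without touching the code that consumes the timings.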