HTS engine needs some refactoring

Open KOLANICH opened this issue 3 years ago • 5 comments

Basically, we need to add a struct of callbacks (a C analogue of a vtable), modify libmage to call them through it, and make RHVoice provide that struct populated, instead of relying on the linking mechanism, which works nicely only in the case of static linking.
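A minimal sketch of the callback-struct idea, for illustration only. Every name here (`engine_callbacks`, `engine_run`, `host_state`, and the two callbacks) is hypothetical and is not taken from libmage, hts_engine, or RHVoice; the "engine" just emits dummy samples. The point is that the library side never references a host symbol by name, so it does not matter whether the two sides are linked statically or dynamically.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical callback table (a hand-written vtable). The field names are
   illustrative assumptions, not the real libmage or RHVoice interface. */
typedef struct {
    void *user_data; /* opaque pointer passed back to every callback */
    /* Fills buf with the next label; returns 0 when no labels are left. */
    int (*next_label)(void *user_data, char *buf, size_t buf_size);
    /* Receives each block of synthesized samples. */
    void (*on_samples)(void *user_data, const short *samples, size_t count);
} engine_callbacks;

/* "Library" side: reaches the host only through the table it was given,
   never through symbols resolved by the linker. */
static size_t engine_run(const engine_callbacks *cb)
{
    char label[64];
    size_t total = 0;

    if (cb == NULL || cb->next_label == NULL || cb->on_samples == NULL)
        return 0;

    while (cb->next_label(cb->user_data, label, sizeof label)) {
        short frame[4] = {0, 1, 2, 3}; /* dummy output, 4 samples per label */
        cb->on_samples(cb->user_data, frame, 4);
        total += 4;
    }
    return total;
}

/* "Host" side: the application populates the table with its own functions. */
typedef struct {
    int labels_left;
    size_t samples_seen;
} host_state;

static int host_next_label(void *user_data, char *buf, size_t buf_size)
{
    host_state *st = user_data;
    if (st->labels_left <= 0)
        return 0;
    snprintf(buf, buf_size, "label-%d", st->labels_left);
    st->labels_left--;
    return 1;
}

static void host_on_samples(void *user_data, const short *samples, size_t count)
{
    host_state *st = user_data;
    (void)samples; /* a real host would play or store the audio */
    st->samples_seen += count;
}

/* Wires the two sides together and reports how many samples the host saw. */
static size_t demo_run(int n_labels)
{
    host_state st = { n_labels, 0 };
    engine_callbacks cb = { &st, host_next_label, host_on_samples };
    engine_run(&cb);
    return st.samples_seen;
}
```

Calling `demo_run(3)` drives the dummy engine over three labels. Passing the table explicitly is also what would let an unmodified library be reused by hosts other than RHVoice.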

KOLANICH avatar Jul 14 '21 16:07 KOLANICH

> instead of relying to linking mechanism, which would work nicely only in the case of static linking.

I have long wanted to ask why you insist on dynamic linking. I understand it when it is used with libraries that are already available in the system, but here we are talking about libmage and hts: they have been modified by us, so they will not be distributed apart from RHVoice, nor used by anyone other than us.

alex19EP avatar Jul 14 '21 16:07 alex19EP

I want to eventually eradicate all the RHVoice-specific code from the hts_engine and libmage source code and upstream the part that is not RHVoice-specific. This would require some refactoring, and the refactoring would require some coordination with the upstreams; the mere fact that Olga had to patch them instead of configuring them means that their design is flawed. Then we could use the vanilla ones (by vanilla I mean the most alive fork).

KOLANICH avatar Jul 14 '21 17:07 KOLANICH

OK, now I understand, but I'm not sure if it is possible to use vanilla libraries at all ...

alex19EP avatar Jul 14 '21 17:07 alex19EP

The original HTS Engine doesn't support mixed excitation. That is, what they implement is a static maximum voiced frequency. Adding support for multiband mixed excitation required modifying the excitation generation code in the vocoder. This kind of modification is done by many users of HTS and related code, who are mostly other speech researchers. Plugging in their own implementations and additions is just normal for that community. This library isn't really like most libraries, which are meant to be used as is in Linux distributions.

Another important modification in RHVoice is the optimization of the question matching code. But that really assumes that no compound questions using multiple features at once are ever used. The original HTS code is written for the general case of any combination of features in a question. I can do this because I know what I use. And if my usage ever changes, I will modify my code accordingly. I need full control of this part to be free to change and improve things. I don't want to be restricted by compatibility considerations here.
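To illustrate the trade-off in question matching, here is a rough sketch under stated assumptions; it is not RHVoice's or hts_engine's actual code. It assumes the general case is a wildcard pattern matched against the whole context label string, and that the specialized case tests one pre-parsed feature against a list of accepted values. All types and function names (`glob_match`, `parsed_label`, `simple_question`, `simple_match`) and the label layout are hypothetical.

```c
#include <string.h>
#include <stddef.h>

/* General case (assumed): a question is a pattern with '*' and '?' wildcards
   matched against the full label, so it may constrain any mix of features. */
static int glob_match(const char *pat, const char *str)
{
    if (*pat == '\0')
        return *str == '\0';
    if (*pat == '*')
        return glob_match(pat + 1, str) ||
               (*str != '\0' && glob_match(pat, str + 1));
    if (*str != '\0' && (*pat == '?' || *pat == *str))
        return glob_match(pat + 1, str + 1);
    return 0;
}

/* Specialized case (assumed): the label is split into features once, and
   each question looks at exactly one of them. Hypothetical layout:
   feature 0 = previous phone, 1 = current phone, 2 = next phone. */
#define N_FEATURES 3

typedef struct {
    const char *values[N_FEATURES];
} parsed_label;

typedef struct {
    size_t feature;              /* index of the single feature tested */
    const char *const *accepted; /* values that make the answer "yes" */
    size_t n_accepted;
} simple_question;

static int simple_match(const simple_question *q, const parsed_label *label)
{
    size_t i;
    for (i = 0; i < q->n_accepted; i++) {
        if (strcmp(label->values[q->feature], q->accepted[i]) == 0)
            return 1;
    }
    return 0;
}

/* Asks "is the current phone a or e?" for a label built around `phone`. */
static int demo_simple_match(const char *phone)
{
    static const char *const accepted[] = { "a", "e" };
    simple_question q = { 1, accepted, 2 };
    parsed_label label = { { "x", phone, "b" } };
    return simple_match(&q, &label);
}
```

The specialized matcher avoids rescanning the label string for every question, but it cannot express a compound question that constrains two features at once, which is the restriction described above.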

Olga-Yakovleva avatar Jul 14 '21 17:07 Olga-Yakovleva

@KOLANICH taking into account the above, I think it's best not to try to link hts and mage dynamically.

alex19EP avatar Jul 14 '21 18:07 alex19EP