Results 2 issues of AIGUI

### Description
I would like to add the SenseVoice speech recognition model. The various TTS extensions in the TEN framework sound too mechanical and lack emotion. SenseVoice performs better in this...

### Description
The TEN framework can obtain a video stream from a local camera, but in practice IP cameras are more common, so I hope to add the feature of multimodal...