FunASR

sentence_info content is incomplete

Open tiaanaqiqikuaipao opened this issue 1 year ago • 1 comments

Notice: In order to resolve issues more efficiently, please raise issues following the template and fill in the details.

🐛 Bug

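I ran offline speaker labeling (spk) on a single audio file, but the content under the key (the 'text' field) and sentence_info do not match: sentence_info contains only about half of the content.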

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

from funasr import AutoModel

# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++")
res = model.generate(input="2speakers_example.wav", batch_size_s=1, hotword='魔搭')
print(res)

'key': 'rand_key_2yW4Acq9GFz6Y', 'text': '嗯,那么今天我们就简单的进行一下那个新生招聘的嗯讨论吧。因为现在不是马上就新生到校嘛,然后我们社团呢也需要招聘一些新的社员,然后就今天就大概就讨论一下嗯怎么招聘的内容吧。嗯,我们就首先想一下那个招聘的地点在哪里吧。嗯地点的话我们现在可以有三个选择。嗯,第一个的话我们可以选择在操场,因为那儿嗯学生流动量也挺大的。操场的话这这段时间太热了,我怕那个人流量有点少。嗯,那我们还可以有第二个选择呀。嗯,我们可以在图书馆楼下那里有一块可以遮阴的地方哦,图书馆我觉得应该还可以吧。嗯,就怕那些嗯新生我应该也会去吧。因为他如果刚刚到校,他应该就第一选择。如果是我的话,我也比较想去那个图书馆,还有什么地方呢?嗯,第三个的话,我们可以在演播厅底下,因为现在那里就已经有了很多社团在招新,然后我们过去的话也算。'

'sentence_info': [{'text': '嗯,', 'start': 5570, 'end': 5810, 'timestamp': [[5570, 5810]], 'spk': 0}, {'text': '那么今天我们就简单的进行一下那。', 'start': 5810, 'end': 8630, 'timestamp': [[5810, 5950], [5950, 6150], [6150, 6230], [6230, 6470], [6490, 6650], [6650, 6850], [6850, 7090], [7230, 7370], [7370, 7550], [7550, 7750], [7750, 7850], [7850, 8090], [8110, 8210], [8210, 8430], [8430, 8630]], 'spk': 0}, {'text': '个新生招聘的嗯讨,', 'start': 8630, 'end': 11630, 'timestamp': [[8630, 8870], [8910, 9150], [9170, 9410], [9510, 9750], [9770, 10010], [10010, 10250], [10690, 10930], [11390, 11630]], 'spk': 0}, {'text': '论吧因为现在不是马上,', 'start': 11630, 'end': 13890, 'timestamp': [[11630, 11870], [11870, 12070], [12070, 12210], [12210, 12370], [12370, 12510], [12510, 12690], [12690, 12810], [12810, 13050], [13550, 13770], [13770, 13890]], 'spk': 0}, {'text': '就新生到校嘛然后我们社团呢。', 'start': 13890, 'end': 16590, 'timestamp': [[13890, 14090], [14090, 14250], [14250, 14490], [14510, 14730], [14730, 14970], [15030, 15270], [15530, 15690], [15690, 15810], [15810, 15930], [15930, 16070], [16070, 16230], [16230, 16410], [16410, 16590]], 'spk': 0}, {'text': '也,', 'start': 16590, 'end': 16750, 'timestamp': [[16590, 16750]], 'spk': 0}, {'text': '需要招聘一些新的社员然。', 'start': 16750, 'end': 19330, 'timestamp': [[16750, 16890], [16890, 17090], [17090, 17230], [17230, 17450], [17450, 17590], [17590, 17730], [17730, 17950], [17950, 18150], [18150, 18370], [18370, 18610], [19130, 19330]], 'spk': 0}, {'text': '后就今天就大概就讨。', 'start': 19330, 'end': 21130, 'timestamp': [[19330, 19510], [19510, 19750], [19770, 19930], [19930, 20110], [20110, 20350], [20370, 20590], [20590, 20710], [20710, 20930], [20930, 21130]], 'spk': 0}, {'text': '论,', 'start': 21130, 'end': 21369, 'timestamp': [[21130, 21369]], 'spk': 0}, {'text': '一下嗯怎么招聘,', 'start': 21369, 'end': 23150, 'timestamp': [[21389, 21490], [21490, 21730], [22090, 22330], [22450, 22570], [22570, 22710], [22710, 22910], [22910, 23150]], 'spk': 0}, {'text': '的内容吧嗯我们就首。', 'start': 
23150, 'end': 25430, 'timestamp': [[23150, 23390], [23430, 23570], [23570, 23810], [23810, 24050], [24430, 24670], [24730, 24830], [24830, 24950], [24950, 25190], [25230, 25430]], 'spk': 0}, {'text': '先想一下那个,', 'start': 25430, 'end': 26570, 'timestamp': [[25430, 25670], [25750, 25930], [25930, 26030], [26030, 26170], [26170, 26330], [26330, 26570]], 'spk': 0}, {'text': '招聘的地点。', 'start': 26570, 'end': 27770, 'timestamp': [[26790, 27030], [27050, 27230], [27230, 27370], [27370, 27530], [27530, 27770]], 'spk': 0}, {'text': '在,', 'start': 27770, 'end': 27950, 'timestamp': [[27770, 27950]], 'spk': 0}, {'text': '哪里吧嗯地点的话。', 'start': 27950, 'end': 30400, 'timestamp': [[27950, 28130], [28130, 28210], [28210, 28695], [29540, 29760], [29760, 29920], [29920, 30120], [30120, 30220], [30220, 30400]], 'spk': 1}, {'text': '我,', 'start': 30400, 'end': 30480, 'timestamp': [[30400, 30480]], 'spk': 1}, {'text': '们现在可以有三个选择嗯第,', 'start': 30480, 'end': 33180, 'timestamp': [[30480, 30600], [30600, 30820], [30820, 31060], [31160, 31280], [31280, 31380], [31380, 31540], [31540, 31700], [31700, 31900], [31900, 32120], [32120, 32360], [32780, 33020], [33080, 33180]], 'spk': 1}, {'text': '一个的话我们可。', 'start': 33180, 'end': 34020, 'timestamp': [[33180, 33300], [33300, 33440], [33440, 33540], [33540, 33720], [33720, 33800], [33800, 33900], [33900, 34020]], 'spk': 1}, {'text': '以,', 'start': 34020, 'end': 34120, 'timestamp': [[34020, 34120]], 'spk': 1}, {'text': '选择在操场因为那。', 'start': 34120, 'end': 36760, 'timestamp': [[34120, 34300], [34300, 34540], [34620, 34860], [35480, 35720], [35740, 35980], [36140, 36280], [36280, 36520], [36520, 36760]], 'spk': 1}, {'text': '儿嗯学生,', 'start': 36760, 'end': 38610, 'timestamp': [[36760, 37115], [37770, 38010], [38190, 38410], [38410, 38610]], 'spk': 1}, {'text': '流动量也挺。', 'start': 38610, 'end': 39410, 'timestamp': [[38610, 38770], [38770, 38870], [38870, 39070], [39070, 39190], [39190, 39410]], 'spk': 1}, {'text': '大的操,', 'start': 39410, 'end': 40330, 
'timestamp': [[39410, 39650], [39650, 39890], [40090, 40330]], 'spk': 1}, {'text': '场的话这这段,', 'start': 40330, 'end': 42270, 'timestamp': [[40370, 40610], [40630, 40730], [40730, 40970], [41510, 41750], [41890, 42050], [42050, 42270]], 'spk': 0}, {'text': '时间太?', 'start': 42270, 'end': 42890, 'timestamp': [[42270, 42470], [42470, 42670], [42670, 42890]], 'spk': 0}, {'text': '热,', 'start': 42890, 'end': 43130, 'timestamp': [[42890, 43130]], 'spk': 0}, {'text': '了我,', 'start': 43130, 'end': 43550, 'timestamp': [[43190, 43410], [43410, 43550]], 'spk': 0}, {'text': '怕那个人,', 'start': 43550, 'end': 44750, 'timestamp': [[43550, 43790], [44290, 44450], [44450, 44590], [44590, 44750]], 'spk': 0}, {'text': '流量有点少嗯那我们还可,', 'start': 44750, 'end': 47290, 'timestamp': [[44750, 44930], [44930, 45170], [45210, 45350], [45350, 45530], [45530, 45770], [46270, 46510], [46550, 46750], [46750, 46830], [46830, 46970], [46970, 47190], [47190, 47290]], 'spk': 0}, {'text': '以有第二个选。', 'start': 47290, 'end': 47970, 'timestamp': [[47290, 47370], [47370, 47450], [47450, 47550], [47550, 47650], [47650, 47770], [47770, 47970]], 'spk': 1}]
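The mismatch described above can be quantified with a small check. This is a minimal sketch, not part of FunASR: `strip_punct` and `coverage_report` are hypothetical helper names, and the dictionary layout (`text`, plus `sentence_info` entries with `text` and `end`) is assumed from the output printed in this issue. The sample data is a shortened excerpt of that output.

```python
import unicodedata


def strip_punct(s: str) -> str:
    """Drop punctuation and whitespace so only recognized characters remain."""
    return "".join(
        ch for ch in s
        if not (unicodedata.category(ch).startswith("P") or ch.isspace())
    )


def coverage_report(result: dict) -> dict:
    """Compare the full 'text' with the concatenated 'sentence_info' texts."""
    full = strip_punct(result["text"])
    joined = strip_punct("".join(s["text"] for s in result["sentence_info"]))
    is_prefix = full.startswith(joined)
    return {
        "text_chars": len(full),
        "sentence_info_chars": len(joined),
        "is_prefix": is_prefix,
        # Characters present in 'text' but absent from 'sentence_info'.
        "missing_tail": full[len(joined):] if is_prefix else None,
        # End time (ms) of the last sentence that was emitted.
        "last_end_ms": max((s["end"] for s in result["sentence_info"]), default=None),
    }


# Shortened excerpt of the output above (assumed layout, for illustration only).
sample = {
    "key": "demo",
    "text": "嗯,那么今天我们就简单的进行一下那个新生招聘的嗯讨论吧。",
    "sentence_info": [
        {"text": "嗯,", "start": 5570, "end": 5810, "spk": 0},
        {"text": "那么今天我们就简单的进行一下那。", "start": 5810, "end": 8630, "spk": 0},
    ],
}

print(coverage_report(sample))
```

On a real run this would presumably be called as `coverage_report(res[0])`, assuming `res` is a list with one dictionary per input file. Punctuation is stripped before comparing because the punctuation in `sentence_info` does not line up with the punctuation in `text`.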

Code sample

Expected behavior

Environment

2024-05-09 23:53:51,045 - modelscope - WARNING - Model revision not specified, use revision: v2.0.9
ckpt: /mnt/workspace/.cache/modelscope/damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
2024-05-09 23:53:53,480 - modelscope - WARNING - Model revision not specified, use revision: v2.0.4
ckpt: /mnt/workspace/.cache/modelscope/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
2024-05-09 23:53:53,989 - modelscope - WARNING - Model revision not specified, use revision: v2.0.4
ckpt: /mnt/workspace/.cache/modelscope/damo/punc_ct-transformer_cn-en-common-vocab471067-large/model.pt
2024-05-09 23:53:57,279 - modelscope - WARNING - Model revision not specified, use revision: v2.0.2
ckpt: /mnt/workspace/.cache/modelscope/damo/speech_campplus_sv_zh-cn_16k-common/campplus_cn_common.bin

Additional context

tiaanaqiqikuaipao May 09 '24 15:05

Environment:
OS: Linux
FunASR Version: 1.0.14
PyTorch version: 2.1.2+cu121
How you installed funasr: installed automatically when running `from funasr import AutoModel` in a ModelScope notebook
Python version: 3.10.13
GPU: NVIDIA A10
CUDA/cuDNN version: cuda_12.1.r12.1

Recording

Uploading 2speakers_example.zip…

tiaanaqiqikuaipao May 10 '24 04:05