FunASR
'hdbscan' module not found; maybe use installed sklearn.cluster.HDBSCAN?
❓ Questions and Help
What is your question?
Starting from a fresh container environment equipped with pytorch and funasr (via pip install funasr), I encountered ModuleNotFoundError: No module named 'hdbscan' when I instantiate an AutoModel with a spk model. It originates from the import hdbscan in UmapHdbscan() <- ClusterBackend() <- AutoModel(...).
- Must I install hdbscan manually? Is there any other package that I also need in advance?
- I am building my own container image, and I am frustrated to find that I have to build it again. Neither the output nor the docs give any hint about this.
- There is a sklearn.cluster.HDBSCAN, and I find sklearn is already there with funasr installed. Can we just use that sklearn one instead of installing the standalone version hdbscan?
- These two versions seem to come from the same authors and differ in some minor ways (see https://github.com/scikit-learn/scikit-learn/issues/27829)
Code
from funasr import AutoModel
model = AutoModel(
model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision="v2.0.4",
vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch", vad_model_revision="v2.0.4",
punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", punc_model_revision="v2.0.4",
spk_model="iic/speech_campplus_sv_zh-cn_16k-common", spk_model_revision="v2.0.2",
)
What have you tried?
In a pytorch docker container, run pip install funasr and then the script above.
What's your environment?
- OS (e.g., Linux):
- FunASR Version (e.g., 1.0.0): 1.0.19
- ModelScope Version (e.g., 1.11.0): None (do not need it)
- PyTorch Version (e.g., 2.0.0): 2.2.2
- How you installed funasr (pip, source): pip
- Python version: 3.10.14
- GPU (e.g., V100M32): NVIDIA GeForce RTX 4090
- CUDA/cuDNN version (e.g., cuda11.7): cuda11.8
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1): pytorch/pytorch:2.2.2-cuda11.8-cudnn8-runtime
- Any other relevant information:
Delete all *model_revision arguments and try again. All requirements will then be installed automatically.
Yes, thank you. But basically what I want to do is to build an image with installed packages ahead of running any scripts. I believe I should not figure it out through trial and error by myself.
If there exists any errors, please let me know after you delete all *model_revision.
Sadly yes.
I removed all *model_revision:
from funasr import AutoModel
model = AutoModel(
model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
)
And I still got:
ckpt: iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
ckpt: iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
ckpt: iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/model.pt
ckpt: iic/speech_campplus_sv_zh-cn_16k-common/campplus_cn_common.bin
Traceback (most recent call last):
File "/shared/test-funasr/tmp_test.py", line 10, in <module>
spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
File "/home/user/.local/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 135, in __init__
self.cb_model = ClusterBackend().to(kwargs["device"])
File "/home/user/.local/lib/python3.10/site-packages/funasr/models/campplus/cluster_backend.py", line 149, in __init__
self.umap_hdbscan_cluster = UmapHdbscan()
File "/home/user/.local/lib/python3.10/site-packages/funasr/models/campplus/cluster_backend.py", line 118, in __init__
import hdbscan
ModuleNotFoundError: No module named 'hdbscan'
FunASR Version: 1.0.19
And I cannot even import funasr using the latest commit (702b9b540c3c1524748cd975a10ce33f0fa53912) on main branch:
>>> import funasr
/.../FunASR/funasr/datasets/large_datasets/utils/tokenize.py:93: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if vad is not -2:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../FunASR/funasr/__init__.py", line 36, in <module>
import_submodules(__name__)
File "/.../FunASR/funasr/__init__.py", line 33, in import_submodules
results.update(import_submodules(name))
File "/.../FunASR/funasr/__init__.py", line 33, in import_submodules
results.update(import_submodules(name))
File "/.../FunASR/funasr/__init__.py", line 33, in import_submodules
results.update(import_submodules(name))
File "/.../FunASR/funasr/__init__.py", line 25, in import_submodules
for loader, name, is_pkg in pkgutil.walk_packages(package.__path__, package.__name__ + '.'):
AttributeError: 'str' object has no attribute '__path__'. Did you mean: '__hash__'?
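The AttributeError above is the kind of failure a recursive importer produces when an import error is swallowed and the package name is left as a plain string, which then has no `__path__`. The following is a minimal sketch, not FunASR's actual code, of an `import_submodules` helper that logs the underlying failure and stops instead. The function body and the use of `logging` are assumptions made for illustration.

```python
import importlib
import logging
import pkgutil

logger = logging.getLogger(__name__)


def import_submodules(package, recursive=True):
    """Import every submodule of `package` and return {full_name: module}."""
    if isinstance(package, str):
        try:
            package = importlib.import_module(package)
        except Exception as exc:
            # Report the real cause and stop, rather than continuing with a str.
            logger.warning("Failed to import %s: %r", package, exc)
            return {}

    results = {}
    # Plain modules have no __path__, so fall back to an empty search path.
    search_path = getattr(package, "__path__", [])
    for _finder, name, is_pkg in pkgutil.iter_modules(search_path, package.__name__ + "."):
        try:
            results[name] = importlib.import_module(name)
        except Exception as exc:
            logger.warning("Failed to import %s: %r", name, exc)
            continue
        if recursive and is_pkg:
            results.update(import_submodules(name, recursive=True))
    return results
```

With a helper shaped like this, a missing optional dependency in one submodule shows up as a logged warning naming that submodule, rather than as an unrelated AttributeError further up the stack.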
Plus: all my models are already there inside the literal iic folder in the current directory, so there are no extra downloads. The environment running the above script does not have modelscope installed.
Still worth mentioning: during the image-building phase, one should not use a test script like this to 'trigger' the automatic installation of extra dependencies, which is an anti-pattern. Preparing the environment needs explicit commands, like pip install funasr[spk].
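Until something like an explicit extra exists, one way to make an image build fail early is a small preflight check that only looks for the modules without loading any model. This is a sketch under stated assumptions: `find_missing_modules` and `SPK_MODULES` are hypothetical names, "hdbscan" is taken from the traceback above, and "umap" is a guess based on the class name UmapHdbscan.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names in `module_names` that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package or a broken module counts as missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Assumed list: "hdbscan" appears in the traceback; "umap" is a guess.
SPK_MODULES = ["hdbscan", "umap"]
```

A build step could call `find_missing_modules(SPK_MODULES)` and exit with a non-zero status when the returned list is not empty, so the missing package is reported at build time instead of at first model load.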
You should pip install -e .
I mean I tried both ways:
- pip install funasr to install the latest PyPI version (1.0.19)
- pip install -e . after pulling the latest commit of main branch, which results in the above error.
First run pip install -e . and then uncomment the line here so that the error gets logged: https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/init.py#L21
The bug has been fixed. Please update funasr (https://github.com/alibaba-damo-academy/FunASR/pull/1580):
git pull
pip install -e .
I pulled the latest commit, ran pip install -e . and uncommented the print (see screenshot), but still got the same output:
So there is no error reported here.
Requirements would be installed in https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/download/download_from_hub.py#L76
Maybe you could debug it and show the log.
Plus: all my models are already there inside the literal iic folder in the current directory, so there are no extra downloads. The environment running the above script does not have modelscope installed.
The problem is that models of a previous revision (instead of master) are already downloaded in the iic folder; the code does not check for that and will not re-download the latest master revision. So there is no requirements.txt file in the campplus model folder.
I now understand that the requirements.txt comes from the model dir. Maybe some mechanism for automatically re-downloading the specified revision is required?
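As an illustration of the check being asked for, here is a minimal sketch that inspects a local model directory for requirements.txt and builds the matching pip command without running it. `pip_command_for_model_dir` is a hypothetical helper, not part of FunASR, and treating a missing file as a sign of a stale cached revision is an assumption drawn from this thread.

```python
import sys
from pathlib import Path


def pip_command_for_model_dir(model_dir):
    """Return the pip command for model_dir/requirements.txt, or None if absent."""
    req_file = Path(model_dir) / "requirements.txt"
    if not req_file.is_file():
        # Assumption: an older cached revision ships without requirements.txt.
        return None
    # Use the current interpreter so packages land in the same environment.
    return [sys.executable, "-m", "pip", "install", "-r", str(req_file)]
```

A caller could pass the returned list to `subprocess.check_call`, and treat `None` as a reason to warn the user or to re-download the requested revision.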
❓ And also I wonder if this is possible:
2. There is a sklearn.cluster.HDBSCAN, and I find sklearn is already there with funasr installed. Can we just use that sklearn one instead of installing the standalone version hdbscan?
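To make the question concrete, a fallback import could look roughly like the sketch below: try the standalone package first, then the scikit-learn class (available in scikit-learn 1.3 and later). `resolve_first_available` and `HDBSCAN_CANDIDATES` are hypothetical names, and nothing here reflects how FunASR currently imports its clustering backend.

```python
import importlib


def resolve_first_available(candidates):
    """Return the first attribute importable from a list of (module, attribute) pairs."""
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError("No candidate could be imported:\n" + "\n".join(errors))


HDBSCAN_CANDIDATES = [
    ("hdbscan", "HDBSCAN"),          # standalone package
    ("sklearn.cluster", "HDBSCAN"),  # scikit-learn >= 1.3
]
```

Usage would be `HDBSCAN = resolve_first_available(HDBSCAN_CANDIDATES)` followed by something like `HDBSCAN(min_cluster_size=10).fit_predict(embeddings)`, since both classes accept `min_cluster_size` and provide `fit_predict`. Because the two implementations differ in some defaults and parameters (see the scikit-learn issue linked above), the resulting speaker clusters may not be identical.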