Tô Đức Anh

Results 8 comments of Tô Đức Anh

add chinese characters to non_word_boundary

I faced the same problem and more or less fixed it by adding the characters of my alphabet (in your case, Chinese) to the `self._white_space_chars` variable:

```
self._keyword = '_keyword_'
self._white_space_chars = set(['.', '\t', '\n', '\a',...
```
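To show why the contents of a boundary character set change what gets matched, here is a minimal stand-alone sketch. It is not flashtext's code: `find_keywords`, `ascii_word_chars` and `extended_word_chars` are hypothetical names made up for this illustration, and the example uses a Vietnamese letter, though the same reasoning should apply to Chinese characters.

```python
import string


def find_keywords(text, keywords, word_chars):
    """Return (keyword, start) pairs for keywords that stand alone in text.

    A hit counts only if the characters directly before and after it are
    NOT in word_chars, or the hit touches the start or end of the text.
    """
    found = []
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            end = start + len(kw)
            before_ok = start == 0 or text[start - 1] not in word_chars
            after_ok = end == len(text) or text[end] not in word_chars
            if before_ok and after_ok:
                found.append((kw, start))
            start = text.find(kw, start + 1)
    return sorted(found, key=lambda hit: hit[1])


# Assumption: only ASCII letters, digits and "_" count as word characters.
ascii_word_chars = set(string.ascii_letters + string.digits + "_")

text = "đan len"

# "đ" is not in the set, so it acts as a boundary and "an" is
# (wrongly) matched inside the word "đan".
print(find_keywords(text, ["an"], ascii_word_chars))  # [('an', 1)]

# Once "đ" is treated as a word character, "an" is no longer delimited
# on the left and the false match disappears.
extended_word_chars = ascii_word_chars | {"đ"}
print(find_keywords(text, ["an"], extended_word_chars))  # []
```

Whichever set the library actually consults, the idea is the same: a character that is missing from the word-character set is treated as a separator.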

+1 same problem on Python

Sorry for the late reply. I wonder whether the apply function is faster than using pandas itself. Thanks.

Hi, I'm having the same problem with a network restriction. If there is any solution, I would be interested to know. Thank you in advance.

```
pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pytorch_model_embedding.bin
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pytorch_model_segmentation.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 15
    threshold: 0.7153814381597874
  segmentation:
    min_duration_off: 0.5817029604921046
    threshold: 0.4442333667381752...
```

Initialize the pipeline:

```
diarization_pipeline = whisperx.DiarizationPipeline()
```

and then run it on your audio file:

```
result = diarization_pipeline(your_audiofile_path)
```

This should work.

@Dmitriuso the pyannote.audio I use comes with WhisperX when I install it. I didn't install it separately.

```
pip install git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560
```

This should work.