diart quality concerns


Open · DmitriyG228 opened this issue 1 year ago · 1 comment

It looks like the pipeline quickly forgets previous speakers and assigns the wrong labels to new ones. As a result, a conversation between 4-5 people is inferred as a conversation between 2.

I am testing alongside whisperx, which seems to use the same set of default models, yet it gives better results.

Before diving deeper into debugging, are there any obvious things I could be doing wrong? I tried a non-default embedding model and got the same result.

Jan 06 '24 15:01 DmitriyG228

@DmitriyG228 you can check out related issues such as #4, #133 and #226, where this has already been discussed.

Feb 02 '24 15:02 juanmc2005
