
Faster RTTMLoader

Open • hbredin opened this issue 2 years ago • 2 comments

The RTTMLoader class is extremely slow for large RTTM files containing annotations for multiple audio files (e.g. the VoxCeleb dataset).

We should make it faster!

hbredin, Apr 07 '23 12:04

cc @clement-pages

I am not assigning this issue to you but just wanted to let you know that I took note of what we discussed today.

hbredin, Apr 07 '23 13:04

I have just pushed two PRs that should make things much faster:

  • this pyannote.database PR relies on the vanilla csv library instead of pandas
  • this pyannote.core PR switches from sortedcontainers.SortedDict to a vanilla dict in Annotation internals (making Annotation.__init__ orders of magnitude faster).
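
To illustrate the general idea behind the first PR, here is a minimal sketch of single-pass RTTM parsing with the standard csv module, grouping speaker turns by file in a plain dict. This is not the code from either PR: the function name `load_rttm_grouped` and the tuple layout are hypothetical, and the sketch assumes the usual space-separated RTTM layout (`SPEAKER <uri> <channel> <start> <duration> <NA> <NA> <speaker> <NA> <NA>`).

```python
import csv
import io
from collections import defaultdict


def load_rttm_grouped(lines):
    """Group RTTM speaker turns by file URI in a single pass.

    Hypothetical helper for illustration only; it is not part of
    pyannote.database. `lines` is any iterable of text lines.
    """
    turns_by_uri = defaultdict(list)
    # RTTM is space-separated; skipinitialspace tolerates repeated spaces.
    reader = csv.reader(lines, delimiter=" ", skipinitialspace=True)
    for row in reader:
        # Skip blank lines and any record type other than SPEAKER.
        if not row or row[0] != "SPEAKER":
            continue
        uri = row[1]
        start = float(row[3])
        duration = float(row[4])
        speaker = row[7]
        turns_by_uri[uri].append((start, start + duration, speaker))
    return dict(turns_by_uri)


# Tiny in-memory sample interleaving two files, as in a multi-file RTTM.
sample = io.StringIO(
    "SPEAKER file1 1 0.00 1.50 <NA> <NA> spk_a <NA> <NA>\n"
    "SPEAKER file2 1 0.25 0.75 <NA> <NA> spk_b <NA> <NA>\n"
    "SPEAKER file1 1 1.50 2.00 <NA> <NA> spk_b <NA> <NA>\n"
)
turns = load_rttm_grouped(sample)
```

Reading the file once and appending to per-file lists avoids building a large intermediate table and then splitting it per file, which is one plausible reason a csv-based loader can be faster on files that cover many recordings.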

I still need to make sure those PRs do not break anything but you could already try them on your use case (this requires that you install both pyannote.database and pyannote.core from the corresponding branches).

hbredin, Apr 11 '23 12:04