pyannote-database

Necessity of placeholder sanity check inside `LABLoader`

Open alephpi opened this issue 1 year ago • 0 comments

Related to #99

Hi, I don't think `LABLoader` behaves as expected. If I call it directly, e.g. `loader = LABLoader(path='../only_words/labs/dev/{uri}.lab')`, it works smoothly.

But when I use it within a protocol pipeline, with the following database.yml:

Protocols:
  AMI:
    SpeakerDiarization:
      only_words:
        train:
            uri: ../lists/train.meetings.txt
            annotation: ../only_words/rttms/train/{uri}.rttm
            annotated: ../uems/train/{uri}.uem
            lab: ../only_words/labs/train/{uri}.lab
        development:
            uri: ../lists/dev.meetings.txt
            annotation: ../only_words/rttms/dev/{uri}.rttm
            annotated: ../uems/dev/{uri}.uem
            lab: ../only_words/labs/dev/{uri}.lab

and something like

for file in only_words.development():
    lab=file['lab']

it raises `ValueError: path must contain the {uri} placeholder.`

While debugging, I noticed that `LABLoader` explicitly checks whether the path contains the `{uri}` placeholder: https://github.com/pyannote/pyannote-database/blob/6816228629b54ab3f1ce8baff7aec4250462a547/pyannote/database/loader.py#L258-L261

However, in the `load` function of the `Template` class, the loader is called after `resolve_path`: https://github.com/pyannote/pyannote-database/blob/6816228629b54ab3f1ce8baff7aec4250462a547/pyannote/database/custom.py#L105-L114

Hence, when `LABLoader` is called from the protocol pipeline, the placeholder in the template has already been resolved to a concrete uri. The check then runs against the resolved path, which no longer contains `{uri}`, so it always raises the missing-placeholder error.
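The order of operations described above can be reproduced without pyannote installed. This is only a minimal sketch: `CheckedLoader`, `load_from_template` and the uri `meeting1` are hypothetical stand-ins that mimic the behaviour reported in this issue, not the actual pyannote.database code.

```python
class CheckedLoader:
    """Hypothetical stand-in for a loader that insists on a {uri} template."""

    def __init__(self, path: str):
        # Same kind of sanity check as the one reported in this issue.
        if "{uri}" not in path:
            raise ValueError("path must contain the {uri} placeholder.")
        self.path = path


def load_from_template(template: str, uri: str) -> CheckedLoader:
    """Hypothetical stand-in for the pipeline: resolve first, then build the loader."""
    resolved = template.format(uri=uri)  # the placeholder is gone after this line
    return CheckedLoader(resolved)


template = "../only_words/labs/dev/{uri}.lab"

# Direct call with the raw template passes the check.
direct = CheckedLoader(template)

# Pipeline-style call resolves the template first, so the check fails.
try:
    load_from_template(template, uri="meeting1")
    pipeline_error = None
except ValueError as exc:
    pipeline_error = str(exc)

print(pipeline_error)
```

The direct call succeeds because the loader sees the unresolved template, while the pipeline-style call fails because the loader only ever sees the resolved path.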

So I question whether this sanity check is necessary in `LABLoader`, since `gather_loader` already performs the same check: https://github.com/pyannote/pyannote-database/blob/6816228629b54ab3f1ce8baff7aec4250462a547/pyannote/database/custom.py#L225-L227

The only reason I can see for keeping it is a use case where `LABLoader` is called on its own, outside the protocol pipeline. In that case I think it would be better to simply use `load_lab`.
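To illustrate the arrangement I have in mind, here is a rough sketch in which the template is validated once, when the loader is set up, and the loader itself accepts whatever resolved path it receives. The names `ResolvedPathLoader` and `make_template_loader` and the uri `meeting1` are hypothetical and only serve the illustration; this is not how pyannote.database is currently written.

```python
from typing import Callable


class ResolvedPathLoader:
    """Hypothetical loader that accepts an already-resolved path, with no check."""

    def __init__(self, path: str):
        self.path = path


def make_template_loader(template: str) -> Callable[[str], ResolvedPathLoader]:
    """Validate the template once, then return a function that resolves it per uri."""
    if "{uri}" not in template:
        raise ValueError("path must contain the {uri} placeholder.")

    def load(uri: str) -> ResolvedPathLoader:
        # The check above already ran, so the resolved path is passed through as-is.
        return ResolvedPathLoader(template.format(uri=uri))

    return load


load = make_template_loader("../only_words/labs/dev/{uri}.lab")
loader = load("meeting1")
print(loader.path)

# A template without the placeholder is still rejected, but only once, up front.
try:
    make_template_loader("../only_words/labs/dev/fixed.lab")
    rejected = False
except ValueError:
    rejected = True
```

With this split, a misconfigured database.yml entry is still caught early, and the per-file loader never sees an unresolved template.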

I look forward to your opinion.

alephpi avatar Sep 25 '24 09:09 alephpi