pyannote-database
Necessity of placeholder sanity check inside `LABLoader`
Related to #99
Hi, I don't think LABLoader behaves as expected. If I call it directly, e.g.
loader = LABLoader(path='../only_words/labs/dev/{uri}.lab')
It works smoothly.
But when I use it through a protocol pipeline, with a database.yml like the following:
Protocols:
  AMI:
    SpeakerDiarization:
      only_words:
        train:
          uri: ../lists/train.meetings.txt
          annotation: ../only_words/rttms/train/{uri}.rttm
          annotated: ../uems/train/{uri}.uem
          lab: ../only_words/labs/train/{uri}.lab
        development:
          uri: ../lists/dev.meetings.txt
          annotation: ../only_words/rttms/dev/{uri}.rttm
          annotated: ../uems/dev/{uri}.uem
          lab: ../only_words/labs/dev/{uri}.lab
and something like
for file in only_words.development():
    lab = file['lab']
it raises `ValueError: path must contain the {uri} placeholder`.
While debugging, I noticed that LABLoader explicitly checks whether the path contains the {uri} placeholder:
https://github.com/pyannote/pyannote-database/blob/6816228629b54ab3f1ce8baff7aec4250462a547/pyannote/database/loader.py#L258-L261
However, in the load function of the Template class, the loader is only called after resolve_path:
https://github.com/pyannote/pyannote-database/blob/6816228629b54ab3f1ce8baff7aec4250462a547/pyannote/database/custom.py#L105-L114
Hence, when LABLoader is called from the protocol pipeline, the {uri} placeholder in the template has already been replaced with an actual uri. The check then runs against the resolved path, which no longer contains a placeholder, so it always raises the error.
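To make the failure mode concrete, here is a minimal sketch. It is not the actual pyannote.database code: `StrictLoader`, `placeholders`, and the uri `meeting1` are hypothetical names used only to mimic a loader that checks for the placeholder after the template has already been resolved.

```python
import string


def placeholders(path: str) -> set:
    """Return the names of the {placeholders} found in a format string."""
    return {name for _, name, _, _ in string.Formatter().parse(path) if name}


class StrictLoader:
    """Hypothetical stand-in for a loader that insists on a {uri} template."""

    def __init__(self, path: str):
        if "uri" not in placeholders(path):
            raise ValueError("path must contain the {uri} placeholder")
        self.path = path


template = "../only_words/labs/dev/{uri}.lab"

# Direct call with the raw template: the check passes.
StrictLoader(template)

# Pipeline-like call: the template is resolved first (hypothetical uri),
# so the loader only ever sees a path without any placeholder.
resolved = template.format(uri="meeting1")

error = None
try:
    StrictLoader(resolved)
except ValueError as exc:
    error = str(exc)
```

With the raw template the constructor succeeds, while with the resolved path it raises, which matches the behaviour I see through the protocol.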
So I doubt this sanity check is needed in LABLoader, since the gather_loader function already performs the same check:
https://github.com/pyannote/pyannote-database/blob/6816228629b54ab3f1ce8baff7aec4250462a547/pyannote/database/custom.py#L225-L227
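What I have in mind is roughly the following sketch, where the template is validated once and the loader accepts whatever resolved path it is given. Again, this only illustrates the idea under my own assumptions: `PlainLoader`, `make_entry`, and `has_uri_placeholder` are hypothetical names, not functions from pyannote.database.

```python
import string


def has_uri_placeholder(template: str) -> bool:
    """Check whether a format string contains the {uri} placeholder."""
    names = {name for _, name, _, _ in string.Formatter().parse(template)}
    return "uri" in names


class PlainLoader:
    """Hypothetical loader that accepts an already-resolved path as-is."""

    def __init__(self, path: str):
        self.path = path


def make_entry(template: str, loader_cls):
    """Hypothetical gather step: validate the template once, up front."""
    if not has_uri_placeholder(template):
        raise ValueError("path must contain the {uri} placeholder")

    def load(current_file: dict):
        # Resolve the template per file, then hand the result to the loader.
        resolved = template.format(uri=current_file["uri"])
        return loader_cls(resolved)

    return load


load_lab_entry = make_entry("../only_words/labs/dev/{uri}.lab", PlainLoader)
loaded = load_lab_entry({"uri": "meeting1"})  # hypothetical uri
```

A template without {uri} is still rejected when the entry is created, but the loader itself no longer fails on resolved paths.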
The exception would be a use case where LABLoader is used on its own, outside the protocol pipeline. In that case I think it would be better to simply use load_lab.
I look forward to your opinion.