ConferencingSpeech2021
Missing data in Audioset
Hello,
I was trying to run the simulation with the given selected_list, but I found that some of the AudioSet IDs are no longer accessible. Below is a partial list (I haven't checked all of the sample IDs):
HKTIe6piDOI
M7GmqUqVQEA
Hm20kZ7QzO0
oz3LrVaXMb4
6-kHUulyCog
TGd5kPDdN_I
IjoePLT_cFw
dKK-JaIzwS4
Cmhpj4MJ_hQ
NbBM82N1Xos
2JoJ_1agmTk
8YIELHXpf3g
AdLiRtpI01s
AgVZ65Hr9rw
4fh52mLYBYw
KKoTQfro920
L6DFGW6jeV8
X61ftZ590Uc
pK1ucosjoRo
Lpzx6N2aCMY
lnWP_zWFpBg
mg2rhu_HHR0
For example, https://www.youtube.com/watch?v=6-kHUulyCog says the video is unavailable, and https://www.youtube.com/watch?v=Lpzx6N2aCMY says the video is private.
Could you release the unavailable samples in Audioset directly, or just change the selected list for Audioset?
Same issue here.
Actually, I found that the ytids of the selected AudioSet samples in selected_list are all from balanced_train and eval_segments of the original AudioSet, and in my case, 1915 ytids in selected_list are not available. They have been moved, deleted, set private, or cannot be accessed from the US.
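One way to check which AudioSet subset each selected ID comes from is to compare the list against the AudioSet segment CSVs. The sketch below is only an illustration: it assumes the CSV layout distributed with AudioSet (comment lines starting with `#`, then rows of `YTID, start_seconds, end_seconds, positive_labels`) and that each entry in the selected list is named `<ytid>_<start>.wav`. The function names are hypothetical and not part of the ConferencingSpeech2021 scripts.

```python
def read_segment_ids(csv_lines):
    """Collect YTIDs from AudioSet-style segment CSV lines (assumed layout)."""
    ids = set()
    for line in csv_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        ids.add(line.split(",", 1)[0].strip())  # first column is the YTID
    return ids


def ytid_of(wav_name):
    """Extract the ytid from an assumed '<ytid>_<start>.wav' file name.

    YouTube IDs may themselves contain '_', so split from the right.
    """
    return wav_name.strip().rsplit("_", 1)[0]


def count_by_subset(selected_names, subsets):
    """Count selected entries per subset.

    subsets maps a subset name to a set of ytids; entries found in no
    subset are counted under 'unknown'.
    """
    counts = {name: 0 for name in subsets}
    counts["unknown"] = 0
    for wav in selected_names:
        ytid = ytid_of(wav)
        for name, ids in subsets.items():
            if ytid in ids:
                counts[name] += 1
                break
        else:
            counts["unknown"] += 1
    return counts
```

Because an open text file iterates over its lines, `read_segment_ids(open(path))` works directly on a downloaded segment CSV; the path itself depends on where you saved the file.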
Thank you for your comment.
I finally checked the list and found 1076 unavailable ytids.
I attached the list of unavailable wav files here: missing.txt
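If you want to run the simulation on only the clips that are still available, one option is to filter the selected list against a list of missing files. This is a minimal sketch, assuming both files hold one wav file name per line and that the names match after stripping any directory part; the actual format of missing.txt is not confirmed here, and the helper names are hypothetical.

```python
import os


def load_names(path):
    """Read non-empty lines from a list file, keeping only the base file name."""
    with open(path, encoding="utf-8") as f:
        return [os.path.basename(line.strip()) for line in f if line.strip()]


def drop_missing(selected, missing):
    """Split the selected names into (kept, dropped) using the missing list."""
    missing_set = set(missing)
    kept = [name for name in selected if name not in missing_set]
    dropped = [name for name in selected if name in missing_set]
    return kept, dropped
```

The `kept` list can then be written back out, one name per line, and used in place of the original selected list.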
YouTube content changes over time, and we cannot fully avoid this issue...
Anyway, I have already reported this issue to the main organizers. I recommend that you contact [email protected]. They will deal with this issue.
Thank you, Shinji. I will contact them.
Yes, this is unfortunately a common problem with AudioSet. Some videos have been pulled, and some were removed by the original uploaders...
The sample 0N0C0Wbe6AI_30.000.wav in https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/49d3b2fc47/selected_lists/train/audioset.name#L22677 seems to be wrong: the file name suggests a segment starting at 30 s, but the video https://www.youtube.com/watch?v=0N0C0Wbe6AI is only 25 seconds long.
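This kind of mismatch can be checked mechanically once the video duration is known. The sketch below assumes the `_30.000` suffix in the file name is the segment start offset in seconds, which is how AudioSet-derived names are usually built but is not confirmed for this list. The duration has to be supplied by the caller (here, the 25 s reported above), and the function names are hypothetical.

```python
def parse_segment_name(wav_name):
    """Split an assumed '<ytid>_<start>.wav' name into (ytid, start_seconds)."""
    stem = wav_name[:-4] if wav_name.endswith(".wav") else wav_name
    # YouTube IDs may contain '_', so split on the last underscore only.
    ytid, start = stem.rsplit("_", 1)
    return ytid, float(start)


def starts_beyond_video(wav_name, duration_s):
    """Return True if the segment would start at or after the end of the video."""
    _, start = parse_segment_name(wav_name)
    return start >= duration_s


# Duration taken from the report above; the start offset is read from the name.
print(starts_beyond_video("0N0C0Wbe6AI_30.000.wav", 25.0))
```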