Missing data in Audioset
Hello,
I was trying to run the simulation with the given selected_list, but I found that some of the AudioSet IDs are no longer accessible. Below is a partial list (I haven't checked all of the sample IDs):
HKTIe6piDOI
M7GmqUqVQEA
Hm20kZ7QzO0
oz3LrVaXMb4
6-kHUulyCog
TGd5kPDdN_I
IjoePLT_cFw
dKK-JaIzwS4
Cmhpj4MJ_hQ
NbBM82N1Xos
2JoJ_1agmTk
8YIELHXpf3g
AdLiRtpI01s
AgVZ65Hr9rw
4fh52mLYBYw
KKoTQfro920
L6DFGW6jeV8
X61ftZ590Uc
pK1ucosjoRo
Lpzx6N2aCMY
lnWP_zWFpBg
mg2rhu_HHR0
For example, if you go to https://www.youtube.com/watch?v=6-kHUulyCog, it says the video is unavailable. If you go to https://www.youtube.com/watch?v=Lpzx6N2aCMY, it says the video is private.
Could you release the unavailable AudioSet samples directly, or change the selected list for AudioSet?
Same issue here.
Actually, I found that the ytids of the selected AudioSet samples in selected_list all come from balanced_train and eval_segments of the original AudioSet. In my case, 1915 ytids in selected_list are not available: they have been moved, deleted, set to private, or cannot be accessed from the US.
Thank you for your comment.
I finally checked the list and found 1076 unavailable ytids.
I attached the list of unavailable wav files here: missing.txt
YouTube videos are dynamic, so we cannot fully avoid this issue...
Anyway, I have already reported this issue to the main organizers. I recommend contacting [email protected]. They will deal with this issue.
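For anyone who wants to reproduce a list of unavailable files locally, one option is to compare the names in the selected list with the clips that were actually downloaded. This is only a minimal sketch: it assumes the list holds one file name per line (such as `0N0C0Wbe6AI_30.000.wav`) and that the downloaded clips sit in a single flat directory. The function name `find_missing` and both paths are hypothetical and not part of the challenge scripts.

```python
from pathlib import Path


def find_missing(list_path, wav_dir):
    """Return the names in the list that have no matching .wav file in wav_dir.

    Assumption: one file name per line in list_path, flat directory layout.
    """
    lines = Path(list_path).read_text().splitlines()
    # Keep only the base name in case an entry carries a directory prefix.
    wanted = [Path(line.strip()).name for line in lines if line.strip()]
    present = {p.name for p in Path(wav_dir).glob("*.wav")}
    # Preserve the order of the selected list in the result.
    return [name for name in wanted if name not in present]
```

Calling `find_missing("audioset.name", "downloaded_wavs")` (both paths are placeholders) returns the missing names in list order, which can then be written to a text file.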
Thank you, Shinji. I will contact them.
Yes, this is unfortunately a common problem with AudioSet. Some videos have been taken down, and some were removed by their original uploaders...
The sample 0N0C0Wbe6AI_30.000.wav in https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/49d3b2fc47/selected_lists/train/audioset.name#L22677 seems to be wrong, because the video https://www.youtube.com/watch?v=0N0C0Wbe6AI is only 25 seconds long.
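This kind of mismatch can be checked mechanically. The sketch below assumes the file name has the form `<ytid>_<start>.wav`, where the number is the clip's start offset in seconds; that reading of the name is an assumption, and both helper names are hypothetical. The video duration has to be supplied by the caller (here, the 25 seconds reported above). Since ytids may themselves contain underscores, the name is split on the last underscore only.

```python
def parse_clip_name(name):
    """Split '<ytid>_<start>.wav' into (ytid, start_seconds).

    Assumption: the trailing number is the start offset in seconds.
    """
    stem = name[:-len(".wav")] if name.endswith(".wav") else name
    # Split on the LAST underscore, because a ytid can contain '_' itself.
    ytid, start = stem.rsplit("_", 1)
    return ytid, float(start)


def start_is_within_video(name, video_duration_s):
    """Return True if the clip's start offset lies inside the video."""
    _, start = parse_clip_name(name)
    return start < video_duration_s


print(parse_clip_name("0N0C0Wbe6AI_30.000.wav"))               # ('0N0C0Wbe6AI', 30.0)
print(start_is_within_video("0N0C0Wbe6AI_30.000.wav", 25.0))   # False
```

A result of `False` flags entries whose start offset lies beyond the end of the video as it exists today.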