acl-anthology

Download videos from Vimeo

Open • mjpost opened this issue (#2637) 1 year ago • 6 comments

Many videos (e.g., ACL 2017) are just links to Vimeo. These should be downloaded in bulk and linked to locally.

Tasks:

  • [x] Take stock of which proceedings link to Vimeo
    • [ ] D15.xml
    • [ ] D16.xml
    • [ ] D17.xml
    • [ ] D18.xml
    • [ ] N15.xml
    • [ ] N18.xml
    • [ ] N19.xml
    • [ ] P17.xml
    • [ ] P18.xml
    • [ ] P19.xml
    • [ ] Q14.xml
    • [ ] Q15.xml
    • [ ] Q16.xml
    • [ ] Q17.xml
    • [ ] Q18.xml
    • [ ] Q19.xml
    • [ ] W18.xml
  • [ ] Download and label them all
  • [ ] Switch links
  • [ ] Double-check that all these videos are imported (we have these on Dropbox; see the attached image)
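
Taking stock of the XML files can be scripted. Below is a minimal sketch using only Python's standard library. It assumes each file stores video links as an `href` attribute on `<video>` elements and treats any `href` containing `vimeo.com` as a Vimeo link; both assumptions, and the helper names `vimeo_links` and `take_stock`, are illustrative rather than the Anthology's actual tooling.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def vimeo_links(xml_text: str) -> list[str]:
    """Return every <video href="..."> value that points at Vimeo.

    Assumption: video links live in the href attribute of <video> elements.
    """
    root = ET.fromstring(xml_text)
    return [
        video.get("href", "")
        for video in root.iter("video")
        if "vimeo.com" in video.get("href", "")
    ]


def take_stock(xml_dir: Path) -> dict[str, int]:
    """Count Vimeo links per XML file; files without any are left out."""
    counts = {}
    for path in sorted(xml_dir.glob("*.xml")):
        links = vimeo_links(path.read_text(encoding="utf-8"))
        if links:
            counts[path.name] = len(links)
    return counts
```

Calling `take_stock(Path("data/xml"))` (the directory path is hypothetical) would return a dict mapping each file name to its number of Vimeo links, which corresponds to the per-file checklist above.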

mjpost • Jul 13 '23

Will do this in the next 2 weeks.

davidstap • Jul 13 '23

I'm currently downloading and labeling all Vimeo videos. I also noticed that a lot of videos point to SlidesLive; should I download these too? Also, I don't think I have access to the Dropbox you're referring to. Can you share it with me?

davidstap • Jul 20 '23

Just emailed you a link.

mjpost • Jul 20 '23

Short update: this is a bit delayed (I was busy with a lot of EMNLP reviews and a move to Germany for an internship). I'll finish it next week.

davidstap • Aug 06 '23

I had forgotten we'd started this. I've updated the description with a list of the XML files that contain video links. One idea would be to download the videos via the links in the XML, so that we can set each file's name. The goal should be to change all the links to point to local storage. We could also add an attribute to the <video> tag that retains the external link.
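
The idea above (download from the XML link, name the file, switch the link, keep the external URL) could look roughly like the sketch below. It rests on hypothetical choices, not the Anthology schema: papers nested as `<volume id>` / `<paper id>`, a local file name of the form `<collection>-<volume>-<paper>.mp4`, and a `vimeo` attribute for the retained link. The download itself is delegated to yt-dlp, whose `-o` option sets the output file name.

```python
import xml.etree.ElementTree as ET


def localize_videos(root: ET.Element, collection_id: str) -> list[list[str]]:
    """Point Vimeo <video> tags at local files; return yt-dlp commands to fetch them.

    Hypothetical conventions (not the Anthology's real ones):
      * papers are nested as <volume id=...><paper id=...>;
      * the local file name is "<collection>-<volume>-<paper>.mp4";
      * the original external link is kept in a "vimeo" attribute.
    """
    commands = []
    for volume in root.iter("volume"):
        for paper in volume.iter("paper"):
            for video in paper.iter("video"):
                url = video.get("href", "")
                if "vimeo.com" not in url:
                    continue  # already local or hosted elsewhere
                local = f"{collection_id}-{volume.get('id')}-{paper.get('id')}.mp4"
                video.set("vimeo", url)   # retain the external link
                video.set("href", local)  # switch to local storage
                # yt-dlp's -o option sets the output file name.
                commands.append(["yt-dlp", "-o", local, url])
    return commands
```

Each returned command could then be run with `subprocess.run(cmd, check=True)`. Keeping the download list separate from the XML rewrite makes it easy to retry failed downloads before committing the switched links.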

mjpost • Jan 19 '24

Yes, I made a start but never got around to finishing it. I'll continue working on it after ingesting the ACL / EMNLP videos (almost done).

davidstap • Jan 22 '24

Folks, now that this is nearing completion, there's a question of what to do about the videos on Vimeo.

The options are:

  1. Keep them up, but delete the XML references to them
  2. Keep them up, and redundantly retain the XML
  3. Delete them, along with the XML references
  4. Delete them, but keep the XML references

We can rule out (4) as silly. My inclination is (3), since we already have backups, maintaining them is work, and I'm not sure that Vimeo adds much in terms of value.

poke @mbollmann @akoehn

mjpost • Mar 12 '24

Agree with (3), plus checking whether we can save money on that account (is a Pro membership still needed?).

akoehn • Mar 12 '24

Good thought. I bet we could ditch the Pro membership, but I'll wait until they resolve our #3121...

mjpost • Mar 12 '24

I found quite a few dead Vimeo links (see #3125), similar to #3121. I can't find these videos anywhere else. Is there any way to restore them?

davidstap • Mar 13 '24

Update: I found a total of 103 videos with broken Vimeo links ("Unauthorized"); see #3125.

davidstap • Mar 13 '24

Can you post the list as a text file to this thread? I have an email in to Vimeo support and am waiting to hear back.

It's strange to me that we can't find these videos anywhere, but we don't get a message that they don't exist. I'm hoping this means that they're still there somehow.

mjpost • Mar 13 '24

A list of broken links: broken_links.txt

Most of these show the "Unauthorized" error, but others, such as this one, show a different error: "Sorry, we couldn't find that page".
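
The two failure modes described above could be told apart with a small checker. This is a minimal sketch using only Python's standard library. It assumes the error pages contain the quoted messages and that "Unauthorized" may also surface as an HTTP 401 status; neither is confirmed against Vimeo. The helper names `classify`, `check`, and `write_report` are hypothetical. `check` needs network access, while the other two work offline.

```python
import urllib.error
import urllib.request


def classify(status: int, body: str) -> str:
    """Label a fetched page using the two error messages reported in this thread."""
    body = body.replace("\u2019", "'")  # normalize a curly apostrophe in "couldn't"
    if "Sorry, we couldn't find that page" in body:
        return "not found"
    if status == 401 or "Unauthorized" in body:  # assumption: 401 means "Unauthorized"
        return "unauthorized"
    return "ok" if 200 <= status < 300 else f"http {status}"


def check(url: str, timeout: float = 10.0) -> str:
    """Fetch url and classify the response; network failures raise URLError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError, which still carries a status and body.
        return classify(err.code, err.read().decode("utf-8", "replace"))


def write_report(results: dict[str, str], path: str = "broken_links.txt") -> None:
    """Write one 'label<TAB>url' line for every link not classified as ok."""
    with open(path, "w", encoding="utf-8") as fh:
        for url, label in sorted(results.items()):
            if label != "ok":
                fh.write(f"{label}\t{url}\n")
```

Grouping the report by label would separate the "Unauthorized" links, which may still exist on Vimeo's side, from the pages that appear to be gone entirely.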

davidstap • Mar 13 '24