
YouTube videos only contain upload date, not time

Open loganwilliams opened this issue 3 years ago • 9 comments

Is there another method of extracting this data that could be used to fill in the Upload timestamp column more completely?

loganwilliams avatar May 12 '21 07:05 loganwilliams

AFAIK youtube-dl doesn't provide the full time and date in the info it returns, just the date. Often, though, the file itself will contain that info in its metadata. Extracting the date and time from the file metadata might be an option. I'll do a bit of looking.

jamesarnall avatar Jun 13 '21 17:06 jamesarnall

This is being done by this tool from Amnesty International: https://citizenevidence.amnestyusa.org/

I think the following snippet is the relevant part:

$('#formInputButton').click(function(){
	if(($('#formInput').val() == $('#formInput').attr('title'))||($('#formInput').val() == '')||($('#formInput').val() == ' ')) {
		alert('Please input a url');
	} else {
		var formUrl = $('#formInput').val();
		vars = getVars(formUrl);
		var urlV = vars['v'];	
		
		var theUrl = 'https://www.googleapis.com/youtube/v3/videos?id='+urlV+'&part=snippet,statistics,recordingDetails&key=AIzaSyBmQcXmAHD2h5ZurlNKHvHRwMVHbBQqbvc';		
		$.getJSON(theUrl, function(data) {		
			var shortString = processShort(data,formUrl);
			$('#shortOutput').html(shortString);
		});	
	}
});

It seems to be just an API call and some JSON parsing. I will play with it and post an update.

zbrasseaux avatar Nov 23 '21 03:11 zbrasseaux

Okay, so this is it, and it's pretty simple. getVars appears to parse the URL's query parameters, and the v parameter (urlV) is the video ID. That ID is concatenated into theUrl, and getJSON then issues a GET request and hands back the decoded JSON. The datetime is stored in items->snippet->publishedAt, where items is a list and the requested video is its first entry.

Edit: this only works for YouTube videos. I will look into the API requests for the other supported formats.
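The flow described above can be sketched in Python, the language auto-archiver and youtube-dl are written in. This is a minimal sketch, not the Amnesty tool's code: the function names are made up for illustration, only the snippet part is requested, and the caller is assumed to supply their own API key and to fetch and decode the JSON themselves.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# YouTube Data API v3 endpoint, as used in the snippet quoted in this thread.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def video_id_from_url(url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


def build_videos_url(video_id, api_key):
    """Build the request URL for one video's snippet (api_key is the caller's own)."""
    params = {"id": video_id, "part": "snippet", "key": api_key}
    return API_ENDPOINT + "?" + urlencode(params)


def published_at(response):
    """Pull publishedAt out of an already-decoded API response, if present."""
    items = response.get("items") or []
    if not items:
        return None  # unknown or private video: the API returns no items
    return items[0].get("snippet", {}).get("publishedAt")
```

The network request is left out on purpose so the sketch runs offline; the decoded JSON from the URL returned by build_videos_url would be passed to published_at, which returns an ISO 8601 string or None.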

zbrasseaux avatar Nov 23 '21 03:11 zbrasseaux

I am seeing that youtube-dl only returns the date. Returning the time as well may require a change on their end, but it's definitely possible. Alternatively, this API call can be used as a band-aid.

zbrasseaux avatar Nov 23 '21 04:11 zbrasseaux

@zbrasseaux Thank you, this is a nice find! There are two approaches here, we could either:

  • Fork youtube-dl to obtain the date (and possibly other metadata?) using this API call. Currently youtube-dl calls a v1 API; perhaps this can just be updated to the v3 API used in the Amnesty example? https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/youtube.py#L286
  • Detect a YouTube URL in auto-archiver and augment the youtube-dl metadata with our own API call

The first option is definitely preferable from my perspective, especially since there will likely be a large refactor of auto-archiver soon. If you have interest in tackling this, that would be very welcome!
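For comparison, the second option (detecting a YouTube URL and augmenting the youtube-dl metadata) might look roughly like the sketch below. Everything here is hypothetical: the host list, the published_at key, and the lookup_published_at callback are illustrative names, not part of auto-archiver or youtube-dl. The only detail taken from youtube-dl is that its info dict carries upload_date as a YYYYMMDD string.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube for this sketch (an assumed, non-exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """Return True when the URL's hostname is one of the known YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def augment_metadata(url, info, lookup_published_at):
    """Return a copy of `info` with a `published_at` key added for YouTube URLs.

    `lookup_published_at` is a caller-supplied function (hypothetical) that
    takes the URL and returns an ISO 8601 string, or None when nothing is found.
    """
    result = dict(info)  # copy so the original youtube-dl dict is left untouched
    if is_youtube_url(url):
        value = lookup_published_at(url)
        if value is not None:
            result["published_at"] = value
    return result
```

Passing the lookup in as a function keeps the API call separate from the merging logic, so a stub can stand in for the real request during testing.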

loganwilliams avatar Nov 23 '21 16:11 loganwilliams

@loganwilliams I agree that the first option is better. I have a fork of youtube_dl and am working on it locally. I'm curious how auto-archiver would use a fork of youtube_dl, or would this be contingent on the youtube_dl devs approving a PR?

I just saw that with the CLI version of youtube_dl, you can get a timestamp value. I posted a question in their issues-questions section asking where they get this value and how to access it, so hopefully that is pretty straightforward. In the meantime, i will keep trying to implement API v3

zbrasseaux avatar Nov 24 '21 08:11 zbrasseaux

@zbrasseaux That's great to hear! I think it might take some time for a youtube_dl PR to be approved, as there are over 800 pending on the project at the moment. I could just modify this project's pipenv requirements to install youtube_dl from a fork. There are some downsides to this (mainly remaining diligent about rebasing the fork off of master to make sure it does not fall far behind), but I think it's the best option.

loganwilliams avatar Nov 24 '21 08:11 loganwilliams

Apparently they have a separate function for getting timestamps (in Unix format, I think):

https://github.com/ytdl-org/youtube-dl/issues/30263

I will implement this tonight and make a PR

Edit: never mind. Once converted, this timestamp comes out as 18:00:00 every time; I've tried it with about 4 different links now.
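One possible explanation for the constant 18:00:00, which the thread does not confirm, is that the timestamp is simply midnight UTC of the upload date and was converted on a machine set to UTC-6. The sketch below only demonstrates that arithmetic; both the midnight-UTC derivation and the UTC-6 offset are assumptions, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def timestamp_from_upload_date(upload_date):
    """Unix timestamp for midnight UTC of a YYYYMMDD date (assumed derivation)."""
    day = datetime.strptime(upload_date, "%Y%m%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def render(timestamp, utc_offset_hours):
    """Format a Unix timestamp as local time for a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(timestamp, tz).strftime("%Y-%m-%d %H:%M:%S")


ts = timestamp_from_upload_date("20211124")
print(render(ts, 0))   # 2021-11-24 00:00:00
print(render(ts, -6))  # 2021-11-23 18:00:00
```

If that is what is happening, the timestamp carries no more information than the date already does, which would be consistent with the same time of day appearing for every link.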

zbrasseaux avatar Nov 24 '21 23:11 zbrasseaux

🤦‍♂️

The API v3 approach still seems promising though, thank you for your effort on this.

loganwilliams avatar Nov 25 '21 13:11 loganwilliams