[Feature request]: scrapers: youtube-url, peertube-url, odysee-url.
Background
[x] I have checked ytfzf(1) and ytfzf(5) or the wiki before creating this.
short: this is mainly for tracking history with scraped data
problem
I use an rss reader for subscribing to channels, because it's nicer to have everything in one place and it also saves feed items. I use ytfzf in conjunction with the rss reader because ytfzf has a history tracker/file. Using the -c U flag opens the url, but doesn't track history; using the -c u flag tracks history, but it only records the url of the video and nothing else (i.e. thumbnail, title, etc.).
I need it to open the url, scrape the video details, and put them in the history file.
There is a hacky way that (sometimes) works: searching for the url through ytfzf. But in rare cases youtube doesn't return the video for a url search and shows unrelated results instead.
possible solution
I'm not familiar with the script, but there are already scrapers that search, scrape video data, and then add it to the history file (for youtube, peertube, odysee/lbry). If they could take a url instead of a search query, this might work.
using the -c U flag opens the url, but doesn't track history
I'm going to consider this a bug, even though fixing it would still write extremely minimal information to the history, basically just the url.
I'm not familiar with the script, but there is already a scraper for searching, scraping video data
All of these are based on search results, not individual urls; essentially each website would need to get its own url scraper.
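For illustration, each site encodes the video ID in its urls differently, which is part of why one generic url scraper won't do. A hypothetical sketch (none of these helpers exist in ytfzf; the patterns are simplified and only cover the common url shapes):

```sh
#!/bin/sh
# Hypothetical sketch: each site's urls carry the video ID differently,
# so a url scraper would need per-site extraction logic like this.
extract_id () {
    url=$1
    case "$url" in
        *youtube.com/watch*) id="${url##*v=}"; id="${id%%\&*}" ;;
        *youtu.be/*)         id="${url##*/}"; id="${id%%\?*}" ;;
        *odysee.com/*)       id="${url#*odysee.com/}" ;;
        *)                   id="" ;;
    esac
    printf '%s\n' "$id"
}

extract_id "https://www.youtube.com/watch?v=abc123"   # abc123
extract_id "https://youtu.be/abc123?t=10"             # abc123
```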
I've written myself a scraper called video-info that I never use; however, it only works with youtube videos. Feel free to copy/paste the code and put it in a file called video-info in ~/.config/ytfzf/scrapers. This allows you to do ytfzf -c video-info <url>
#!/usr/bin/env sh
# this scraper uses return youtube dislike's api: https://returnyoutubedislike.com/
scrape_video_info () {
    set -f
    unset IFS
    # $1 is a whitespace-separated list of urls; split into one url per line
    videos="$(printf "%s\n" $1)"
    output_json_file=$2
    printf "Not sure what 'clearly attributed' means, but i don't like cutting people short\nThis scraper uses return youtube dislike's api: https://returnyoutubedislike.com/\n" >&2
    _get_invidious_thumb_quality_name
    count=0
    while read -r line; do
        {
            # assumes a url of the form https://www.youtube.com/watch?v=<id>
            id="${line##*=}"
            print_info "Scraping video info for $id\n"
            _tmp_json="${session_temp_dir}/video-info-ratings-$id.json"
            # this exit only leaves the background job for this video, not the whole scraper
            _get_request "https://returnyoutubedislikeapi.com/votes?videoid=$id" > "$_tmp_json" || exit "$?"
            rating_json="$(jq '{"likes": "\(.likes)", "dislikes": "\(.dislikes)"}' < "$_tmp_json")"
            _get_request "${invidious_instance}/api/v1/videos/$id" |
                jq '
                    def pad_left(n; num):
                        num | tostring |
                        if (n > length) then ((n - length) * "0") + (.) else . end
                    ;
                    {
                        "scraper": "video_info",
                        "title": .title,
                        "ID": .videoId,
                        "genre": .genre,
                        "author": .author,
                        "url": "'"$line"'",
                        "duration": "\(.lengthSeconds / 60 | floor):\(pad_left(2; .lengthSeconds % 60))",
                        "description": .description,
                        "thumbs": "'"${invidious_instance}"'/vi/\(.videoId)/'"$thumbnail_quality"'.jpg"
                    }' | jq '[ . + '"$rating_json"' ]' >> "$_tmp_json.final"
        } &
        count=$((count+1))
        # return youtube dislike only allows 100 pings per minute
        [ "$count" -ge 99 ] && break
    done <<EOF
$videos
EOF
    wait
    set +f
    cat "${session_temp_dir}/video-info-ratings"*".final" >> "$output_json_file"
}
thumbnail_video_info_text_video_info () {
    IFS=';' read -r genre likes dislikes <<EOF
$(jq -r --arg url "$url" '.[]|select(.url == $url)| .genre + ";" + .likes + ";" + .dislikes' < "$video_json_file")
EOF
    printf "${c_cyan}%s${c_reset} (%s)\n" "$title" "$id"
    printf "${c_yellow}%s${c_reset}\n" "$genre"
    printf "${c_green}%s${c_reset}/${c_red}%s${c_reset}\n" "$(printf "%s" "$likes" | add_commas)" "$(printf "%s" "$dislikes" | add_commas)"
}
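As an aside, the jq pad_left helper in the scraper just zero-pads the seconds when building the duration string. The same formatting can be sketched in plain sh (with a made-up lengthSeconds value, not taken from any real video):

```sh
#!/bin/sh
# Plain-sh equivalent of the jq duration formatting above:
# minutes, then seconds zero-padded to two digits.
format_duration () {
    secs=$1
    printf '%d:%02d\n' $((secs / 60)) $((secs % 60))
}

format_duration 754   # 12:34
format_duration 61    # 1:01
```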
@Euro20179 this works extremely well with youtube. thanks
@Euro20179 btw, I can't rename the video-info file to something else like youtube-url; is there a reason for that?
If you rename it to youtube-url it's going to look for a function called scrape_youtube_url, but the function in this file is called scrape_video_info; essentially you just have to rename the function.
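In other words, ytfzf derives the function name from the scraper's file name. A minimal sketch of that mapping, assuming (based on the video-info -> scrape_video_info example above) that dashes in the file name become underscores; the exact internals of ytfzf may differ:

```sh
#!/bin/sh
# Sketch of mapping a scraper file name to the function ytfzf looks for.
# Assumption: dashes in the file name become underscores in the function name.
scraper_fn_name () {
    printf 'scrape_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

scraper_fn_name "video-info"    # scrape_video_info
scraper_fn_name "youtube-url"   # scrape_youtube_url
```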
@Euro20179 alright, so thumbnail_video_info_text_video_info is going to be thumbnail_video_info_text_youtube_url right?
thumbnail_video_info_text_video_info is going to be thumbnail_video_info_text_youtube_url right?
Yes that is correct, sorry I forgot about that.
you should definitely merge this code into the addons directory, or into the shell script itself; it's already pretty much complete for anyone to use
My intention was to merge it as an addon, I just forgot lol.