[Feature request]: scrapers: youtube-url, peertube-url, odysee-url.
Background
[x] I have checked ytfzf(1) and ytfzf(5) or the wiki before creating this.
short: this is mainly for tracking history with scraped data
problem
I use an rss reader for subscribing to channels, because it's nicer to have everything in one place and it also saves feed items. I use ytfzf in conjunction with the rss reader because ytfzf has a history tracker/file. Using the -c U flag opens the url, but doesn't track history; using the -c u flag tracks history, but it only records the url of the video and nothing else (i.e. thumbnail, title, etc.).
I need it to open the url, scrape the video details, and put them in the history file.
There is a hacky way that (sometimes) works: searching for the url through ytfzf. But in rare cases youtube doesn't return the video for a url search and shows unrelated results instead.
possible solution
I'm not familiar with the script, but there are already scrapers that search, scrape video data, and then add it to the history file (for youtube, peertube, odysee/lbry). If they could take a url instead of a search query, this might work.
using the -c U flag opens the url, but doesn't track history
I'm going to consider this a bug, even though fixing it would still write extremely minimal information to the history, basically just the url.
I'm not familiar with the script, but there is already a scraper for searching, scraping video data
All of these are based on search results, not individual urls; essentially each website would need to get its own url scraper.
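For illustration, each site encodes the video ID in its urls differently, which is part of why one generic url scraper won't do. A hypothetical sketch (none of these helpers exist in ytfzf; the patterns are simplified and only cover the common url shapes):

```sh
#!/bin/sh
# Hypothetical sketch: each site's urls carry the video ID differently,
# so a url scraper would need per-site extraction logic like this.
extract_id () {
    url=$1
    case "$url" in
        *youtube.com/watch*) id="${url##*v=}"; id="${id%%\&*}" ;;
        *youtu.be/*)         id="${url##*/}"; id="${id%%\?*}" ;;
        *odysee.com/*)       id="${url#*odysee.com/}" ;;
        *)                   id="" ;;
    esac
    printf '%s\n' "$id"
}

extract_id "https://www.youtube.com/watch?v=abc123"   # abc123
extract_id "https://youtu.be/abc123?t=10"             # abc123
```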
I've written myself a scraper called video-info that I never use; however, it only works with youtube videos. Feel free to copy/paste the code and put it in a file called video-info in ~/.config/ytfzf/scrapers. This allows you to do ytfzf -c video-info <url>
#!/usr/bin/env sh
# this scraper uses return youtube dislike's api: https://returnyoutubedislike.com/
scrape_video_info () {
    set -f
    unset IFS
    # $1 is a whitespace-separated list of urls; split into one url per line
    videos="$(printf "%s\n" $1)"
    output_json_file=$2
    printf "Not sure what 'clearly attributed' means, but i don't like cutting people short\nThis scraper uses return youtube dislike's api: https://returnyoutubedislike.com/\n" >&2
    _get_invidious_thumb_quality_name
    count=0
    while read -r line; do
        {
            # assumes a url of the form https://www.youtube.com/watch?v=<id>
            id="${line##*=}"
            print_info "Scraping video info for $id\n"
            _tmp_json="${session_temp_dir}/video-info-ratings-$id.json"
            # this exit only leaves the background job for this video, not the whole scraper
            _get_request "https://returnyoutubedislikeapi.com/votes?videoid=$id" > "$_tmp_json" || exit "$?"
            rating_json="$(jq '{"likes": "\(.likes)", "dislikes": "\(.dislikes)"}' < "$_tmp_json")"
            _get_request "${invidious_instance}/api/v1/videos/$id" |
                jq '
                    def pad_left(n; num):
                        num | tostring |
                        if (n > length) then ((n - length) * "0") + (.) else . end
                    ;
                    {
                        "scraper": "video_info",
                        "title": .title,
                        "ID": .videoId,
                        "genre": .genre,
                        "author": .author,
                        "url": "'"$line"'",
                        "duration": "\(.lengthSeconds / 60 | floor):\(pad_left(2; .lengthSeconds % 60))",
                        "description": .description,
                        "thumbs": "'"${invidious_instance}"'/vi/\(.videoId)/'"$thumbnail_quality"'.jpg"
                    }' | jq '[ . + '"$rating_json"' ]' >> "$_tmp_json.final"
        } &
        count=$((count+1))
        # return youtube dislike only allows 100 pings per minute
        [ "$count" -ge 99 ] && break
    done <<EOF
$videos
EOF
    wait
    set +f
    cat "${session_temp_dir}/video-info-ratings"*".final" >> "$output_json_file"
}
thumbnail_video_info_text_video_info () {
    IFS=';' read -r genre likes dislikes <<EOF
$(jq -r --arg url "$url" '.[]|select(.url == $url)| .genre + ";" + .likes + ";" + .dislikes' < "$video_json_file")
EOF
    printf "${c_cyan}%s${c_reset} (%s)\n" "$title" "$id"
    printf "${c_yellow}%s${c_reset}\n" "$genre"
    printf "${c_green}%s${c_reset}/${c_red}%s${c_reset}\n" "$(printf "%s" "$likes" | add_commas)" "$(printf "%s" "$dislikes" | add_commas)"
}
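As an aside, the jq pad_left helper in the scraper just zero-pads the seconds when building the duration string. The same formatting can be sketched in plain sh (with a made-up lengthSeconds value, not taken from any real video):

```sh
#!/bin/sh
# Plain-sh equivalent of the jq duration formatting above:
# minutes, then seconds zero-padded to two digits.
format_duration () {
    secs=$1
    printf '%d:%02d\n' $((secs / 60)) $((secs % 60))
}

format_duration 754   # 12:34
format_duration 61    # 1:01
```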
@Euro20179 this works extremely well with youtube. thanks
@Euro20179 btw, I can't rename the video-info file to something else like youtube-url; is there a reason for that?
If you rename it to youtube-url it's going to look for a function called scrape_youtube_url, but the function in this file is called scrape_video_info; essentially you just have to rename the function.
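In other words, ytfzf derives the function name from the scraper's file name. A minimal sketch of that mapping, assuming (based on the video-info -> scrape_video_info example above) that dashes in the file name become underscores; the exact internals of ytfzf may differ:

```sh
#!/bin/sh
# Sketch of mapping a scraper file name to the function ytfzf looks for.
# Assumption: dashes in the file name become underscores in the function name.
scraper_fn_name () {
    printf 'scrape_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

scraper_fn_name "video-info"    # scrape_video_info
scraper_fn_name "youtube-url"   # scrape_youtube_url
```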
@Euro20179 alright, so thumbnail_video_info_text_video_info is going to be thumbnail_video_info_text_youtube_url right?
thumbnail_video_info_text_video_info is going to be thumbnail_video_info_text_youtube_url right?
Yes that is correct, sorry I forgot about that.
you should definitely merge this code into the addons directory, or into the shell script itself; it's already pretty much complete for anyone to use
My intention was to merge it as an addon, I just forgot lol.