YouTube-operational-API
Optimize performance
See the YouTube Data API v3 documentation on optimizing performance.
Does adding compression make sense, given that it is supported by Apache and curl by default?
We may think about using a `compressed` parameter to decrease server workload, but I don't think that it is worth it.
Related to #27 and #35.
First, figure out how to proceed without removing `-H 'Accept-Encoding: gzip, deflate, br'` from a cURL request, and why gunzip sometimes does not work (when exactly?). If only `gzip` is provided as `Accept-Encoding`, the API always correctly returns data that gunzip can decompress. It is a bit odd (maybe due to their relative load) that the YouTube API does not have a preferred compression method.
Accept-Encoding documentation.
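As a local sanity check of the gzip-only case, the following sketch (with a synthetic payload, no network involved) confirms that a stream produced by gzip is always decompressable with `gunzip -c`, which is what the `gunzip -c a` step below relies on:

```shell
# Simulate a response body that the server compressed with gzip
# (the behavior observed when Accept-Encoding is restricted to gzip).
printf '{"kind":"youtube#playlistItemListResponse"}' > payload.json
gzip -c payload.json > a          # "a" mimics the saved compressed response
gunzip -c a > decoded.json        # same decompression step as in the cURL example
cmp payload.json decoded.json && echo 'roundtrip OK'
```

Note that curl also ships a `--compressed` flag that sends an `Accept-Encoding` header and transparently decompresses the response, which avoids the manual gunzip step.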
curl -v 'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet,contentDetails,status&playlistId=UUAcAnMF0OrCtUep3Y4M-ZPw&maxResults=50&key=AIzaSy...'
The content is 166,527 bytes according to `ls -l`.
If `-H 'Accept-Encoding: gzip, deflate, br' > a && gunzip -c a` is added:
Total packet length according to Wireshark for the Google API instance IP: 46,640 bytes.
Otherwise, if only `> a && cat a` is added:
Total packet length according to Wireshark for the Google API instance IP: 187,614 bytes.
Total packet length according to Wireshark for the Google API instance IP: 187,713 bytes.
The uncompressed request was executed twice to verify the order of magnitude.
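The order of magnitude above (roughly a 4x reduction) can also be sanity-checked locally with `wc -c`, without Wireshark or the network; a minimal sketch using a synthetic repetitive JSON-like body:

```shell
# Compare raw vs gzip-compressed sizes of a repetitive JSON-like body,
# analogous to comparing the two Wireshark packet totals above.
yes '{"kind":"youtube#playlistItem","etag":"abc"}' | head -n 1000 > body.json
raw=$(wc -c < body.json)
compressed=$(gzip -c body.json | wc -c)
echo "raw=$raw bytes, gzip=$compressed bytes"
```

Keep in mind that a fully repetitive payload compresses much better than a real API response, so the ratio obtained this way overstates the real-world gain.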
The questions are: does the API's `file_get_contents` use compression? And what is the current CPU load of my instances, to verify that decompression would not add an unacceptable CPU overhead?
I added the following to the crontab of each official instance:
* * * * * (date && cat /proc/loadavg && cat /proc/meminfo | head -n 3) > health.txt
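To read those snapshots back, the fields of interest are the first number of the `/proc/loadavg` line (the 1-minute load average) and the `MemAvailable` line of `/proc/meminfo`. A minimal sketch using a simulated `health.txt` (the values are made up, but the format matches what the crontab line writes):

```shell
# health.txt contains: a date line, then /proc/loadavg, then the first
# 3 lines of /proc/meminfo. Simulate such a snapshot and extract fields.
printf '%s\n' \
  'Mon Jan  1 00:00:00 UTC 2024' \
  '0.42 0.37 0.30 1/123 4567' \
  'MemTotal:        2048000 kB' \
  'MemFree:          512000 kB' \
  'MemAvailable:    1024000 kB' > health.txt
load1=$(awk 'NR==2 {print $1}' health.txt)               # first field of the loadavg line
memavail=$(awk '/^MemAvailable:/ {print $2}' health.txt) # in kB
echo "load1=$load1 memavail_kB=$memavail"                # prints: load1=0.42 memavail_kB=1024000
```

Note that the crontab entry uses `>`, so `health.txt` only keeps the latest snapshot; `>>` would keep a history, at the cost of unbounded file growth.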
Could also investigate HTTP Range header.
Could also propose `maxResults` and `fields` parameters, as requested on Discord (maxResults and fields). Here is another Discord user expecting `maxResults` to work.
Note that `channels?part=community` sometimes returns empty pages when using `nextPageToken`, like the YouTube UI does. However, according to amatis on Matrix, this may happen even when there is no more data afterwards, so we could look for an optimization to avoid making a few empty requests at the end.
Increased priority following this Discord message. Depending on the endpoint you are using, there may be alternative webpages that consume less bandwidth to retrieve the same data.
By the way, concerning `membership: true`, a Discord user reported: "I adapted it to my site and it was much faster than yours. If you want it to be faster, instead of this URL:

```php
if ($options['membership']) { $result = getJSONFromHTML("https://www.youtube.com/channel/$id"); }
```

use this URL:

```php
if ($options['membership']) { $result = getJSONFromHTML("https://www.youtube.com/channel/$id/search"); }
```

because your URL weighs 880 kB while my URL weighs 400 kB. It can pull and query faster, a 40% speed difference."

Source: private Discord message from (788496476187263026)
To avoid making a first request just to obtain a continuation token, being able to reverse-engineer this continuation token would improve performance, cf. #190. This would possibly reduce the logic to a single code path, instead of first web-scraping HTML and then following the JSON continuation.
Currently tests are parsed during production delivery...
As in #258, we can use the YouTube UI browse endpoint to retrieve only JSON instead of HTML containing JSON.
Someone told me on Discord that we can receive only JSON thanks to the YouTube UI browse endpoint; this seems related to #252.
Regexes do not seem to be compilable ahead of time in PHP, contrary to Python's `re` module; see https://www.php.net/manual-lookup.php?pattern=preg_compile.
An old private instance owner requested this feature: https://discord.com/channels/@me/1075498558259724362/1294495326635560982.
Related to Webscrap_any_website/issues/40.