YouTube-operational-API
YT operational API Search endpoint not able to fetch more than 500 results
YouTube Data API v3 Search: list endpoint is limited to 500 results:
Note: Search results are constrained to a maximum of 500 videos if your request specifies a value for the channelId parameter and sets the type parameter value to video, but it does not also set one of the forContentOwner, forDeveloper, or forMine filters.
Source: Search: list#channelId
Note that this 500-result limit also seems to apply outside the case described in the documentation.
It seems possible to fetch more than 500 results from the YT UI (checking this would need a small tool counting results in the page source after having scrolled manually), so this issue shouldn't come from my reverse-engineering code.
If achieved, that would help:
- https://stackoverflow.com/q/72438701
- 808arc#8280
- https://stackoverflow.com/q/77306424
This list could be completed.
Python script:
import requests, json

def get(url):
    return requests.get(url).text

pageToken = ''
ids = []
while True:
    url = 'https://yt.lemnoslife.com/search?part=id&q=hololive&type=video'
    if pageToken != '':
        url += '&pageToken=' + pageToken
    content = get(url)
    #print(content)
    data = json.loads(content)
    pageToken = data['nextPageToken']
    items = data['items']
    for item in items:
        id = item['id']
        if not id in ids:
            ids += [id]
    print(len(ids))
# reached 437 before KeyError: 'nextPageToken'
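The KeyError above is only how the script notices the last page: nextPageToken is read unconditionally, so the items of the final page are never counted. A minimal sketch of a loop that stops cleanly, assuming the response shape used above (items plus an optional nextPageToken, with the video ID under id.videoId). fetch_page is a hypothetical callable standing in for the HTTP request, so the loop can be exercised without the network:

```python
def collect_ids(fetch_page):
    """Collect unique video IDs until no nextPageToken is returned.

    fetch_page is a hypothetical callable: it receives a page token
    ('' for the first page) and returns the decoded JSON response as a dict.
    """
    ids = []
    seen = set()
    page_token = ''
    while True:
        data = fetch_page(page_token)
        # Process the items first so the last page is counted too.
        for item in data.get('items', []):
            video_id = item['id']['videoId']
            if video_id not in seen:
                seen.add(video_id)
                ids.append(video_id)
        page_token = data.get('nextPageToken')
        if not page_token:
            # Covers both a missing key and an explicit null.
            return ids
```

With requests, fetch_page could be something like lambda token: requests.get(url, params = {'part': 'id', 'q': 'hololive', 'type': 'video', 'pageToken': token}).json(). This does not lift the limit, it only makes the final count include the last page.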
Related code:
https://github.com/Benjamin-Loison/YouTube-operational-API/blob/9b5a7805834fd56f12afc1fb55e439a68a5a787f/search.php#L105-L117
The YouTube UI search by query term (Test here), when filtered to only retrieve videos, stopped after 549 results (counted by matching ago as a whole word; matching view gives a similar count)...
When not filtered to only retrieve videos, it stopped after 654 results (again counted by matching ago as a whole word).
Filtering with:
- view gives 702 matches
- VIEW FULL PLAYLIST gives 2 matches, so can guess there are this number of playlists
- subscriber gives 44 matches, so can guess there are this number of channels
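The counts above come from text matches in the rendered page. A minimal sketch of the whole-word count, assuming the visible text of the page has been saved after scrolling to the end; the function name and the sample string are made up for illustration:

```python
import re

def count_whole_word(text, word):
    """Count case-sensitive occurrences of word delimited by word boundaries."""
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text))

# Made-up sample: "ago" matches twice, "agora" is not counted.
sample = '3 days ago 1 year ago agora'
print(count_whole_word(sample, 'ago'))
```

Matching ago as a whole word is only a proxy for the number of results: it assumes every result shows exactly one relative date in an English-language UI.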
So this issue can't be solved easily AFAIK, as the issue (a limitation, in fact) is on YouTube's end.
Could try forcing the YouTube Data API v3 Search: list endpoint past the limit by providing a modified page token after having reverse-engineered it, as it doesn't contain randomness AFAIK.
This code snippet looks like what I am looking for.
This issue is quite similar to this Stack Overflow question.
Better understanding the pageToken may help to solve this question.
Similar issue with the Community tab: https://stackoverflow.com/questions/76699812/how-do-i-get-youtube-community-posts-older-than-200#comment135264020_76699812
#190 could help concerning the pagination token.
import requests
import json

pageToken = ''
ids = set()
url = 'https://yt.lemnoslife.com/noKey/search'
params = {
    'q': 'test',
    'type': 'video',
    'maxResults': 50,
}
while True:
    params['pageToken'] = pageToken
    data = requests.get(url, params = params).json()
    pageToken = data['nextPageToken']
    items = data['items']
    for item in items:
        id_ = item['id']['videoId']
        ids.add(id_)
    print(len(ids))
# reached 518 before KeyError: 'nextPageToken'
import requests
import json
import blackboxprotobuf
import base64

# Both fields of the page token are plain protobuf integers.
typedef = {
    '1': {
        'type': 'int'
    },
    '2': {
        'type': 'int'
    }
}

pageToken = ''
ids = set()
url = 'https://yt.lemnoslife.com/noKey/search'
maxResults = 50
params = {
    'q': 'test',
    'type': 'video',
    'maxResults': maxResults,
}
requestIndex = 0
while True:
    # Field 1 is the result offset of the requested page.
    message = {
        '1': requestIndex * maxResults,
        '2': 0,
    }
    data = blackboxprotobuf.encode_message(message, typedef)
    pageToken = base64.b64encode(data).decode('utf-8')
    params['pageToken'] = pageToken
    print(pageToken)
    data = requests.get(url, params = params).json()
    items = data['items']
    for item in items:
        id_ = item['id']['videoId']
        ids.add(id_)
    print(len(ids))
    requestIndex += 1
# reaches 510 and stays stuck there
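For reference, the token built above is small enough to serialize by hand. A dependency-free sketch, assuming the same layout as the script (field 1 = offset, field 2 = 0, both varints); encode_varint and build_page_token are names made up here, and I have not checked that blackboxprotobuf emits exactly the same bytes:

```python
import base64

def encode_varint(value):
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation bit.
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def build_page_token(offset):
    """Serialize {1: offset, 2: 0} and base64-encode it."""
    # Tag byte = (field number << 3) | wire type, wire type 0 being varint.
    message = b'\x08' + encode_varint(offset) + b'\x10' + encode_varint(0)
    return base64.b64encode(message).decode('utf-8')

for offset in (0, 50, 500):
    print(offset, build_page_token(offset))
```

An offset of 50 gives CDIQAA== and an offset of 500 gives CPQDEAA=, so a token for any offset can be produced without ever receiving it from the API. As the script above shows, the endpoint still stops returning new results around 500.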
Should test with YouTube UI pagination as well.
curl -s 'https://yt.lemnoslife.com/search?part=id&q=test&type=video' | jq .items[].id.videoId
curl -s 'https://yt.lemnoslife.com/search?part=id&q=test&type=video' | jq .nextPageToken
curl -s 'https://yt.lemnoslife.com/search?part=id&q=test&type=video' | jq -r .nextPageToken | base64 -d | protoc --decode_raw
2 {
  2: "test"
  3: "EgIQAUgUggELOUJ2eVkyX3c2RG-CAQtkYmpQblhhYWNBVYIBCzdjQ3BaS2ZkN1hBggELNWN5c1BQblpFaE2CAQtCREJ5aXZtclZ1TYIBCzJhNFV4ZHk5VFFZggELZzRReUp1MDlrdE2CAQtNNy1oM0ZPLUtLb4IBC0k1OEp5dEpFZmRzggELdTB3dVlZbnFkNzSCAQt5ck45Nm1nbkVsMIIBC0t3ZXZvY2FYZktnggELbUpWV1gwdnVkLWeCAQtlamFJTTNHcWVzd4IBCzczWUcwb2xOWFdvggELMU9fZURSOGZCUlGCAQt5aFM5TG5Eb29fd4IBC3ZlUGM1VjRoX2tnggELX1RYLS1Ga3U5TlGCAQtaeFlaa3oyMGxZQbIBBgoECBcQAuoBBAgCECg%3D"
}
3: 52047873
4: "search-feed"
When repeating the command, the top-level field 3 is identical but field 2/3 (field 3 nested inside field 2) differs.
Is 2/3 made of parts separated by - or CAQ or similar?
echo -n 'EgIQAUgUggELOUJ2eVkyX3c2RG' | base64 -d
H�
9BvyY2_w6Dbase64: invalid input
echo -n 'EgIQAUgUggELOUJ2eVkyX3c2RG=' | base64 -d
H�
9BvyY2_w6Dbase64: invalid input
echo -n 'EgIQAUgUggELOUJ2eVkyX3c2RG==' | base64 -d
H�
9BvyY2_w6D
echo -n 'EgIQAUgUggELOUJ2eVkyX3c2RG==' | base64 -d | protoc --decode_raw
Failed to parse input.
EgIQAUgUggELOUJ2eVkyX3c2RG-
CAQtkYmpQblhhYWNBVYIBCzdjQ3BaS2ZkN1hBggELNWN5c1BQblpFaE2
CAQtCREJ5aXZtclZ1TYIBCzJhNFV4ZHk5VFFZggELZzRReUp1MDlrdE2
CAQtNNy1oM0ZPLUtLb4IBC0k1OEp5dEpFZmRzggELdTB3dVlZbnFkNzS
CAQt5ck45Nm1nbkVsMIIBC0t3ZXZvY2FYZktnggELbUpWV1gwdnVkLWe
CAQtlamFJTTNHcWVzd4IBCzczWUcwb2xOWFdvggELMU9fZURSOGZCUlG
CAQt5aFM5TG5Eb29fd4IBC3ZlUGM1VjRoX2tnggELX1RYLS1Ga3U5TlG
CAQtaeFlaa3oyMGxZQbIBBgoECBcQAuoBBAgCECg%3D
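Field 2/3 does not need to be split by hand: once percent-decoded, it parses as a single URL-safe base64 string (- is one of its characters, not a separator) holding one protobuf message. A minimal sketch with a hand-written wire-format reader; the helper names are made up here, only varint and length-delimited fields are handled, and reading field 16 as video IDs is my interpretation, not something documented:

```python
import base64
from urllib.parse import unquote

# Field 2/3 from the protoc output above, split as in the listing above.
TOKEN = (
    'EgIQAUgUggELOUJ2eVkyX3c2RG-'
    'CAQtkYmpQblhhYWNBVYIBCzdjQ3BaS2ZkN1hBggELNWN5c1BQblpFaE2'
    'CAQtCREJ5aXZtclZ1TYIBCzJhNFV4ZHk5VFFZggELZzRReUp1MDlrdE2'
    'CAQtNNy1oM0ZPLUtLb4IBC0k1OEp5dEpFZmRzggELdTB3dVlZbnFkNzS'
    'CAQt5ck45Nm1nbkVsMIIBC0t3ZXZvY2FYZktnggELbUpWV1gwdnVkLWe'
    'CAQtlamFJTTNHcWVzd4IBCzczWUcwb2xOWFdvggELMU9fZURSOGZCUlG'
    'CAQt5aFM5TG5Eb29fd4IBC3ZlUGM1VjRoX2tnggELX1RYLS1Ga3U5TlG'
    'CAQtaeFlaa3oyMGxZQbIBBgoECBcQAuoBBAgCECg%3D'
)

def b64url_decode(text):
    """Percent-decode, restore padding and decode URL-safe base64."""
    text = unquote(text)
    text += '=' * (-len(text) % 4)
    return base64.urlsafe_b64decode(text)

def read_varint(buf, pos):
    """Return (value, next position) for the varint starting at pos."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def parse_fields(buf):
    """Yield (field number, value) for top-level varint and bytes fields."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError('unhandled wire type %d' % wire_type)
        yield field, value

fields = list(parse_fields(b64url_decode(TOKEN)))
# Assumption: the repeated field 16 holds 11-character video IDs.
video_ids = [value.decode('ascii') for field, value in fields if field == 16]
print(len(video_ids), video_ids[:2])
```

On this token, field 9 is 20 and field 16 repeats 20 times, starting with 9BvyY2_w6Do, so the part that changes between runs looks like a list of video IDs, possibly the results already returned. The recurring CAQ would then just be how the 3-byte header of each field 16 entry (tag 0x82 0x01, length 0x0B) falls on the base64 alignment. It would also explain why protoc --decode_raw failed on the first fragment: it stops after 10 of the 11 bytes its last field announces.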
curl -s "https://yt.lemnoslife.com/search?part=id&q=test&type=video&pageToken=`curl -s 'https://yt.lemnoslife.com/search?part=id&q=test&type=video' | jq -r .nextPageToken`" | jq .items[].id.videoId
curl -s "https://yt.lemnoslife.com/search?part=id&q=test&type=video&pageToken=`curl -s 'https://yt.lemnoslife.com/search?part=id&q=test&type=video' | jq -r .nextPageToken`" | jq .nextPageToken
null
The second page returning a null nextPageToken should not happen.
protoc --help
mkdir test/ && protoc test.proto --php_out test/
php a.php
PHP Fatal error: Uncaught Error: Class "GPBMetadata\A" not found in /home/benjamin/protobuf/message.php:34
Stack trace:
#0 /home/benjamin/protobuf/a.php(7): message->__construct()
#1 {main}
thrown in /home/benjamin/protobuf/message.php on line 34
Commenting out \GPBMetadata\A::initOnce(); leads to:
PHP Fatal error: Uncaught InvalidArgumentException: message is not found in descriptor pool. Only generated classes may derive from Message. in /home/benjamin/protobuf/vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:74
Stack trace:
#0 /home/benjamin/protobuf/vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php(55): Google\Protobuf\Internal\Message->initWithGeneratedPool()
#1 /home/benjamin/protobuf/message.php(35): Google\Protobuf\Internal\Message->__construct()
#2 /home/benjamin/protobuf/a.php(7): message->__construct()
#3 {main}
thrown in /home/benjamin/protobuf/vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php on line 74