KeyError: 'source'
Hi, I got the KeyError below. Does anyone know how to fix it? Thanks a lot.
poetry run python start_us.py
[2024-08-21 13:25:20] Assigning Jobs
Processing Scraped Posts
0%| | 0/436 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/ubuntu/work/UltimaScraper/start_us.py", line 62, in <module>
asyncio.run(main())
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/ubuntu/work/UltimaScraper/start_us.py", line 44, in main
_api = await USR.start(
File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 50, in start
await self.start_datascraper(datascraper)
File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 137, in start_datascraper
await datascraper.datascraper.api.job_manager.process_jobs()
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 45, in process_jobs
await asyncio.create_task(self.__worker())
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 53, in __worker
await job.task
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 202, in prepare_scraper
await self.process_scraped_content(
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 237, in process_scraped_content
unrefined_set: list[dict[str, Any]] = await tqdm_asyncio.gather(
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in gather
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in <listcomp>
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
return f.result() # May raise f.exception().
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
return i, await f
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/datascraper_manager/datascrapers/onlyfans.py", line 51, in media_scraper
content_metadata.resolve_extractor(Extractor(post_result))
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 216, in resolve_extractor
self.medias: list[MediaMetadata] = result.get_medias(self)
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 147, in get_medias
main_url = self.item.url_picker(asset_metadata)
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py", line 39, in url_picker
source = media_item["source"]
KeyError: 'source'
Happens to me too. It worked 3-4 days ago; the issue appeared suddenly without any visible cause. It's also not model-related: I tried another model and the error appears there too.
+1
I picked "scrape all" and it still fails, so I can confirm it has nothing to do with any specific model; I think "source" here just refers to OF in general.
+1
Is this project even maintained anymore?
I haven't used this in a while, and when I do, I get the same error.
Some investigation into general updates (because this codebase is old):
Looking at the recent PyPI package dependencies, the error happens with version 1.1.4 of ultima-scraper-api.
Notably, the latest UltimaScraper release on PyPI itself appears to be newer than what is available on GitHub.
I will investigate further, but upgrading UltimaScraper to the latest PyPI sources will probably, or most likely, fix this issue.
The codebase here is outdated, with dependencies two years old, while the PyPI one uses recent versions from this year at first glance.
Interesting links with regularly updated codebases (but not UltimaScraper itself, somehow):
- https://github.com/DATAHOARDERS
- https://github.com/UltimaHoarder/UltimaScraperAPI
@DIGITALCRIMINAL would you mind either updating this repo or providing us a new, updated start_us.py?
Thank you
It looks like the data structure OnlyFans uses has changed.
They removed the source key from the media items, which broke URL retrieval.
The source URL is now in files.full.url.
I made some tweaks to the url_picker method in ultima_scraper_api/apis/onlyfans/__init__.py, and now it works.
Here's the quick fix I did for the url_picker method:
def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
    authed = self.get_author().get_authed()
    video_quality = (
        video_quality or self.author.get_api().get_site_settings().video_quality
    )
    if not media_item["canView"]:
        return
    source: dict[str, Any] = {}
    media_type: str = ""
    if "files" in media_item:
        media_type = media_item["type"]
        media_item = media_item["files"]
        source = media_item["full"]
    else:
        return
    url = source.get("url")
    return urlparse(url) if url else None
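For illustration, the shape change described above looks roughly like this (the payload below is hypothetical, based on the keys named in this thread, not a real API response):

```python
from urllib.parse import urlparse

# Hypothetical media item in the new shape: the URL now lives under
# files.full.url instead of the removed "source" key.
media_item = {
    "canView": True,
    "type": "photo",
    "files": {"full": {"url": "https://cdn.example.com/a.jpg"}},
}

def pick_full_url(media_item):
    """Safely read files.full.url, returning None if any level is missing."""
    url = media_item.get("files", {}).get("full", {}).get("url")
    return urlparse(url) if url else None

print(pick_full_url(media_item).netloc)  # cdn.example.com
```

Using chained `.get()` calls instead of `[...]` indexing means a missing key yields `None` rather than the KeyError this issue is about.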
I can confirm this is working, TYVM!!
UPDATE: I scraped one account perfectly, but after that I get a TypeError: argument of type 'NoneType' is not iterable, so it fails after one scraped model when selecting "All". It seems to work correctly when selecting models one by one.
ANOTHER UPDATE: the script now seems to work properly when selecting ALL. Maybe some of my model DBs were corrupted; still testing, but overall this edit works :D
Ok, after some testing, I noticed the error comes from OF's change to the preview URLs, and I cross-checked (https://github.com/UltimaHoarder/UltimaScraper/issues/2121#issuecomment-2318619581).
In the same __init__.py file I replaced all the ["preview"] lookups in preview_url_picker with ["full"].
That got my downloads repaired as well, thanks everyone!
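A more defensive variant of that edit (a sketch of the change described above, not the upstream code) avoids KeyError entirely by using a `.get()` chain:

```python
from urllib.parse import urlparse

def preview_url_picker(media_item):
    # Sketch of the fix above: read the preview from files.full.url
    # (the old "preview" key was removed), without raising KeyError
    # when any of the keys is missing.
    preview_url = media_item.get("files", {}).get("full", {}).get("url")
    return urlparse(preview_url) if preview_url else None

print(preview_url_picker({"files": {"full": {"url": "https://example.com/p.jpg"}}}).path)  # /p.jpg
```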
I've tried to replicate the steps but can't make it work. Can anyone upload a working code version somewhere, please?
Hi everyone, I think this problem has been solved, and it works for me now. I will make a summary here.
You need to fix __init__.py in the folder ultima_scraper_api/apis/onlyfans. It's not easy to find because you are in the UltimaScraper project itself, so here I write down the full path: UltimaScraper/.venv/lib/python3.11/site-packages/ultima_scraper_api/apis/onlyfans; fix __init__.py there.
The corrected __init__.py is as follows:
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Literal
from urllib.parse import urlparse

SubscriptionType = Literal["all", "active", "expired", "attention"]

if TYPE_CHECKING:
    from ultima_scraper_api.apis.onlyfans.classes.user_model import (
        AuthModel,
        create_user,
    )


class SiteContent:
    def __init__(self, option: dict[str, Any], user: AuthModel | create_user) -> None:
        self.id: int = option["id"]
        self.author = user
        self.media: list[dict[str, Any]] = option.get("media", [])
        self.preview_ids: list[int] = []
        self.__raw__ = option

    def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
        authed = self.get_author().get_authed()
        video_quality = (
            video_quality or self.author.get_api().get_site_settings().video_quality
        )
        if not media_item["canView"]:
            return
        source: dict[str, Any] = {}
        media_type: str = ""
        if "files" in media_item:
            media_type = media_item["type"]
            media_item = media_item["files"]
            source = media_item["full"]
        else:
            return
        url = source.get("url")
        return urlparse(url) if url else None

    def preview_url_picker(self, media_item: dict[str, Any]):
        preview_url = None
        if "files" in media_item:
            # Guard on "full" (not "preview"), since "full" is the key read
            # below; this avoids a KeyError when "full" is missing.
            if (
                "full" in media_item["files"]
                and "url" in media_item["files"]["full"]
            ):
                preview_url = media_item["files"]["full"]["url"]
        else:
            # Use .get() so items without a "full" key don't raise KeyError.
            preview_url = media_item.get("full")
        return urlparse(preview_url) if preview_url else None

    def get_author(self):
        return self.author

    async def refresh(self):
        func = await self.author.scrape_manager.handle_refresh(self)
        return await func(self.id)
Another thing: if you previously ran this project with Docker, you need to rebuild your image and remember to put the fixed __init__.py in the right place. So I put my Dockerfile below:
FROM python:3.10-slim
RUN apt-get update && apt-get install -y \
curl \
libpq-dev \
gcc \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
ENV POETRY_HOME=/usr/local/share/pypoetry
ENV POETRY_VIRTUALENVS_CREATE=false
RUN curl -sSL https://install.python-poetry.org | python3 -
COPY . .
RUN /usr/local/share/pypoetry/bin/poetry install --only main
COPY .venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py /usr/src/app/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py
CMD [ "/usr/local/share/pypoetry/bin/poetry", "run", "python", "./start_us.py" ]
After those settings, I think you can run it well.
In my experience, after all the settings, a "KeyError: 'data'" appeared because a new cookie needed to be set. You need to reset auth.json in __user_data__/profiles/OnlyFans/default/auth.json.
For reference, on Windows the full path is C:\Users\{user}\AppData\Local\pypoetry\Cache\virtualenvs\ultima-scraper-UEi9_8Jc-py3.10\Lib\site-packages\ultima_scraper_api\apis\onlyfans
@myps6415 correct me if I'm wrong, but I couldn't find the init file elsewhere.
Thanks! This worked for me.
One way to find where the API package lives is to use the find command:
find / -iname 'ultima_scraper_api'
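If `find` isn't available (e.g. on Windows), a Python one-liner can locate the installed package instead:

```python
# Locate the installed ultima_scraper_api package without relying on `find`.
import importlib.util

spec = importlib.util.find_spec("ultima_scraper_api")
if spec and spec.origin:
    print(spec.origin)  # path to ultima_scraper_api/__init__.py
else:
    print("ultima_scraper_api is not installed in this environment")
```

Run it with the same interpreter the scraper uses (e.g. `poetry run python ...`) so it inspects the right virtualenv.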
I updated my __init__.py file but am still having this issue. Maybe something changed again on the OF side? Is anybody else having issues?