KeyError: 'source'
Hi, I got the KeyError below. Does anyone know how to fix it? Thanks a lot.
poetry run python start_us.py
[2024-08-21 13:25:20] Assigning Jobs
Processing Scraped Posts
0%| | 0/436 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/ubuntu/work/UltimaScraper/start_us.py", line 62, in <module>
asyncio.run(main())
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/ubuntu/work/UltimaScraper/start_us.py", line 44, in main
_api = await USR.start(
File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 50, in start
await self.start_datascraper(datascraper)
File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 137, in start_datascraper
await datascraper.datascraper.api.job_manager.process_jobs()
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 45, in process_jobs
await asyncio.create_task(self.__worker())
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 53, in __worker
await job.task
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 202, in prepare_scraper
await self.process_scraped_content(
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 237, in process_scraped_content
unrefined_set: list[dict[str, Any]] = await tqdm_asyncio.gather(
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in gather
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in <listcomp>
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
return f.result() # May raise f.exception().
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
return i, await f
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/datascraper_manager/datascrapers/onlyfans.py", line 51, in media_scraper
content_metadata.resolve_extractor(Extractor(post_result))
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 216, in resolve_extractor
self.medias: list[MediaMetadata] = result.get_medias(self)
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 147, in get_medias
main_url = self.item.url_picker(asset_metadata)
File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py", line 39, in url_picker
source = media_item["source"]
KeyError: 'source'
Happens to me too. It worked 3-4 days ago; the issue appeared suddenly without any visible cause. It's also not model-related: I tried another model and the error appears there too.
+1
I picked "scrape all" and it still fails, so I can confirm it has nothing to do with any specific model; I think "source" here just refers to OF in general.
+1
Is this project even maintained anymore?
I haven't used this in a while, and when I do, I get the same error.
Some investigation into general updates (because this codebase is old):
Looking at the recent PyPI package dependencies, the error happens with version 1.1.4 of ultima-scraper-api.
Notably, the latest UltimaScraper release on PyPI itself appears to be newer than what is available on GitHub.
I will investigate further, but upgrading UltimaScraper to the latest PyPI sources will probably, or most likely, fix this issue.
The codebase here is outdated, with dependencies two years old, while the PyPI one uses recent versions from this year at first glance.
Interesting links with regularly updated codebases (but not UltimaScraper itself, somehow):
- https://github.com/DATAHOARDERS
- https://github.com/UltimaHoarder/UltimaScraperAPI
@DIGITALCRIMINAL would you mind either updating this repo or providing us a new, updated start_us.py?
Thank you
It looks like the data structure OnlyFans uses has changed.
They removed the source key from the media items, which broke URL retrieval.
The source URL is now in files.full.url.
I made some tweaks to the url_picker method in ultima_scraper_api/apis/onlyfans/__init__.py, and now it works.
Here's the quick fix I did for the url_picker method:
def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
    authed = self.get_author().get_authed()
    video_quality = (
        video_quality or self.author.get_api().get_site_settings().video_quality
    )
    if not media_item["canView"]:
        return
    source: dict[str, Any] = {}
    media_type: str = ""
    if "files" in media_item:
        media_type = media_item["type"]
        media_item = media_item["files"]
        source = media_item["full"]
    else:
        return
    url = source.get("url")
    return urlparse(url) if url else None
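For illustration, the shape change described above looks roughly like this (the payload below is hypothetical, based on the keys named in this thread, not a real API response):

```python
from urllib.parse import urlparse

# Hypothetical media item in the new shape: the URL now lives under
# files.full.url instead of the removed "source" key.
media_item = {
    "canView": True,
    "type": "photo",
    "files": {"full": {"url": "https://cdn.example.com/a.jpg"}},
}

def pick_full_url(media_item):
    """Safely read files.full.url, returning None if any level is missing."""
    url = media_item.get("files", {}).get("full", {}).get("url")
    return urlparse(url) if url else None

print(pick_full_url(media_item).netloc)  # cdn.example.com
```

Using chained `.get()` calls instead of `[...]` indexing means a missing key yields `None` rather than the KeyError this issue is about.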
I can confirm this is working, TYVM!!
UPDATE: I scraped one account perfectly, but after that I get a TypeError: argument of type 'NoneType' is not iterable, so it fails after one scraped model when selecting "All". It seems to work correctly when selecting models one by one.
ANOTHER UPDATE: the script now seems to work properly when selecting ALL. Maybe some of my model DBs were corrupted; still testing, but overall this edit works :D
Ok, after some testing, I noticed the error comes from OF's change to the preview URLs, and I cross-checked (https://github.com/UltimaHoarder/UltimaScraper/issues/2121#issuecomment-2318619581).
In the same __init__.py file I replaced all the ["preview"] lookups in preview_url_picker with ["full"].
That got my downloads repaired as well, thanks everyone!
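A more defensive variant of that edit (a sketch of the change described above, not the upstream code) avoids KeyError entirely by using a `.get()` chain:

```python
from urllib.parse import urlparse

def preview_url_picker(media_item):
    # Sketch of the fix above: read the preview from files.full.url
    # (the old "preview" key was removed), without raising KeyError
    # when any of the keys is missing.
    preview_url = media_item.get("files", {}).get("full", {}).get("url")
    return urlparse(preview_url) if preview_url else None

print(preview_url_picker({"files": {"full": {"url": "https://example.com/p.jpg"}}}).path)  # /p.jpg
```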
I've tried to replicate the steps but can't make it work. Can anyone upload a working code version somewhere, please?
Hi everyone, I think this problem has been solved, and it works for me now. I will make a summary here.
You need to fix __init__.py in the folder ultima_scraper_api/apis/onlyfans. It's not easy to find because you are in the UltimaScraper project itself, so here I write down the full path: UltimaScraper/.venv/lib/python3.11/site-packages/ultima_scraper_api/apis/onlyfans; fix __init__.py there.
The corrected __init__.py is as follows:
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Literal
from urllib.parse import urlparse

SubscriptionType = Literal["all", "active", "expired", "attention"]

if TYPE_CHECKING:
    from ultima_scraper_api.apis.onlyfans.classes.user_model import (
        AuthModel,
        create_user,
    )


class SiteContent:
    def __init__(self, option: dict[str, Any], user: AuthModel | create_user) -> None:
        self.id: int = option["id"]
        self.author = user
        self.media: list[dict[str, Any]] = option.get("media", [])
        self.preview_ids: list[int] = []
        self.__raw__ = option

    def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
        authed = self.get_author().get_authed()
        video_quality = (
            video_quality or self.author.get_api().get_site_settings().video_quality
        )
        if not media_item["canView"]:
            return
        source: dict[str, Any] = {}
        media_type: str = ""
        if "files" in media_item:
            media_type = media_item["type"]
            media_item = media_item["files"]
            source = media_item["full"]
        else:
            return
        url = source.get("url")
        return urlparse(url) if url else None

    def preview_url_picker(self, media_item: dict[str, Any]):
        preview_url = None
        if "files" in media_item:
            # Guard on "full" (not "preview"), since "full" is the key read
            # below; this avoids a KeyError when "full" is missing.
            if (
                "full" in media_item["files"]
                and "url" in media_item["files"]["full"]
            ):
                preview_url = media_item["files"]["full"]["url"]
        else:
            # Use .get() so items without a "full" key don't raise KeyError.
            preview_url = media_item.get("full")
        return urlparse(preview_url) if preview_url else None

    def get_author(self):
        return self.author

    async def refresh(self):
        func = await self.author.scrape_manager.handle_refresh(self)
        return await func(self.id)
Another thing: if you previously ran this project with Docker, you need to rebuild your image and remember to put the fixed __init__.py in the right place. So I put my Dockerfile below:
FROM python:3.10-slim
RUN apt-get update && apt-get install -y \
curl \
libpq-dev \
gcc \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
ENV POETRY_HOME=/usr/local/share/pypoetry
ENV POETRY_VIRTUALENVS_CREATE=false
RUN curl -sSL https://install.python-poetry.org | python3 -
COPY . .
RUN /usr/local/share/pypoetry/bin/poetry install --only main
COPY .venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py /usr/src/app/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py
CMD [ "/usr/local/share/pypoetry/bin/poetry", "run", "python", "./start_us.py" ]
After those settings, I think you can run it well.
In my experience, after all the settings, a "KeyError: 'data'" appeared because a new cookie needed to be set. You need to reset auth.json in __user_data__/profiles/OnlyFans/default/auth.json.
For reference, on Windows the full path is C:\Users\{user}\AppData\Local\pypoetry\Cache\virtualenvs\ultima-scraper-UEi9_8Jc-py3.10\Lib\site-packages\ultima_scraper_api\apis\onlyfans
@myps6415 correct me if I'm wrong, but I couldn't find the init file elsewhere.
Thanks! This worked for me.
One way to find where the API package lives is to use the find command:
find / -iname 'ultima_scraper_api'
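If `find` isn't available (e.g. on Windows), a Python one-liner can locate the installed package instead:

```python
# Locate the installed ultima_scraper_api package without relying on `find`.
import importlib.util

spec = importlib.util.find_spec("ultima_scraper_api")
if spec and spec.origin:
    print(spec.origin)  # path to ultima_scraper_api/__init__.py
else:
    print("ultima_scraper_api is not installed in this environment")
```

Run it with the same interpreter the scraper uses (e.g. `poetry run python ...`) so it inspects the right virtualenv.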
I updated my __init__.py file but am still having this issue. Maybe something changed again on the OF side? Is anybody else having issues?