Post User Process not running consistently

Open Jakan-Kink opened this issue 1 year ago • 15 comments

Describe the bug

Using versions 3.11.1 and 3.11.2 with the new post* scripts, I am running into two issues. First, the scripts are not being read from the config file; I have to add extra code to make that work.

In get_post_download_script: https://github.com/datawhores/OF-Scraper/blob/625be8e5a33a4b854d9fc1e494e0204a9e8cd180/ofscraper/utils/config/data.py#L270-L280

I need to add:

    elif config.get("scripts", {}).get("post_download_script") is not None:
        val = config.get("scripts", {}).get("post_download_script")

and in get_post_script: https://github.com/datawhores/OF-Scraper/blob/625be8e5a33a4b854d9fc1e494e0204a9e8cd180/ofscraper/utils/config/data.py#L284-L294

I need to add:

    elif config.get("scripts", {}).get("post_script") is not None:
        val = config.get("scripts", {}).get("post_script")

Second, even with those changes, the script doesn't run in 3.11.2.
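
The config fallback described above can be sketched in isolation. This is only an illustration of the lookup order, not the project's actual code: `resolve_script` and its `cli_value` parameter are hypothetical names, and the assumption is that a command-line value should win over the `scripts` section of the config file.

```python
def resolve_script(config, key, cli_value=None):
    """Return the script path for `key`, preferring a CLI value over the config file.

    `resolve_script` and `cli_value` are hypothetical names used for illustration.
    """
    if cli_value:
        return cli_value
    # Fall back to the nested "scripts" section; tolerate a missing or null section.
    return (config.get("scripts") or {}).get(key)


config = {"scripts": {"post_download_script": "/path/to/post-user.sh"}}
print(resolve_script(config, "post_download_script"))
print(resolve_script(config, "post_script"))
```

The second call returns `None` because `post_script` is not set in this sample config, which is the case a caller would need to check before trying to run anything.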

To Reproduce

Run ofscraper -u ALL -l DEBUG -p STATS -o all,labels -a download -d 120 -ts -up -st expired

Expected behavior

After every user the post_download_script command should fire, and at the end of the loop the post_script should fire.

Screenshots/Logs

With 3.11.2, the error I get for every performer is:

 2024-08-08 18:25:50:[level.inner:11]  expected str, bytes or os.PathLike object, not int
 2024-08-08 18:25:50:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_user.py", line 18, in post_user_process
    run(
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/system/subprocess.py", line 9, in run
    t=subprocess.run(*args, stdout=subprocess.PIPE,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1885, in _execute_child
    self.pid = _fork_exec(
               ^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not int

Config

{
    "main_profile": "main_profile",
    "metadata": "{save_location}/meta/OnlyFans/{model_username}/Metadata",
    "discord": "",
    "file_options": {
        "save_location": "/Volumes/FileAccess/OnlyFans/",
        "dir_format": "sites/OnlyFans/{model_username}/{responsetype}/{value}/{mediatype}/",
        "file_format": "{date}-{filename}.{ext}",
        "textlength": 0,
        "space_replacer": " ",
        "date": "YYYY-MM-DD_HH-mm",
        "text_type_default": "letter",
        "truncation_default": true
    },
    "download_options": {
        "filter": [
            "Images",
            "Audios",
            "Videos",
            "Text"
        ],
        "auto_resume": false,
        "system_free_min": 0,
        "max_post_count": 0
    },
    "binary_options": {
        "ffmpeg": "/opt/homebrew/bin/ffmpeg"
    },
    "cdm_options": {
        "private-key": null,
        "client-id": null,
        "key-mode-default": "keydb",
        "keydb_api": "{redacted}"
    },
    "performance_options": {
        "download_sems": 6,
        "thread_count": 2,
        "download_limit": 0
    },
    "content_filter_options": {
        "block_ads": false,
        "file_size_max": 0,
        "file_size_min": 0,
        "length_max": null,
        "length_min": null
    },
    "advanced_options": {
        "code-execution": true,
        "dynamic-mode-default": "datawhores",
        "backend": "aio",
        "downloadbars": true,
        "cache-mode": "json",
        "appendlog": false,
        "custom_values": {
            "OLD_DEVIINT": "https://raw.githubusercontent.com/datawhores/onlyfans-dynamic-rules/new/dynamicRules.json",
            "XAGLER": "https://raw.githubusercontent.com/xagler/dynamic-rules/main/onlyfans.json",
            "RAFA": "https://raw.githubusercontent.com/rafa-9/dynamic-rules/main/rules.json",
            "DIGITALCRIMINALS": "https://raw.githubusercontent.com/DATAHOARDERS/dynamic-rules/main/onlyfans.json",
            "DATAWHORES": "https://raw.githubusercontent.com/datawhores/onlyfans-dynamic-rules/main/dynamicRules.json",
            "DEVIINT": "https://raw.githubusercontent.com/rafa-9/dynamic-rules/main/rules.json",
            "MAXFILE_SEMAPHORE": 10,
            "SHOW_AVATAR": false,
            "import": "exec('import ofscraper.filters.models.selector as selector23')",
            "list": "exec('modelObjs=C)')",
            "model_price": "'fallback' if len(modelObjs)==0 else 'Paid' if modelObjs[0].final_current_price>0 else 'Free'"
        },
        "sanitize_text": false,
        "temp_dir": null,
        "remove_hash_match": true,
        "infinite_loop_action_mode": false,
        "enable_auto_after": true,
        "default_user_list": "main",
        "default_black_list": ""
    },
    "scripts": {
        "post_download_script": "/Users/your_username/Development/of-scraper-post/post-user.sh",
        "post_script": "/Users/your_username/Development/of-scraper-post/post-loop.sh"
    },
    "responsetype": {
        "timeline": "Posts",
        "message": "Messages",
        "archived": "Archived",
        "paid": "Messages",
        "stories": "Stories",
        "highlights": "Stories",
        "profile": "Profile",
        "pinned": "Posts",
        "streams": "Streams"
    },
    "overwrites": {
        "audios": {},
        "videos": {},
        "images": {},
        "text": {
            "file_format": "{date}-{post_id}.{ext}"
        }
    }
}

System Info

  • OS: macOS 14.5 (M1)
  • pipx
  • python 3.12

Additional context

This happens on multiple OF accounts; here are some examples: couple_of_perverts, lilithinlatexxx, rubberdoll, lola-saint, sophie_x_elodie, trainingj, tightlacedchaos, doe-eyes-official

Jakan-Kink avatar Aug 08 '24 22:08 Jakan-Kink

I think it is because model_id needs to be converted into a string if not already one
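
A minimal sketch of that idea, assuming the ID arrives as an `int`: every element of the argument list handed to `subprocess.run` has to be `str`, `bytes`, or path-like, so converting up front avoids the `TypeError`. The helper name `stringify_args` and the sample ID are hypothetical, and the child process here is just a Python one-liner standing in for the user's script.

```python
import subprocess
import sys


def stringify_args(args):
    """Convert every argument to str so subprocess accepts the whole list."""
    return [str(arg) for arg in args]


model_id = 123456  # hypothetical numeric ID
# A tiny child process that echoes its first argument, standing in for the user's script.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
result = subprocess.run(
    stringify_args(child + [model_id]),
    stdout=subprocess.PIPE,
    text=True,
    check=True,
)
print(result.stdout.strip())
```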

datawhores avatar Aug 09 '24 00:08 datawhores

I forced model_id to string in final_user.py:

        run(
            [
                settings.get_post_download_script(),
                username,
                str(model_id),
                json.dumps(media_dump),
                json.dumps(post_dump),
                json.dumps(master_dump),
            ]
        )

and it did change the error message

 2024-08-08 20:47:57:[final_user.post_user_process:13]  Running post script for lilithinlatexxx
 2024-08-08 20:47:58:[level.inner:11]  [Errno 7] Argument list too long: '/Users/your_username/Development/of-scraper-post/post-user.sh'
 2024-08-08 20:47:58:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_user.py", line 24, in post_user_process
    run(
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/system/subprocess.py", line 9, in run
    t=subprocess.run(*args, stdout=subprocess.PIPE,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/Users/your_username/Development/of-scraper-post/post-user.sh'

Jakan-Kink avatar Aug 09 '24 00:08 Jakan-Kink

just to make sure

getconf ARG_MAX
1048576
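
For anyone wanting to check this before launching the script, here is a rough sketch of estimating how much space the argument list and environment take up and comparing it with the limit. It is an approximation under stated assumptions: the helper names `argv_bytes` and `fits_in_arg_max` are hypothetical, the kernel's real accounting also includes pointer overhead, and `os.sysconf("SC_ARG_MAX")` is only available on Unix-like systems.

```python
import os


def argv_bytes(args, env=None):
    """Estimate the bytes exec() must copy: each string plus its NUL terminator."""
    env = os.environ if env is None else env
    arg_total = sum(len(os.fsencode(a)) + 1 for a in args)
    # Each environment entry is stored as "KEY=VALUE\0".
    env_total = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()
    )
    return arg_total + env_total


def fits_in_arg_max(args, env=None, limit=None):
    """Return True if the estimated size stays under the limit (ARG_MAX by default)."""
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")  # Unix only
    return argv_bytes(args, env) < limit


# A 2 MB JSON string passed as one argument exceeds the 1048576-byte limit shown above.
print(fits_in_arg_max(["post-user.sh", "x" * 2_000_000], env={}, limit=1048576))
```

With several JSON dumps passed as separate arguments, it is the combined size that counts, so a large creator can exceed the limit even if each dump looks manageable on its own.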

Jakan-Kink avatar Aug 09 '24 01:08 Jakan-Kink

Yeah I've never had to worry about this

datawhores avatar Aug 09 '24 03:08 datawhores

I have a partial solution but some information is still too long

datawhores avatar Aug 09 '24 03:08 datawhores

also, I hadn't gotten the post script to actually fire off earlier, just came back to this waiting:

 2024-08-09 01:27:34:[final_script.final_script:27]  Running post script
 2024-08-09 01:27:34:[level.inner:11]  Object of type Model is not JSON serializable
 2024-08-09 01:27:34:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/run.py", line 88, in daemon_run_helper
    job_func()
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/context/exit.py", line 92, in inner
    raise E
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/context/exit.py", line 85, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/commands/managers/scraper.py", line 47, in runner
    final(normal_data , scrape_paid_data ,user_first_data,userdata)
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final.py", line 17, in final
    final_script(users or [])
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_script.py", line 42, in final_script
    json.dumps(out_dict)
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Model is not JSON serializable

Jakan-Kink avatar Aug 09 '24 05:08 Jakan-Kink

Found what to blame: https://github.com/datawhores/OF-Scraper/blob/625be8e5a33a4b854d9fc1e494e0204a9e8cd180/ofscraper/runner/close/final/final_script.py#L29-L38

you create a variable data, and append the Model.model for each ele, but then pass users to the out_dict instead of data
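
To illustrate the difference, here is a small self-contained sketch. The `Model` class is a hypothetical stand-in with a `.model` attribute holding a plain dict; the real class in the project has more to it. Serializing the wrapper objects fails, while serializing the extracted dicts works.

```python
import json


class Model:
    """Hypothetical stand-in for the scraper's Model wrapper."""

    def __init__(self, raw):
        self.model = raw  # plain dict, which json can serialize


users = [
    Model({"id": 1, "username": "alice"}),
    Model({"id": 2, "username": "bob"}),
]

# Collect the plain dicts, then serialize those rather than the wrapper objects.
data = [ele.model for ele in users]
payload = json.dumps({"users": data})
print(payload)

# Passing the wrappers themselves reproduces the error from the log above.
try:
    json.dumps({"users": users})
except TypeError as exc:
    print(exc)
```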

Jakan-Kink avatar Aug 09 '24 06:08 Jakan-Kink

Yeah, that only works for that one since the amount of data is small.

For the download script, my other solution won't work for larger creators.

The user will have to read and process the data in their script.

I think the only possibility is to redirect the data with >, and then the user would have to read the input.

Update: I think the solution is to write a single JSON to a temporary file and pass that path off to the script.

datawhores avatar Aug 09 '24 08:08 datawhores

I will fix the post_script

for the post_download_script I made this change

        master_dump=json.dumps({"username":username,"model_id":model_id,"media":media,"posts":posts})
        with tempfile.NamedTemporaryFile() as f:
          with open(f.name, "w") as g:
              g.write(master_dump)
          run([settings.get_post_download_script(),f.name])

I think the post_script will be okay, but just to be safe and to keep things in sync, I think I will do the same for that as well.
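
For script authors, the temp-file handoff can be sketched end to end as follows. This is a variant written under assumptions, not the code in the release: `run_script_with_payload` is a hypothetical helper, the file is closed before the script runs and removed afterwards, and the "script" is a Python one-liner that reads the JSON path from its first argument.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_script_with_payload(command, payload):
    """Write payload as JSON to a temp file and pass its path as the last argument."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(payload, f)
        path = f.name
    try:
        # The file is closed here, so the child can open it on any platform.
        return subprocess.run(
            command + [path], stdout=subprocess.PIPE, text=True, check=True
        )
    finally:
        os.remove(path)


# Stand-in for a user script: load the JSON file named in argv[1] and print one field.
reader = [
    sys.executable,
    "-c",
    "import json, sys; print(json.load(open(sys.argv[1]))['username'])",
]
result = run_script_with_payload(
    reader, {"username": "example_user", "model_id": 1, "media": [], "posts": []}
)
print(result.stdout.strip())
```

Because only a short path is passed on the command line, the size of the media and post data no longer matters for the argument limit; the receiving script just has to read and parse the file itself.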

datawhores avatar Aug 09 '24 12:08 datawhores

So far it has been working on my system. I've been testing with --post-script cat and --download-script cat to make sure the output is shown on the console.

Tested in

  • manual mode
  • check mode
  • normal downloading

datawhores avatar Aug 09 '24 17:08 datawhores

looks like in some of the work between 3.11.2 and 3.11.6 there seems to have been a change in final_script.py that caused a crash:

 2024-08-21 22:49:16:[final_script.final_script:31]  Running post script
 2024-08-21 22:49:16:[level.inner:11]  unhashable type: 'dict'
 2024-08-21 22:49:16:[level.inner:11]  Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/your_username/utils/run.py", line 88, in daemon_run_helper
    job_func()
  File "/venv/lib/python3.11/site-packages/your_username/utils/context/exit.py", line 92, in inner
    raise E
  File "/venv/lib/python3.11/site-packages/your_username/utils/context/exit.py", line 85, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/your_username/commands/managers/scraper.py", line 50, in runner
    final(normal_data, scrape_paid_data, user_first_data, userdata)
  File "/venv/lib/python3.11/site-packages/your_username/runner/close/final/final.py", line 20, in final
    final_script(userdata or [])
  File "/venv/lib/python3.11/site-packages/your_username/runner/close/final/final_script.py", line 36, in final_script
    data = value
    ~~~~^^^^^
TypeError: unhashable type: 'dict'

Jakan-Kink avatar Aug 22 '24 03:08 Jakan-Kink

looks like in some of the work between 3.11.2 and 3.11.6 there seems to have been a change in final_script.py that caused a crash:

+1 on this, I'm seeing the same issue on 3.11.6

dunngitter avatar Aug 23 '24 18:08 dunngitter

should be fixed

datawhores avatar Aug 23 '24 19:08 datawhores

In which release? 3.11.7? Could you please generate the package for that version if so? I can't pull the docker image right now

dunngitter avatar Aug 24 '24 18:08 dunngitter

It's not in a release; it is in commit ce82515


Jakan-Kink avatar Aug 24 '24 18:08 Jakan-Kink