
[BUG] error with 7-digit id files

Open slormo opened this issue 1 year ago • 7 comments

  • [x] I am reporting a bug.
  • [x] I am running the latest version of BDfR
  • [x] I have read the Opening an issue

Description

I'm downloading submissions with the --include-id-file parameter. However, the scraper crashes when it encounters a 7-digit ID. For example, an ID such as 10cet41 (NSFW) in the text file triggers this error.

Logs

[2023-05-01 23:49:58,945 - bdfr.connector - DEBUG] - Disabling the following modules: 
[2023-05-01 23:49:58,945 - bdfr.connector - Level 9] - Created download filter
[2023-05-01 23:49:58,946 - bdfr.connector - Level 9] - Created time filter
[2023-05-01 23:49:58,946 - bdfr.connector - Level 9] - Created sort filter
[2023-05-01 23:49:58,957 - bdfr.connector - Level 9] - Create file name formatter
[2023-05-01 23:49:58,957 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2023-05-01 23:49:58,958 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2023-05-01 23:49:59,171 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to C:\Users\****\AppData\Local\BDFR\bdfr\default_config.cfg
[2023-05-01 23:49:59,433 - bdfr.connector - Level 9] - Resolved user to ****
[2023-05-01 23:49:59,433 - bdfr.connector - Level 9] - Created site authenticator
[2023-05-01 23:49:59,434 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-05-01 23:49:59,434 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-05-01 23:49:59,434 - bdfr.connector - Level 9] - Retrieved user data
[2023-05-01 23:49:59,434 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-05-01 23:49:59,586 - root - ERROR] - Scraper exited unexpectedly
Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\__main__.py", line 161, in cli_clone
    reddit_scraper.download()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\cloner.py", line 26, in download
    self._download_submission(submission)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\bdfr\downloader.py", line 62, in _download_submission
    elif submission.subreddit.display_name.lower() in self.args.skip_subreddit:
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\base.py", line 34, in __getattr__
    self._fetch()
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\praw\models\reddit\comment.py", line 195, in _fetch
    raise ClientException(f"No data returned for comment {self.fullname}")
praw.exceptions.ClientException: No data returned for comment t1_10cet41

slormo avatar May 01 '23 22:05 slormo

Did you find any solution to this?

DnanaDev avatar May 09 '23 10:05 DnanaDev

> Did you find any solution to this?

Nah, I haven't found any workarounds for it yet.

slormo avatar May 09 '23 17:05 slormo

I already reported this issue 6 weeks ago in #839; so far there have been no updates.

Fakeaccount12312 avatar Jun 11 '23 14:06 Fakeaccount12312

Apologies, I'm in the last week of my semester and have exams this week. Once they're done I'll have more time to close a bunch of issues and make a new release.

Serene-Arc avatar Jun 12 '23 00:06 Serene-Arc

The issue is in https://github.com/aliparlakci/bulk-downloader-for-reddit/blob/8c293a46843c818bea2c2013db38191867993a14/bdfr/archiver.py#L62 where 7-digit IDs cause the archiver to create a PRAW comment instead of a submission.

I don't know how to fix it cleanly, but a workaround is to replace line 61 with this:

if len(sub_id) in (6, 7):

This will capture those 7-digit IDs too.
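For illustration, here is a rough standalone sketch of the length-based dispatch and what the workaround changes. The thread does not quote the actual archiver.py code, so the function name classify_by_length, the submission_lengths parameter, and the exact branch order are assumptions inferred from the traceback and the comments here; it returns a label instead of creating PRAW objects.

```python
import re

def classify_by_length(sub_id, submission_lengths=(6,)):
    """Hypothetical stand-in for the archiver's ID dispatch (not the real code).

    submission_lengths=(6,) models the assumed current behaviour;
    passing (6, 7) models the workaround described above.
    """
    if len(sub_id) in submission_lengths:
        return "submission"
    # Assumed fallback: any other 7-character token is treated as a comment ID.
    if re.fullmatch(r"\w{7}", sub_id):
        return "comment"
    # Anything else is assumed to be handled as a URL.
    return "url"

# Assumed current behaviour: the 7-character post ID is looked up as a comment,
# which matches the "No data returned for comment t1_10cet41" error in the log.
print(classify_by_length("10cet41"))
# With the workaround, the same ID is treated as a submission.
print(classify_by_length("10cet41", submission_lengths=(6, 7)))
```

Note that with the workaround a genuine 7-character comment ID would also be treated as a submission.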

kvangork avatar Jun 14 '23 22:06 kvangork

The whole thing is roughly made and doesn't work in many cases. Both post and comment IDs on Reddit increment continuously and can have any number of digits; both just happen to be at 7 digits now. But there are Reddit posts with a two-digit ID. Not sure how to solve that one, but the current solution is already not clean at all. This is not just an issue with 7-digit IDs.

Your fix treats every 6- or 7-digit ID as a post. That is OK for me since I don't download comments from an ID file, but this really needs a rework.

For now, my fix is to just treat everything as a post, I replaced line 60 with if re.match(r"^\w{2,7}$", sub_id): and deleted lines 62-63 completely. Not sure how you would even detect comments.

Edit: I was very right. For example the ID c9h18 refers to both a post and a comment.
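One possible way around the ambiguity, sketched under assumptions: treat every bare ID as a post (as in my fix above), and only treat an ID as a comment when it carries an explicit type prefix. The t1_ prefix for comments appears in the traceback in the first post; t3_ for posts is Reddit's fullname convention. The function name classify_id and the idea of accepting prefixed IDs in the ID file are hypothetical and not something BDfR currently does.

```python
import re

def classify_id(raw):
    """Hypothetical helper: return (kind, id) for one line of an ID file."""
    raw = raw.strip()
    # Check explicit prefixes first, because \w also matches the underscore.
    if raw.startswith("t1_"):
        return ("comment", raw[3:])
    if raw.startswith("t3_"):
        return ("submission", raw[3:])
    # Bare IDs of 2 to 7 characters are all treated as posts.
    if re.fullmatch(r"\w{2,7}", raw):
        return ("submission", raw)
    # Everything else is assumed to be a URL.
    return ("url", raw)

for line in ["10cet41", "c9h18", "t1_c9h18"]:
    print(line, classify_id(line))
```

A bare c9h18 still resolves to the post only; reaching the comment with the same ID would require writing t1_c9h18 in the file.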

Fakeaccount12312 avatar Oct 11 '23 22:10 Fakeaccount12312

If you have any suggestions, please do elaborate. If you know of any of those 2-digit IDs, please provide them. The first step is getting actual IDs that we can test against. Once we have those, we can write the tests to prove the logic.

Serene-Arc avatar Oct 13 '23 02:10 Serene-Arc