[BUG] deleted comment causes crash: comment does not appear to be in the comment tree

Open eitau opened this issue 1 year ago • 1 comments

  • [X] I am reporting a bug.
  • [X] I am running the latest version of BDfR
  • [X] I have read the "Opening an issue" guidelines

Description

When a comment that was deleted is to be archived (such as a comment I saved some time ago), bdfr crashes. Here's the id-file I eventually used (all entries except the second one are ids of valid comments and submissions):

f86juax
f82o5e7
dz7sc5
e2qia6
e2xwjv
e0z05x
e112bz
e1881c
duvvig
f8bsmze
e06xno
e068tu
dzkvb8

(I can't unsave f82o5e7 using either the website or the app, which I tried in order to resume archiving the rest of my saved items.)

Command

bdfr archive ./2023-03-06/ --include-id-file id-file

Environment (please complete the following information)

  • OS: NixOS (unstable)
  • Python version: Python 3.10.10

Logs

[2023-03-06 15:05:19,040 - bdfr.connector - DEBUG] - Disabling the following modules:
[2023-03-06 15:05:19,041 - bdfr.connector - Level 9] - Created download filter
[2023-03-06 15:05:19,041 - bdfr.connector - Level 9] - Created time filter
[2023-03-06 15:05:19,041 - bdfr.connector - Level 9] - Created sort filter
[2023-03-06 15:05:19,042 - bdfr.connector - Level 9] - Create file name formatter
[2023-03-06 15:05:19,042 - bdfr.connector - DEBUG] - Using unauthenticated Reddit instance
[2023-03-06 15:05:19,043 - bdfr.connector - Level 9] - Created site authenticator
[2023-03-06 15:05:19,043 - bdfr.connector - Level 9] - Retrieved subreddits
[2023-03-06 15:05:19,043 - bdfr.connector - Level 9] - Retrieved multireddits
[2023-03-06 15:05:19,043 - bdfr.connector - Level 9] - Retrieved user data
[2023-03-06 15:05:19,043 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2023-03-06 15:05:19,972 - bdfr.archiver - DEBUG] - Attempting to archive submission duvvig
[2023-03-06 15:05:19,973 - bdfr.archive_entry.submission_archive_entry - DEBUG] - Retrieving full comment tree for submission duvvig
[2023-03-06 15:05:19,982 - bdfr.archiver - DEBUG] - Writing entry duvvig to file in JSON format at /tmp/bdfr/not-in-comment-tree/2023-03-06/NixOS/tim-hilt_Usability of NixOS as a daily driver_duvvig.json
[2023-03-06 15:05:19,982 - bdfr.archiver - INFO] - Record for entry item duvvig written to disk
[2023-03-06 15:05:20,195 - bdfr.archiver - DEBUG] - Attempting to archive submission f86juax
[2023-03-06 15:05:21,474 - bdfr.archiver - DEBUG] - Writing entry f86juax to file in JSON format at /tmp/bdfr/not-in-comment-tree/2023-03-06/selfhosted/choketube_What are you using to check the status of your services and notify you of issues?_f86juax.json
[2023-03-06 15:05:21,474 - bdfr.archiver - INFO] - Record for entry item f86juax written to disk
[2023-03-06 15:05:21,673 - bdfr.archiver - DEBUG] - Attempting to archive submission f82o5e7
[2023-03-06 15:05:21,910 - root - ERROR] - Archiver exited unexpectedly
Traceback (most recent call last):
  File "/nix/store/gcj0r2gn5cw2dwx4q5mwvhqgsh93fjk1-bdfr-2.6.2/lib/python3.10/site-packages/bdfr/__main__.py", line 139, in cli_archive
    reddit_archiver.download()
  File "/nix/store/gcj0r2gn5cw2dwx4q5mwvhqgsh93fjk1-bdfr-2.6.2/lib/python3.10/site-packages/bdfr/archiver.py", line 49, in download
    self.write_entry(submission)
  File "/nix/store/gcj0r2gn5cw2dwx4q5mwvhqgsh93fjk1-bdfr-2.6.2/lib/python3.10/site-packages/bdfr/archiver.py", line 92, in write_entry
    self._write_entry_json(archive_entry)
  File "/nix/store/gcj0r2gn5cw2dwx4q5mwvhqgsh93fjk1-bdfr-2.6.2/lib/python3.10/site-packages/bdfr/archiver.py", line 103, in _write_entry_json
    content = json.dumps(entry.compile())
  File "/nix/store/gcj0r2gn5cw2dwx4q5mwvhqgsh93fjk1-bdfr-2.6.2/lib/python3.10/site-packages/bdfr/archive_entry/comment_archive_entry.py", line 18, in compile
    self.source.refresh()
  File "/nix/store/1vg4vp3ynvzbq4p4js5qpmx6qj3mchnz-python3.10-praw-7.6.1/lib/python3.10/site-packages/praw/models/reddit/comment.py", line 298, in refresh
    raise ClientException(self.MISSING_COMMENT_MESSAGE)
praw.exceptions.ClientException: This comment does not appear to be in the comment tree

eitau avatar Mar 06 '23 14:03 eitau

Had the same problem, and made BDFR ignore the exception by modifying the write_entry function in bdfr\archiver.py:

    # Assumes `from praw.exceptions import ClientException` is present at the top of archiver.py.
    def write_entry(self, praw_item: Union[praw.models.Submission, praw.models.Comment]):
        try:
            if self.args.comment_context and isinstance(praw_item, praw.models.Comment):
                logger.debug(f"Converting comment {praw_item.id} to submission {praw_item.submission.id}")
                praw_item = praw_item.submission
            archive_entry = self._pull_lever_entry_factory(praw_item)
            if self.args.format == "json":
                self._write_entry_json(archive_entry)
            elif self.args.format == "xml":
                self._write_entry_xml(archive_entry)
            elif self.args.format == "yaml":
                self._write_entry_yaml(archive_entry)
            else:
                raise ArchiverError(f"Unknown format {self.args.format} given")
        except ClientException as ex:
            # Raised by PRAW, e.g. when a deleted comment is no longer in the comment tree.
            logger.warning(f"Skipping item {praw_item.id} due to exception {ex}")
            return  # Nothing was written, so do not log the success message below.
        logger.info(f"Record for entry item {praw_item.id} written to disk")

As you can see, it skips over and doesn't write the entry if the ClientException occurs.

Don't know if that would be a good way to resolve it for general-purpose use, but it works for my case.
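
The skip-and-continue idea can also be tried in isolation, without PRAW or network access. The following is only a minimal sketch, not BDFR's actual code: `FakeComment`, `archive_all`, and the locally defined `ClientException` are hypothetical stand-ins (the real exception lives in `praw.exceptions`), and the ids are taken from the id-file in the report. It shows one failing item being logged and skipped while the remaining items are still processed.

```python
import logging

logger = logging.getLogger("archiver_sketch")


class ClientException(Exception):
    """Local stand-in for praw.exceptions.ClientException (assumption for this sketch)."""


class FakeComment:
    """Hypothetical stand-in for a PRAW comment; `deleted` simulates a removed comment."""

    def __init__(self, id: str, deleted: bool = False):
        self.id = id
        self.deleted = deleted

    def refresh(self) -> None:
        # Mimics the failure from the traceback: refreshing a deleted comment raises.
        if self.deleted:
            raise ClientException("This comment does not appear to be in the comment tree")


def archive_all(items):
    """Process every item, skipping those that raise; return (written, skipped) id lists."""
    written, skipped = [], []
    for item in items:
        try:
            item.refresh()
        except ClientException as ex:
            logger.warning("Skipping item %s due to exception %s", item.id, ex)
            skipped.append(item.id)
            continue  # Move on to the next item instead of aborting the whole run.
        logger.info("Record for entry item %s written to disk", item.id)
        written.append(item.id)
    return written, skipped


written, skipped = archive_all(
    [FakeComment("f86juax"), FakeComment("f82o5e7", deleted=True), FakeComment("f8bsmze")]
)
print("written:", written)
print("skipped:", skipped)
```

Catching only the specific exception, rather than a bare `Exception`, keeps unrelated failures (bad file paths, unknown formats) visible instead of silently skipping them.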

ValidAQ avatar Jun 11 '23 14:06 ValidAQ