
[Bug]: Comment pagination is broken when `max_depth=N` is included

Open christianjuth opened this issue 6 months ago • 12 comments

Requirements

  • [x] Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support or the matrix chat.
  • [x] Did you check to see if this issue already exists?
  • [x] Is this only a single bug? Do not put multiple bugs in one issue.
  • [x] Do you agree to follow the rules in our Code of Conduct?
  • [x] Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

Comment pagination is broken on Lemmy BE 0.19.12. It has been broken for as long as I can remember, but I'm currently testing on BE 0.19.12. It seems to break as soon as max_depth=N is set to any value.

If I request comments like this:

  • https://lemmy.ml/api/v3/comment/list?sort=Hot&post_id=32971772&type_=All&limit=50&max_depth=6&page=1
  • https://lemmy.ml/api/v3/comment/list?sort=Hot&post_id=32971772&type_=All&limit=50&max_depth=6&page=2

Pages 1 and 2+ will contain the same comments. Not only does this mean page 2 can never be fetched, but it also means clients need to add a safeguard to prevent pagination libraries from paging out to infinity.

Steps to Reproduce

If you want, you can use the following JavaScript function to compare page 1 and page 2:

// Returns true if two comment/list responses contain exactly the same
// comment IDs in the same order.
function compareCommentResponses(res1, res2) {
    const hashRes = (res) => res.comments.map((item) => item.comment.id).join("-")
    return hashRes(res1) === hashRes(res2)
}
compareCommentResponses(page1Response, page2Response) // true here demonstrates the bug

Example 1

  1. GET https://lemmy.ml/api/v3/comment/list?sort=Hot&post_id=32971772&type_=All&limit=50&max_depth=6&page=1
  2. GET https://lemmy.ml/api/v3/comment/list?sort=Hot&post_id=32971772&type_=All&limit=50&max_depth=6&page=2
  3. Compare the results. They are identical, which demonstrates the bug.
  4. Exclude max_depth from the above requests and pagination starts working.

Example 2

  1. GET https://lemmy.ml/api/v3/comment/list?sort=Hot&post_id=1223478&type_=Local&limit=50&max_depth=6&page=1
  2. GET https://lemmy.ml/api/v3/comment/list?sort=Hot&post_id=1223478&type_=Local&limit=50&max_depth=6&page=2
  3. Exclude max_depth from the above requests and pagination starts working.

Technical Details

I'm consuming the API, not hosting an instance myself. There are no browser or network request errors.

Version

BE 0.19.12

Lemmy Instance URL

lemmy.ml

christianjuth avatar Jul 12 '25 17:07 christianjuth

Page and limit are ignored when max_depth (i.e. tree fetching) is used.

There's a current kludge in the code that shows at most 300 top-level comments (to prevent attacks), and limiting / paging doesn't work (or at least I haven't found a solution) for comment trees, due to a limitation with postgres ltrees: https://stackoverflow.com/questions/72983614/postgres-ltree-how-to-limit-the-max-number-of-children-at-any-given-level

dessalines avatar Jul 13 '25 01:07 dessalines

@dessalines I'm exploring alternative solutions.

I had this idea:

function hasMissingComments(comments: CommentView[]) {
  // Build a set of all comment IDs (as strings for easy comparison)
  const ids = new Set(comments.map(({comment}) => String(comment.id)));

  // Check each comment’s path for a missing ancestor
  return comments.some(({ comment }) => {
    const path = comment.path;

    // e.g. "0.123.456.789" → ["0","123","456","789"]
    const parts = path.split(".");

    // Skip the first segment ("0") and the last (the comment itself),
    // and see if any of the intermediate IDs are missing.
    for (let i = 1; i < parts.length - 1; i++) {
      const part = parts[i];
      if (part && !ids.has(part)) {
        return true;
      }
    }
    return false;
  });
}

And then I basically keep fetching the next page until hasMissingComments(allPagedComments) === false. But I ran one test, and on a post with 6k comments that approach will slam the API until I get rate limited.
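That loop looks roughly like this. This is only a sketch: `fetchCommentPage(page)` is a hypothetical stand-in for a real call to /api/v3/comment/list, assumed to resolve to `{ comments: CommentView[] }`, and `hasMissingComments` is restated inline so the snippet is self-contained.

```javascript
// Same logic as the hasMissingComments function above, restated so this
// snippet stands alone: true if any comment's path references an
// ancestor id that is not present in the list.
function hasMissingComments(comments) {
  const ids = new Set(comments.map(({ comment }) => String(comment.id)));
  return comments.some(({ comment }) =>
    comment.path.split(".").slice(1, -1).some((id) => !ids.has(id))
  );
}

// Keep fetching pages until the accumulated list has no missing
// ancestors (or the API runs out of pages).
async function fetchAllComments(fetchCommentPage) {
  const all = [];
  let page = 1;
  for (;;) {
    const res = await fetchCommentPage(page);
    if (res.comments.length === 0) break; // ran out of pages
    all.push(...res.comments);
    if (!hasMissingComments(all)) break; // tree is complete
    page += 1;
  }
  return all;
}
```

On a large post this is exactly the request storm described above, since every page with a dangling branch triggers another fetch.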

Another idea I had was to combine two requests: one with max_depth and one without. That way the first request fills in the comment tree while the second is able to page. But there is no guarantee that won't also miss comments.
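As a rough sketch of that two-request hack (assuming both responses have the usual `{ comments }` shape), the merge step could just deduplicate by comment id:

```javascript
// Sketch: merge a tree-shaped response (fetched with max_depth) with a
// paged flat response, deduplicating by comment id. Comments from the
// first response win; the paged response only fills in anything new.
function mergeCommentResponses(treeRes, pagedRes) {
  const seen = new Set();
  const merged = [];
  for (const view of [...treeRes.comments, ...pagedRes.comments]) {
    if (!seen.has(view.comment.id)) {
      seen.add(view.comment.id);
      merged.push(view);
    }
  }
  return { comments: merged };
}
```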

I'm not sure how to proceed from here. It seems the other Lemmy clients (e.g. Voyager, Interstellar) have resorted to not paging comments at all and only showing whatever gets returned on page 1. I suppose most Lemmy posts won't see this issue since they have <=300 comments, but it's still a shame there isn't a way to solve it.

christianjuth avatar Jul 13 '25 18:07 christianjuth

The default rate limit for read actions is 180 requests per 60s. This means if you sleep for 333ms after each request, you will never get any rate limit error.
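That works out to one read request every 60000 / 180 ≈ 333 ms. A minimal client-side throttle (a sketch, not an official client helper) could look like:

```javascript
// Sketch: run requests one at a time with a fixed delay between them,
// so a client stays under the default read rate limit of 180 requests
// per 60 seconds.
const READ_DELAY_MS = 60_000 / 180; // ≈ 333 ms between read requests

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function throttledMap(items, fn, delayMs = READ_DELAY_MS) {
  const results = [];
  for (const item of items) {
    results.push(await fn(item));
    await sleep(delayMs); // wait before the next request
  }
  return results;
}
```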

Nutomic avatar Jul 14 '25 12:07 Nutomic

Another idea I had was to combine two requests one with max_depth and one without. That way the first request fills in all the responses but the second is able to page. But there is no guarantee that won't also miss comments.

This IMO is the best hack, but yes it still doesn't solve the root problem, and there's no guarantee the next results will contain all the branches.

I think most clients haven't had too much of an issue with not paging comments, because the hot sort for comments ensures that newer and popular comments get seen and voted on anyway, so you'll still get up to 300 of the most popular top-level comments along with their branches. If I were you I wouldn't even attempt to solve this from the front end; just do what every other app is doing.

Paging trees is no easy task, and it'd take an SQL expert who's dealt with this problem to properly solve it.

dessalines avatar Jul 14 '25 19:07 dessalines

@dessalines I left a similar comment in a PieFed issue. I know it's probably not this simple, but PieFed also uses Postgres, and maybe whatever they come up with will be the key to solving it here.

christianjuth avatar Jul 14 '25 23:07 christianjuth

It seems Piefed solved this by adding a separate API endpoint: https://codeberg.org/rimu/pyfedi/issues/884#issuecomment-5994619

Nutomic avatar Nov 20 '25 09:11 Nutomic

@Nutomic they did! As part of this, PieFed chose to have that new endpoint return a tree of comments instead of a flat list. Maybe that's why they made it a separate endpoint. I don't think it really matters whether the endpoint returns a flat list with comment paths or a tree. The real win, imo, is that PieFed solved missing comment branches with the new endpoint.
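For illustration (this is a client-side sketch, not PieFed's or Lemmy's actual code), a flat list with ltree-style paths folds into a tree easily enough, assuming every ancestor is actually present in the list:

```javascript
// Sketch: fold a flat comment list with ltree-style paths ("0.1.4")
// into a tree. A comment's parent id is the second-to-last path
// segment; "0" marks a top-level comment.
function buildCommentTree(comments) {
  const nodes = new Map(
    comments.map((view) => [String(view.comment.id), { view, children: [] }])
  );
  const roots = [];
  for (const node of nodes.values()) {
    const parts = node.view.comment.path.split(".");
    const parent = nodes.get(parts[parts.length - 2]);
    if (parent) parent.children.push(node);
    else roots.push(node); // top level, or ancestor missing from the list
  }
  return roots;
}
```

Which is why flat-vs-tree feels like a transport detail; the hard part is guaranteeing no branch is missing in the first place.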

christianjuth avatar Nov 22 '25 00:11 christianjuth

I greatly doubt this is solved, because the problem I mentioned in this comment still exists, and no one has answered it. You have to ignore any provided limit when doing a max_depth fetch (otherwise you will be missing branches), yet you must still use a hard limit (we use 300), or you will be open to postgres attacks.

If a post has 10k comments, and you want a max_depth of 50, and don't use a hard limit, then postgres is going to return massive branches that potentially contain all those comments, easily DDoSing your site.

I don't have a solution to that problem yet.
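To make the tension concrete (an illustration, not Lemmy's actual query): if rows come back in hotness order rather than path order, a flat row limit can keep a child while dropping its ancestor, which is exactly the missing-branch symptom from the top of this issue.

```javascript
// Illustration: rows from a tree fetch in hotness order, then a flat
// LIMIT 3 applied on top. The limit keeps the comment at "0.3.4" but
// drops its parent (id 3), leaving an orphaned branch.
const rowsByHotness = [
  { comment: { id: 1, path: "0.1" } },
  { comment: { id: 4, path: "0.3.4" } },
  { comment: { id: 2, path: "0.1.2" } },
  { comment: { id: 3, path: "0.3" } }, // falls outside the limit
];

const limited = rowsByHotness.slice(0, 3);
const ids = new Set(limited.map(({ comment }) => String(comment.id)));
const orphans = limited.filter(({ comment }) =>
  comment.path.split(".").slice(1, -1).some((p) => !ids.has(p))
);
// orphans contains id 4: its ancestor (id 3) was cut off by the limit.
```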

dessalines avatar Dec 01 '25 15:12 dessalines

To be honest I don't really understand how all this works.

@christianjuth Did you test to confirm that Piefed solves this problem on posts with hundreds of comments?

Nutomic avatar Dec 03 '25 14:12 Nutomic

Paging tree-type data is no easy task, and we really need an SQL expert with some familiarity with ltrees to figure out how to do it properly.

Everything DB-wise is in place and stored correctly, but figuring out how to page through trees with ltree queries, without returning too much data, is a tough one.

I suspect the real answer is removing max_depth and using some smart ltree function that can page data at a certain level and smartly not return too many sub-branches with a limit.

As of right now at least, my solution is just:

query = query
    .filter(nlevel(comment::path).le(depth_limit))
    .limit(300);

dessalines avatar Dec 03 '25 18:12 dessalines

I assume this doesn't require any breaking change; removing it from the 1.0 milestone then.

Nutomic avatar Dec 05 '25 09:12 Nutomic

To be honest I don't really understand how all this works.

@christianjuth Did you test to confirm that Piefed solves this problem on posts with hundreds of comments?

I believe it does. I can do some more testing to confirm. I'm pretty out of my depth here, but I believe the PieFed devs mentioned a while back that they don't use ltrees the way Lemmy does, which might explain why they were able to implement this functionality.

I do think this would be a really nice improvement, but I also don't see why it has to be a breaking change. So I don't think waiting until after v1 is an issue.

christianjuth avatar Dec 05 '25 23:12 christianjuth