openstreetmap-website icon indicating copy to clipboard operation
openstreetmap-website copied to clipboard

Add changeset comment search api with filtering by author and time

Open AntonKhorev opened this issue 1 year ago • 12 comments

Similar to https://github.com/openstreetmap/openstreetmap-website/pull/4248, but for the api. This is the first part that doesn't require any db changes. The next step is to add searching by user who created the commented changeset, then it will be possible to have a user's discussed changeset list like on https://hdyc.neis-one.org/.

  • There was a choice of route to add this request to, either index (used by changeset search), or search (used by notes search). I chose index because you like resourceful routes and there's no other obvious request to put at index.
  • Common user, time interval from/to, limit methods moved to a mixin.
  • Since limit parameter already existed for changeset comment feeds, added default/max limit settings like the ones for notes and changesets.

AntonKhorev avatar Nov 21 '23 16:11 AntonKhorev

I think it would be useful to be able select changeset comments relevant to a user - which would include comments to all changesets they are subscribed to. Something along the lines of:

select * from changeset_comments
where changeset_id in (
  select changeset_id from changesets_subscribers
  where subscriber_id=%{user_id});

milan-cvetkovic avatar Jan 31 '24 13:01 milan-cvetkovic

That's a potentially very expensive query though so I'm not sure we want to be doing that in the live API.

tomhughes avatar Jan 31 '24 13:01 tomhughes

I should add that I haven't reviewed this PR at all yet so I have no idea what other queries it is doing or whether they are sustainable but in general "search" apis make me extremely nervous as they've often caused us major performance problems in the past.

tomhughes avatar Jan 31 '24 13:01 tomhughes

That's a potentially very expensive query though so I'm not sure we want to be doing that in the live API.

I understand the reluctance, one has to be careful with search queries.

I was thinking of the usefulness of such a query - I assume users would like to know what comments others wrote to their changesets (and replies to their comments).

Not underestimating the complexity, I think the query can be indexed and use single join.

milan-cvetkovic avatar Feb 01 '24 02:02 milan-cvetkovic

Oh there is an index so it won't have to do a scan - my concern is more that the set of changesets somebody is subscribed to might be very large.

tomhughes avatar Feb 01 '24 07:02 tomhughes

It shouldn't be too bad. It is the number of comments on their subscribed changesets that is the limiting factor, rather than number of subscribed changesets. Inner join should get rid of the changesets without comments, and we can limit the size of the output. I am not sure what the heavy duty contributors' numbers would be though - I am the only one commenting on my changesets so far :-(

Here is the query rewritten to use JOIN explicitly:

SELECT * FROM changesets_subscribers S
INNER JOIN changeset_comments M
ON S.changeset_id=M.changeset_id
WHERE S.subscriber_id='<userid>' AND M.visible = TRUE
ORDER by M.created_at DESC
LIMIT 100

milan-cvetkovic avatar Feb 06 '24 17:02 milan-cvetkovic

Yes a join is probably better than a subquery here - the optimiser might do that anyway but join is more what we want so is likely a better choice.

The current record appears to be a user that is subscribed to nearly 650000 changesets ;-)

tomhughes avatar Feb 06 '24 18:02 tomhughes

The current record appears to be a user that is subscribed to nearly 650000 changesets ;-)

Wow! Probably the number of changeset comments for him/her is also in thounsands

milan-cvetkovic avatar Feb 07 '24 09:02 milan-cvetkovic

Actually not really - they have just made a LOT of changesets (like nearly twice as many as the next biggest user) but they have never had a single comment!

The worst case for number of changesets commented on is a user with about 20000 changesets that attracted comments, basically because DWG reverted their work and commented on all the changesets.

The worst case for number of changeset comments is a user with just over 40000 comments because somebody took a dislike to them and robo-posted 300+ comments on each of their changesets.

tomhughes avatar Feb 07 '24 10:02 tomhughes

I think it would be useful to be able select changeset comments relevant to a user - which would include comments to all changesets they are subscribed to.

To any user? I don't think your subscriptions are revealed to anyone other than yourself currently.

AntonKhorev avatar Feb 09 '24 13:02 AntonKhorev

I think it would be useful to be able select changeset comments relevant to a user - which would include comments to all changesets they are subscribed to.

To any user? I don't think your subscriptions are revealed to anyone other than yourself currently.

I don't think that would be wise. Only to the logged-in user.

In other words, it would be nice to have an ability to ask for "List of all changeset comments I to changesets I am subscribed to, newer than 7 days, maximum of 100 in the list"

milan-cvetkovic avatar Feb 09 '24 16:02 milan-cvetkovic

I don't think that would be wise. Only to the logged-in user.

This is usually done with a different endpoint:

  • https://wiki.openstreetmap.org/wiki/API_v0.6#Details_of_the_logged-in_user:GET/api/0.6/user/details
  • https://wiki.openstreetmap.org/wiki/API_v0.6#List:GET/api/0.6/user/gpx_files

AntonKhorev avatar Feb 10 '24 13:02 AntonKhorev