openstreetmap-website icon indicating copy to clipboard operation
openstreetmap-website copied to clipboard

Note: Add body & author

Open grekko opened this issue 1 year ago • 19 comments

Why are the changes necessary?

Addresses #3831 which describes how a set of bugs related to the current Note-model can be solved by merging the "special first comment" into the Note-record itself.

This PR is meant to serve as a discussion starter. I gave my best to make sure the code changes are not introducing any undesired behavioural changes, but I am sure there are things I've missed, so please regard this PR a work in progress.

What has changed?

  • Adjusts the Note-Model to hold body and the author
    • adds db fields notes.{body,author_id,author_ip}

Open TODOs

I am not sure if I successfully migrated the logic of Api::NotesController#feed. I understand that at least the behaviour of the .limit(result_limit)-call will involuntarily return more results than it currently does. This part of the PR needs more focus and probably I'll have to add some tests first.

Rollout

Step 1) Merge and deploy the db and code changes

  • New Note-records will be created with a body- and author-information
  • Web UI and API should still render the the same output

Step 2) Start backfilling process

Run the migrate_notes.rake-script either via long-running shell or I can also rewrite the code so it can be run as an ActiveJob in the background.

The script copies the author- and body-information from the opening NoteComment to the Note-record.

Step 3) Sanity check backfilled data

  • all Note-records should now have a non-empty body AFAIU (maybe there are exceptions? Notes and Note-Comments have been around since 2010. Back then it was called MapBugs though ^^)
  • TBD: Are there more things to sanity check?

Step 4) Clean up if everything looks good

Until this point all operations have been non-destructive. If everything looks good and no issues have been discovered, the NoteComments-records with an open-event can be removed.

Also all FIXME annotated code can be removed or updated.

grekko avatar Jan 16 '24 04:01 grekko

This PR is meant to serve as a discussion starter. I gave my best to make sure the code changes are not introducing any undesired behavioural changes, but I am sure there are things I've missed, so please regard this PR a work in progress.

It's missing the entire deleted users aspect that @gravitystorm for some reason avoided mentioning.

AntonKhorev avatar Jan 16 '24 12:01 AntonKhorev

Short summary:

  • note comments of deleted users were made hidden for spam-fighting purposes
  • this became the source of bugs
  • refactor plans never addressed what to do with comments of deleted users
  • @tomhughes opposes any changes to visibility of anything related to deleted users because he doesn't want spam to resurface
  • @grekko in this pull request chose to reveal the opening comment, both in the web ui and in the api

AntonKhorev avatar Jan 16 '24 13:01 AntonKhorev

I do not oppose any change to visibility of anything related to deleted users.

I want us to properly define how deleted users should be handled, and I specifically said that showing diary entries and comments for deleted uses would be a problem because historically we have hidden spam diary content by deleting the user without marking the content as hidden explicitly. I never said that applied to notes.

tomhughes avatar Jan 16 '24 13:01 tomhughes

I never said that applied to notes.

That's not how your Jan 10 03:37 comment on the dwg channel reads because you didn't say that it doesn't apply to notes. Since you also object to visibility changes of changesets, why would you be ok with the same thing for notes?

AntonKhorev avatar Jan 16 '24 14:01 AntonKhorev

@tomhughes you made this change to hide all of the deleted users' note comments https://github.com/openstreetmap/openstreetmap-website/commit/6027c42ee7b5ac6b36e8a62135d2cf67b3ca8250

AntonKhorev avatar Jan 16 '24 14:01 AntonKhorev

Sure, because our general operating principle AT THE TIME was to hide things created by deleted users.

As you are well aware that policy is currently being reviewed with the intention of seeking LWG approval for a new policy around what needs to be and/or should be hidden.

Nothing is set in stone, everything can be reviewed.

I wasn't even saying we can't change our policy around diary entries, just that we would have to consider how to avoid suddenly having lots of spam re-appear.

tomhughes avatar Jan 16 '24 14:01 tomhughes

@grekko in this pull request chose to reveal the opening comment, both in the web ui and in the api

That is true. The code changes in this PR effectively work around the special handling introduced in https://github.com/openstreetmap/openstreetmap-website/commit/6027c42ee7b5ac6b36e8a62135d2cf67b3ca8250, keeping the Note body of then-deleted users accessible. Thanks for pointing this out.

Just to be explicit: No User-related information would be accessible. The author would be displayed as anonymous. Also note comments of deleted Users would continue NOT to be accessible.

AFAIU the current policy of User#soft_destroy we would need to keep the information of Note-records by deleted Users hidden. Here a screenshot:

SCR-20240116-qskd

At this point I think it is useful to clarify if there are good reasons not to continue to comply with the aforementioned policy. If there are none, I can think of the following changes to move forward:

  • Notes of deleted users are not hidden per se, but the associated information (body and author) is redacted – effectively hiding the Notes comment. The author is already redacted.
  • Associated comments to the Note continue to be accessible (unless their authors are also deleted).

Or alternatively:

  • The Note and all of its associated comments are hidden once the original author of the Note is deleted.

Although the latter approach seems rather invasive to other users. I can imagine that Note comments carry a lot of value even though if the message of the initial Note is redacted. Also it seems wrong that a User can effectively manipulate the visibility of other Users comments.

@AntonKhorev Do you think I'm missing here something else?

grekko avatar Jan 17 '24 00:01 grekko

@grekko I think we should split this in two scenarios:

  • Scenario 1 Spammer: User creates 10 notes, all without comments. User gets deleted, so user data and content is hidden. BUT in this case, the nodes should IMO disappear as well. They are not actionable and just noise.
  • Scenario 2: User creates a note, other users comment. Creator deletes account and data is hidden. Now the discussion is still valid and should be visible until resolved (and later auto-hidden from the map)

Just FYI

Just to be explicit: No User-related information would NOT be accessible. The author would be displayed as anonymous. Also note comments of deleted Users would continue NOT to be accessible.

I don't understand this sentence :-). But I think I got the overall message of the thread / that is clear.

tordans avatar Jan 17 '24 08:01 tordans

I don't understand this sentence :-). But I think I got the overall message of the thread / that is clear.

The NOT was superfluous. I edited the comment above.

grekko avatar Jan 17 '24 11:01 grekko

I think we should split this in two scenarios

Agreed. I think for Scenario 1 there are tools for Moderators in place to hide such Notes. And to cover Scenario 2 too, it would be useful to keep Notes of deleted Users still accessible and just redact the Note body.

grekko avatar Jan 17 '24 12:01 grekko

a) Should the data migration (to backfill the notes.{body,author_id,author_ip} run in the scope of the migration or can this be a rake task which is running in the background and allows to perform some sanity checks on the data?

I'm slightly torn on this one. My instinct is that the steps should be decoupled - adding columns in a transaction, and then a separate rake task that can be (re)-run as required to do the migration. In particular, if there's some record somewhere that causes an error (or the process times out or whatever) then I'd rather not be rolling everything back.

However, I'm not sure how this will work for other deployments, who might not be so hot on following the issue tracker and keeping up to date with everything. At some point they will update their installation and run the database migrations, but will they know that there's a rake task to run too?

At some point in the future, after the migration is completed on osm.org we'll want to do a code cleanup, and probably make body text NOT NULL. Perhaps that will be the point that they realise there's something that needs to be done, when that cleanup migration fails? On the plus side, anyone without any notes can run both the migrations fine back-to-back since the cleanup migration would work without error on an empty db.

gravitystorm avatar Jan 31 '24 15:01 gravitystorm

I'm slightly torn on this one. My instinct is that the steps should be decoupled - adding columns in a transaction, and then a separate rake task that can be (re)-run as required to do the migration. In particular, if there's some record somewhere that causes an error (or the process times out or whatever) then I'd rather not be rolling everything back.

It is generally better to do things in a migration so they just work but here we do have four million records which will need to be migrated so we do need to think about the risk of a failure and how long a migration might take and what lock contention issues it might involve...

tomhughes avatar Jan 31 '24 20:01 tomhughes

I did a quick test and running the big UPDATE in a single transaction in production is definitely going to cause lock contention issues - not sure how long it would actually take because enough locks were piling up that I had to abort it.

tomhughes avatar Jan 31 '24 20:01 tomhughes

A summary of the recent changes:

  • 1c3d3b6 redacts Note#body if Note#author is soft-deleted
  • edfab03 extracted Note-Migration code into Note::MigrateOpenedComment and added basic tests for that class
  • 5f4a414 hides the authors display name in api/notes/_note.rss.builder if the author is deleted (the same way the author name is hidden in the Web UI)

@gravitystorm could you give this PR a rough review so we can spot weak spots or issues I can address?

@tomhughes thanks for trying out the UPDATE-SQL Statement. I'd be interested if there are any NoteComment-records with event: "opened" and with a body either being NULL or EMPTY. The migration code assumes that there will be a non-zero-length body but there are neither DB constraints nor Model Validations enforcing this.

grekko avatar Feb 07 '24 15:02 grekko

Please can you take another look?

cc @gravitystorm @tomhughes

fititnt avatar Apr 16 '24 22:04 fititnt

@gravitystorm this EWG project is waiting for input for a few month now. I hope it's OK to ping here as a reminder to move it up in the review queue.

tordans avatar Jun 19 '24 15:06 tordans