openstreetmap-website
openstreetmap-website copied to clipboard
Note: Add body & author
Why are the changes necessary?
Addresses #3831 which describes how a set of bugs related to the current Note
-model can be solved by merging the "special first comment" into the Note
-record itself.
This PR is meant to serve as a discussion starter. I gave my best to make sure the code changes are not introducing any undesired behavioural changes, but I am sure there are things I've missed, so please regard this PR a work in progress.
What has changed?
- Adjusts the
Note
-Model to holdbody
and theauthor
- adds db fields
notes.{body,author_id,author_ip}
- adds db fields
Open TODOs
I am not sure if I successfully migrated the logic of Api::NotesController#feed
. I understand that at least the behaviour of the .limit(result_limit)
-call will involuntarily return more results than it currently does. This part of the PR needs more focus and probably I'll have to add some tests first.
Rollout
Step 1) Merge and deploy the db and code changes
- New Note-records will be created with a
body
- andauthor
-information - Web UI and API should still render the the same output
Step 2) Start backfilling process
Run the migrate_notes.rake
-script either via long-running shell or I can also rewrite the code so it can be run as an ActiveJob in the background.
The script copies the author
- and body
-information from the opening NoteComment
to the Note
-record.
Step 3) Sanity check backfilled data
- all
Note
-records should now have a non-emptybody
AFAIU (maybe there are exceptions? Notes and Note-Comments have been around since 2010. Back then it was called MapBugs though ^^) - TBD: Are there more things to sanity check?
Step 4) Clean up if everything looks good
Until this point all operations have been non-destructive. If everything looks good and no issues have been discovered, the NoteComments
-records with an open
-event can be removed.
Also all FIXME annotated code can be removed or updated.
This PR is meant to serve as a discussion starter. I gave my best to make sure the code changes are not introducing any undesired behavioural changes, but I am sure there are things I've missed, so please regard this PR a work in progress.
It's missing the entire deleted users aspect that @gravitystorm for some reason avoided mentioning.
Short summary:
- note comments of deleted users were made hidden for spam-fighting purposes
- this became the source of bugs
- refactor plans never addressed what to do with comments of deleted users
- @tomhughes opposes any changes to visibility of anything related to deleted users because he doesn't want spam to resurface
- @grekko in this pull request chose to reveal the opening comment, both in the web ui and in the api
I do not oppose any change to visibility of anything related to deleted users.
I want us to properly define how deleted users should be handled, and I specifically said that showing diary entries and comments for deleted uses would be a problem because historically we have hidden spam diary content by deleting the user without marking the content as hidden explicitly. I never said that applied to notes.
I never said that applied to notes.
That's not how your Jan 10 03:37 comment on the dwg channel reads because you didn't say that it doesn't apply to notes. Since you also object to visibility changes of changesets, why would you be ok with the same thing for notes?
@tomhughes you made this change to hide all of the deleted users' note comments https://github.com/openstreetmap/openstreetmap-website/commit/6027c42ee7b5ac6b36e8a62135d2cf67b3ca8250
Sure, because our general operating principle AT THE TIME was to hide things created by deleted users.
As you are well aware that policy is currently being reviewed with the intention of seeking LWG approval for a new policy around what needs to be and/or should be hidden.
Nothing is set in stone, everything can be reviewed.
I wasn't even saying we can't change our policy around diary entries, just that we would have to consider how to avoid suddenly having lots of spam re-appear.
@grekko in this pull request chose to reveal the opening comment, both in the web ui and in the api
That is true. The code changes in this PR effectively work around the special handling introduced in https://github.com/openstreetmap/openstreetmap-website/commit/6027c42ee7b5ac6b36e8a62135d2cf67b3ca8250, keeping the Note body of then-deleted users accessible. Thanks for pointing this out.
Just to be explicit: No User-related information would be accessible. The author would be displayed as anonymous
. Also note comments of deleted Users would continue NOT to be accessible.
AFAIU the current policy of User#soft_destroy
we would need to keep the information of Note-records by deleted Users hidden. Here a screenshot:
At this point I think it is useful to clarify if there are good reasons not to continue to comply with the aforementioned policy. If there are none, I can think of the following changes to move forward:
- Notes of deleted users are not hidden per se, but the associated information (body and author) is redacted – effectively hiding the Notes comment. The author is already redacted.
- Associated comments to the Note continue to be accessible (unless their authors are also deleted).
Or alternatively:
- The Note and all of its associated comments are hidden once the original author of the Note is deleted.
Although the latter approach seems rather invasive to other users. I can imagine that Note comments carry a lot of value even though if the message of the initial Note is redacted. Also it seems wrong that a User can effectively manipulate the visibility of other Users comments.
@AntonKhorev Do you think I'm missing here something else?
@grekko I think we should split this in two scenarios:
- Scenario 1 Spammer: User creates 10 notes, all without comments. User gets deleted, so user data and content is hidden. BUT in this case, the nodes should IMO disappear as well. They are not actionable and just noise.
- Scenario 2: User creates a note, other users comment. Creator deletes account and data is hidden. Now the discussion is still valid and should be visible until resolved (and later auto-hidden from the map)
Just FYI
Just to be explicit: No User-related information would NOT be accessible. The author would be displayed as anonymous. Also note comments of deleted Users would continue NOT to be accessible.
I don't understand this sentence :-). But I think I got the overall message of the thread / that is clear.
I don't understand this sentence :-). But I think I got the overall message of the thread / that is clear.
The NOT was superfluous. I edited the comment above.
I think we should split this in two scenarios
Agreed. I think for Scenario 1 there are tools for Moderators in place to hide such Notes. And to cover Scenario 2 too, it would be useful to keep Notes of deleted Users still accessible and just redact the Note body.
a) Should the data migration (to backfill the notes.{body,author_id,author_ip} run in the scope of the migration or can this be a rake task which is running in the background and allows to perform some sanity checks on the data?
I'm slightly torn on this one. My instinct is that the steps should be decoupled - adding columns in a transaction, and then a separate rake task that can be (re)-run as required to do the migration. In particular, if there's some record somewhere that causes an error (or the process times out or whatever) then I'd rather not be rolling everything back.
However, I'm not sure how this will work for other deployments, who might not be so hot on following the issue tracker and keeping up to date with everything. At some point they will update their installation and run the database migrations, but will they know that there's a rake task to run too?
At some point in the future, after the migration is completed on osm.org we'll want to do a code cleanup, and probably make body text NOT NULL
. Perhaps that will be the point that they realise there's something that needs to be done, when that cleanup migration fails? On the plus side, anyone without any notes can run both the migrations fine back-to-back since the cleanup migration would work without error on an empty db.
I'm slightly torn on this one. My instinct is that the steps should be decoupled - adding columns in a transaction, and then a separate rake task that can be (re)-run as required to do the migration. In particular, if there's some record somewhere that causes an error (or the process times out or whatever) then I'd rather not be rolling everything back.
It is generally better to do things in a migration so they just work but here we do have four million records which will need to be migrated so we do need to think about the risk of a failure and how long a migration might take and what lock contention issues it might involve...
I did a quick test and running the big UPDATE
in a single transaction in production is definitely going to cause lock contention issues - not sure how long it would actually take because enough locks were piling up that I had to abort it.
A summary of the recent changes:
- 1c3d3b6 redacts
Note#body
ifNote#author
is soft-deleted - edfab03 extracted Note-Migration code into
Note::MigrateOpenedComment
and added basic tests for that class - 5f4a414 hides the authors display name in
api/notes/_note.rss.builder
if the author is deleted (the same way the author name is hidden in the Web UI)
@gravitystorm could you give this PR a rough review so we can spot weak spots or issues I can address?
@tomhughes thanks for trying out the UPDATE
-SQL Statement.
I'd be interested if there are any NoteComment
-records with event: "opened"
and with a body
either being NULL
or EMPTY
.
The migration code assumes that there will be a non-zero-length body
but there are neither DB constraints nor Model Validations enforcing this.
Please can you take another look?
cc @gravitystorm @tomhughes
@gravitystorm this EWG project is waiting for input for a few month now. I hope it's OK to ping here as a reminder to move it up in the review queue.