openstreetmap-website icon indicating copy to clipboard operation
openstreetmap-website copied to clipboard

Adds version to notes table and creates note_versions table

Open nenad-vujicic opened this issue 9 months ago • 21 comments

Description

PR adds version column to notes table for keeping version number of the latest note's version and creates note_versions table for keeping all notes versions. This is based on this comment, but without creating indices, associations, .. (will be created when we need them). Also, PR proposes using note_versions instead of using old_notes based on this comment.

How has this been tested?

Automated unit tests and manual testing

nenad-vujicic avatar Mar 17 '25 10:03 nenad-vujicic

The only thing that can actually change over time is the (proposed) tags right? So is the OSM data model of copying everything the right approach?

tomhughes avatar Mar 17 '25 10:03 tomhughes

The only thing that can actually change over time is the (proposed) tags right?

At the moment, yes. But, why not allowing updating description, position, ..., creating / updating by sending XML / JSON with entire note's data (all proposed here)?

nenad-vujicic avatar Mar 17 '25 10:03 nenad-vujicic

Merged migration scripts into one (by moving creating foreign key into create_table), added redaction record to note_versions and associations to model files for easier incoming PRs, also removed unnecessary default values from note_versions and foreign keys.

nenad-vujicic avatar Mar 18 '25 17:03 nenad-vujicic

It would be great if we could get some comments on this. Do we want note versions architecture like for other elements (nodes, ways, ..) or something simpler? After this PR we will add creating new note versions when notes are closed / reopened, generate note versions for current notes from note_comments, start adding versioned note tags, ..

@AntonKhorev, since you proposed this, do you have some comment / suggestion?

Thanks!

nenad-vujicic avatar Mar 19 '25 15:03 nenad-vujicic

I was thinking if we need both redactions and hidden statuses. But looks like we do because redactions can't remove top versions and hides can't remove just one version.

AntonKhorev avatar Mar 23 '25 01:03 AntonKhorev

Se also the last paragraph of https://github.com/openstreetmap/openstreetmap-website/issues/5294#issuecomment-2541604426, where I write about keeping id from api0.6-comments. Some of those api0.6-comments will become non-comments and will be deleted from the comments table. In this case their ids are lost. If we are to use them for sorting, we don't need exactly the same ids, anything that forms the same sequence will do.

Or maybe we don't need any of this and we need instead a global counter for note updates with its values stored in the maid notes table.

AntonKhorev avatar Mar 23 '25 15:03 AntonKhorev

@AntonKhorev, just to clarify—are you referring to sorting notes by their latest updates or reconstructing the full note history for discussions (e.g., from note, note_versions, and note_comments)?

If we set an updated_at attribute whenever a new version is created, why would sorting by that be problematic? Wouldn't it be sufficient for ordering by the latest updates instead of relying on a surrogate id from note_comments? Or is this something problematic performance wise?

I'm not sure I understood the real problem here.. Thanks in advance for clarifications.

kcne avatar Mar 25 '25 17:03 kcne

If we set an updated_at attribute whenever a new version is created, why would sorting by that be problematic? Wouldn't it be sufficient for ordering by the latest updates instead of relying on a surrogate id from note_comments?

That's what I tried to do in #4532. In our pagination code we have before and after parameters that are ids which works fine when we order by ids. Ordering by ids is like ordering by creation date. If we want to order by update date but still have ids in before and after, see that pull request for what I had to do.

Another option is to use timestamps in before/after instead of ids, but then we'd have to decide on precision. Our standard precision of 1 second that we use in the api responses is not enough, you can get two notes updated in the same second easily. And there's no guarantee that they won't get the same update time.

AntonKhorev avatar Mar 25 '25 22:03 AntonKhorev

I'm not sure if this should be part of this PR—maybe we should address the issue in another PR.

Regarding sorting by timestamp, we could always sort by timestamp/updated_at first and then by id to ensure a stable order of notes. If multiple notes have the same timestamp, the order of those specific notes probably wouldn't significantly impact the user experience, as the difference would be trivial.

kcne avatar Mar 31 '25 10:03 kcne

I'm not sure if this should be part of this PR—maybe we should address the issue in another PR.

Well, I think it's same if we do this now or in next PR (but really next), because we cannot delay this because next steps will be:

  • Create versions on reopen / close (where we'll have to set note_version.note_comment_id to note_comment.id which is created on new action)
  • Generate note versions from note_comments (where we'll copy note_comment.id to note_versions.note_comment_id for every generated note version)
  • Now we'll be able to do in parallel adding note tags and removing special note comments.

Se also the last paragraph of #5294 (comment), where I write about keeping id from api0.6-comments. Some of those api0.6-comments will become non-comments and will be deleted from the comments table. In this case their ids are lost. If we are to use them for sorting, we don't need exactly the same ids, anything that forms the same sequence will do.

@AntonKhorev thanks for this idea! I've just added new column note_comment_id and index on it. I'm only not sure about the new column name (perhaps more natural name would be ordering_id or similar? after removing note comments) and if index should be composite (I still don't have in my head all situations in which new column will be used and how).

nenad-vujicic avatar Mar 31 '25 12:03 nenad-vujicic

@tomhughes @AntonKhorev Do you think this is OK and we can continue with it, or we need some additional changes (or simpler architecture allowing only updating note-tags)? Our current solution (which uses same approach like with nodes, ways, relations) will add 2 more tables (note_tags and note_tag_versions) for keeping versioned note-tags and it will allow (beside status which we can do already) updating note's tags, description, position, .. while keeping backward compatibility.

Or would you prefer first switching from x-www-form-urlencoded to xml like for nodes / ways / relations / ..? or something else?

Thanks!

nenad-vujicic avatar Apr 04 '25 13:04 nenad-vujicic

@gravitystorm @tomhughes @AntonKhorev I would really appreciate if you could make some comments about this PR since it could greatly help me in wrapping up my efforts from #5904 (notes versioning, editable note tags, removing special comments, ...).

Thanks.

nenad-vujicic avatar Apr 15 '25 17:04 nenad-vujicic

I mean we still haven't finished implementing the previously agreed schema changes for notes have we? and now we want to start altering it again..

I understand the logic of matching how the geodata is versions, but notes are not the same and this seems to create quite a confusing schema to me. I can see each note version has a comment reference, so is a new version created every time there is a comment? So there is a one-one relationship between comments and versions? or maybe many-one because you can can have a version without a new comment?

Maybe this is the right solution but right now it just seems that it sets up a very confusing triangular relationship between notes, note_versions and note_comments.

tomhughes avatar Apr 15 '25 18:04 tomhughes

I mean we still haven't finished implementing the previously agreed schema changes for notes have we? and now we want to start altering it again..

Do you mean removing event column from NoteComment and removing special comments?

I understand the logic of matching how the geodata is versions, but notes are not the same and this seems to create quite a confusing schema to me. I can see each note version has a comment reference, so is a new version created every time there is a comment? So there is a one-one relationship between comments and versions? or maybe many-one because you can can have a version without a new comment?

I'm sorry, perhaps I made a mistake in #5904 My idea is to have comments shared for all versions. New version is created only when user closes, reopens or hides a note.

nenad-vujicic avatar Apr 15 '25 18:04 nenad-vujicic

So what's the note_comment_id in the versions table for?

tomhughes avatar Apr 15 '25 19:04 tomhughes

So what's the note_comment_id in the versions table for?

note_comment_id is the ID of the note_comment from which the note_version is generated. I am using it for ordering note comments (reconstructed from note_comments and note_versions) after deleting note comments with non-present body and is suggested by @AntonKhorev here.

nenad-vujicic avatar Apr 15 '25 20:04 nenad-vujicic

Why do we need versions to sort comments? Comments already have IDs that can be used for ordering them and deleting some of them doesn't change that - a sequence with gaps is just as ordered as one without them...

tomhughes avatar Apr 15 '25 21:04 tomhughes

Why do we need versions to sort comments? Comments already have IDs that can be used for ordering them and deleting some of them doesn't change that - a sequence with gaps is just as ordered as one without them...

Yes, but we want to keep backward compatibility and display (return in JSON / XML) special comments (reconstructed from note_version about closing, reopening, hiding, ..) too even if their body is non-present (like on the screenshot) in appropriate order. If we can abandon these special comments, we wouldn't need note_comment_id.

image

nenad-vujicic avatar Apr 15 '25 21:04 nenad-vujicic

After implementing removing special comments (ones with non-present body) in #5904 (which contains all functionaliti , I realized we need to keep note comment's visibility (so we could hide note comment after deleting special comments) and event which led to creating new version (so we could regenerate message in Discussion thread) too, so I added these two records to the note_versions table.

@tomhughes If you would prefer, I could implement larger PRs with full functionality. That is why I created #5904 also. Also, every comment is more than welcome as always :smile:

nenad-vujicic avatar Apr 18 '25 17:04 nenad-vujicic

After implementing removing special comments (ones with non-present body) in #5904 (which contains all functionaliti , I realized we need to keep note comment's visibility (so we could hide note comment after deleting special comments)

After short thought, perhaps we don't need to control visibility of special note comments with non-present body(?) so, I removed note_comment_visible record from note_versions to keep table as small as possible.

nenad-vujicic avatar Apr 22 '25 21:04 nenad-vujicic

@tomhughes @AntonKhorev @gravitystorm

Do we have some conclusions about notes versioning / tags? At the beginning I proposed non-mutable note tags in #5344, then you requested notes versioning (here and here) what is implemented in #5904 (also removing event column and non-present note-comments) and this PR is the first part of all of this.

If you don't like this (because of complexity or something else), we can do #5344 with update functionality (which will allow changing tags (or entire note?)), but I'm not sure allowing permanently deletions is good thing. Also, I'm not sure how we would do removing non-present note-comments without breaking backward compatibility.

I would like to make some contribution in area of notes, but I cannot do it without your convergence on these topics, so I would highly appreciate if we could make some move forward about this.

Thanks, Nenad.

nenad-vujicic avatar May 01 '25 20:05 nenad-vujicic