openlibrary
openlibrary copied to clipboard
Fix: show the first import source in the history comment
Closes #2643
Feature. This PR makes the import source listed in a history comment of an edition show the original import source, rather than the latest, which is what @cdrini effectively suggested here: https://github.com/internetarchive/openlibrary/issues/6778#issuecomment-1194630309
Technical
This does nothing to address the fact that subsequent import sources don't show up anywhere. I looked at the Changset
class in the hope there would be an easy way to see the changes in each revision without having to query the database to do a diff between each revision to see if source_records
change, but if it is there I simply overlooked it.
Testing
With the code on the current master
branch:
Create an item with a source_records
field having some ia:identifier
, such as "source_records": ["ia:b33093337"]
.
Verify the history looks something like this:
Re-import the item with a new source record, say, idb:1111111111
, and notice the revision 1 comment changes to say that the item was imported from idb
:
Now apply the PR and see that the original import source is listed under revision 1:
Screenshot
Stakeholders
@hornc @seabelis @cdrini
I'm a little conflicted about this one. It seems like we're changing one error for another. What is v
exactly here? Maybe it includes the before/after of the change? If so, we can just compare the source_record fields and see if a source_record was added perhaps?
If we can give v
an actual type that would be nice too; I think it might be some sort of changeset
?
Actually I think that's maybe what we want. Update openlibrary/templates/history.html
so that when it renders the comment, it supplies not just v
, but also prev
. Then we can do a diff later to render things correctly! That might even be a way for us to add "auto comments" if the user didn't supply a comment (eg Modified title) but that's out of scope of this task :P
I should have made it more clear in the PR, but I had tried to get the diff out of the Changeset
but wasn't able to figure it out, sadly. I think it is time to put this poor little PR to bed.
Hang on, isn't this better than the current behavior?
The code modified is only for the first revision case v.revision == 1
, and its making sure the first revision history displays the first source record, rather than the last. What is the new error @cdrini ?
I'm confused because the title of this PR seems to be the opposite of what the code is doing, but the comment on https://github.com/internetarchive/openlibrary/pull/8667/commits/76817b26973394f9978dccfe83f3af4f779cbc42 looks correct:
This commit makes the import source listed in a history comment of an edition show the original import source, rather than the latest.
Which is what we want (if it's the v1 history comment). So I think this PR is an improvement and fixes the problem as described in https://github.com/internetarchive/openlibrary/issues/6778#issuecomment-1194630309
There are numerous things broken with the history, and what is displayed differs for logged in users and non logged in users.
e.g. https://openlibrary.org/books/OL26354767M/How_to_Be_a_Stoic?v=1
logged in, the v=1 page shows:
Note the created date says 2019, which is wrong.
not logged in the same page shows:
which is correct,
Both v1 pages show:
Which is incorrect -- it's using the current version last edit user with the specified version last edit date.
MARC Bot did not edit this page in 2017.
It looks like if you are logged in, this function messes with the first entry of the edition history by making a fake edit comment. Non-logged in users see the original comment on the first entry.
Both logged in and non-logged in users see the constructed (incorrect) comment on the history page. e.g: https://openlibrary.org/books/OL26354767M/How_to_Be_a_Stoic?m=history
It looks like the fix might be remove the code that messes with the original comment, because it's not accurate.
We lose the ability to click on the last imported MARC record.
The UI currently states it's the first imported MARC record which occurred when the record was created, which could be wrong on both counts:
- The record may have been created by a user (like the OL26354767M examples)
- And it shows the latest imported MARC record, not the earliest.
Sorry I think I was a little vague.
First: The players
-
openlibrary/plugins/upstream/utils.py:process_version(v: ChangeSet)
(See sample changesets here . I think this is aChangeSet
class, but it might be a web.storage over a ChangeSet-like object) -
openlibrary/templates/history/comment.html
renders a single row in the history table, calling the above -
openlibrary/templates/history.html
renders all the rows calling the above
Second: The plan:
On line 527 of process_version
:
https://github.com/internetarchive/openlibrary/blob/4a7a4342650e994a9292573df7d1f1cab8a25e77/openlibrary/plugins/upstream/utils.py#L518-L538
We make a db call and fetch the full record. This means we have the full version of the thing at the specific revision. We do this for every line in the history table.
If we instead do this db call in history.html
, we can pass forward to comment.html
and to process_version
some extra info: The previous revision and the current revision. It can then use this to determine if/what source_records
were added during the edit.
This will let us display the actually correct value. And it shouldn't be too much work to make it happen!
I had a giant wall of text here before I realized I was thinking about this all wrong. I'll have something better at some point. :)