goci
goci copied to clipboard
Studies should not simultaneously be in published and unpublished states
Curators can "unpublish" studies in order to edit them and then "republish" them. The housekeeping table stores values such as the date these event occurred as well as a boolean for "is published". Somehow it has ended up that some studies can have isPublished as True but also have catalogUnpublishDate != null. If these are both True the pruning process interprets it as "this study is unpublished".
Need to investigate why this happens and prevent future occurrences.
One current example of a study in this state is GCST90000461.
More studies in this status: GCST90000295, GCST90000294. All studies are from author submissions, therefore it is critical that they are released to the public. They got blocked in between statuses when I unpublished them to add top associations -- and then tried to re-publish/ @jdhayhurst @ljwh2
@buniello - are these part of the submissions we've worked on together last week?
yes, @tudorgroza they are. But I don't believe the problem is linked with the import step. I think there is a problem at the backend of the study view in curation app - as these studies got stuck in between statuses when i tried to re-publish them through the study view functionality https://www.ebi.ac.uk/gwas/curation/publication/32814899
@sprintell - adding you to the mix :) I've tried to get to the bottom of this but couldn't. Maybe a fresh pair of eyes will help ... i.e., yours :)
The code run by Annalisa to do the status update for the studies above is triggered in publication.html
via:
<a role="button" type="submit" id="assignStatus"
class="btn btn-primary btn-sm">Save Status
</a>
which is actually executed in goci-publication-page.js
via $('#assignStatus').click(function ()
The endpoint called is /status_update
in PublicationController
(updateSelectedStatuses
).
After various steps - each study ends up being updated via updateStatus
in StudyOperationService
Here's the interesting part: GCST90000295
and GCST90000294
mentioned above correspond to study IDs: 62304264
and 62304257
. In the logs, we have the following sequence recorded:
2020-09-08 16:57:06.646 INFO 24045 --- [http-bio-8080-exec-937] u.a.e.s.g.c.s.StudyOperationsService : Study 62304271 status updated
2020-09-08 17:15:30.137 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304271 status updated
2020-09-08 17:15:30.209 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304264 status updated
2020-09-08 17:15:30.262 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304257 status updated
2020-09-08 17:15:30.310 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304250 status updated
2020-09-08 17:15:30.357 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304243 status updated
2020-09-08 17:15:30.403 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304236 status updated
2020-09-08 17:15:30.451 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304229 status updated
2020-09-08 17:15:30.502 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304222 status updated
2020-09-08 17:15:30.556 INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService : Study 62304215 status updated
If you look closely - those two studies are right in the middle. Everything above and below encountered no issues when having the status updated. Just these two.
For each study, the sequence of steps in updateStatus
is:
1. set published to TRUE
2. set published data if it doesn't exist
3. create note about republishing and add in the DB
4. set unpublish date to NULL
5. set unpublish reason to NULL
6. save housekeeping
7. log status change
These 2 failed studies seem to skip steps 4 and 5 above. Everything else is there ... the published
flag set to TRUE, the note ... and the log status change (which is practically the line you see in the listing above).
Unfortunately ... I just can't understand why those 2 fields are not set to NULL appropriately.
Hi @tudorgroza, I'm happy to take a look tomorrow
In the meantime I've set the values for unpublished reason and unpublished date to NULL in curation db for these studies: GCST90000295, GCST90000294, GCST90000461. This should have been what steps 4 and 5 above did.
Thank you @jdhayhurst
@tudorgroza , @jdhayhurst from my finding, this publication still has one study with id 62304271 and GCST90000296 also still in that confused unpublished state , it was published on 07/09 and has since been unpublished since 08/09 while paradoxically its isPublished status is set as true
That's the first one on the list ... very odd that Annalisa has not picked it up.
ok, @jdhayhurst please don't change the state please, lets leave it as it is for now, I need to use it as guinea-pig in my experiment to know what is wrong
@tudorgroza , @JalMacArthur , @buniello
FROM MY FINDINGS: This problem was created by too many flags been monitored unnecessarily e.g isPublished, UnpublishDate, unPublishReason, currentCurationStatus, newStatus creating Code smell as they are all checking thesame thing (whether the study is published or not) in different places, making it problematic to move studies from one state to another, while they are good to be stored in the database but not all of them should be used as condition to determine what happens to a study
In this situation:
1.) When curator select "Publish Study" from drop down and click on "Save Status" button for this publication (https://www.ebi.ac.uk/gwas/curation/publication/32814899) on the user interface, there is nothing happening in the database for any of the studies
In the back end when the "assignStudyStatus" method is called on this line of code https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/service/StudyOperationsService.java#L116
All the studies to be updated fails the condition (newStatus != currentStudyStatus)
Why ? Because their currentStudyStatus is "Publish Study" and the newStatus is also "Publish Study" as selected by the curator from the drop down.
2.) Secondly, when this happens, the else statements here: https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/service/StudyOperationsService.java#L137
sets the message report to be message = "Current status and new status are the same, no change required";
The message is returned to the calling controller which initiated the back end operation on this line :
https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/controller/PublicationController.java#L226
but funnily, the returned error string is never reported to the curator, but rather the "studyIds" that the curator sent initially was just aggregated into a map with a status "updated" as shown here: https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/controller/PublicationController.java#L227
and returned back to the front end as it is, hence the reason why a user interface pop up keeps saying all the studies are updated even though nothing happened .
3.) If Annalisa decides to change the curation status to something else other than "Publish Study" on a 1st pass, it will definitely escape that initial problem, but another conditional check waits in final step that does the database UPDATE ON THIS LINE: https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/service/StudyOperationsService.java#L281
This condition check shows that for any study that has its curationStatus as "Publish Study" If it's Housekeeping isPublished status is "true", its UnpublishDate and UnpublishReason can never be changed i.e Step 4 and step 5 listed by Tudor will never be implemented.
I think this last conditional check is unnecessary and when removed. the update should go through.
thank you, @sprintell
A couple of known and unknown ORACLE TRIGGERS scripts controlling the houseKeeping table, interferring and overriding developer's solution to this problem, @jhayhurst will manually change the necessary attributes for the study with id: 62304271 as was done for the others, since its the only one left in that unwanted state. to avoid spending too much developer's time on it
The two statements below yield exactly the same id's, yet the pruning step performs (2), which is more opaque. So it would seem that we can modify the pruning script to prune those that satisfy (1). This will be easier to read and allow us to circumvent the problem where some studies have an unpublished date even though they are published.
// 1. if is NOT published
select S.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published != 1;
// 2. if unpublished date NOT null or published date IS null
select s.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
and (hk.catalog_unpublish_date is not NULL
or hk.catalog_publish_date is NULL);
@sprintell and @ljwh2 - I'll let you decide whether you want the script to be modified - it should be easy, but we may want to consider why it was written in that way in the first place (unforeseen consequences)?
@sprintell To identify studies in the db that are in both published and unpublished states:
select * from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published = 1
AND hk.catalog_unpublish_date is not NULL;
To correctly assign these studies as 'published':
update housekeeping
set housekeeping.unpublish_reason_id = NULL, housekeeping.catalog_unpublish_date = NULL
where id in (
select HK.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published = 1
AND hk.catalog_unpublish_date is not NULL);
found another study falling into this group: GCST90012102 @jdhayhurst @sprintell this one cannot be published. blocked in the "unpublished" status.
For clarity, GCST90012102 is different in that it appears in the database as 'unpublished' - it's not in both states. However, it may be a victim of a related issue because it is not possible to publish this study from the curation app. When attempted, it appeared as though it was published from the 'publication view' but on the study view it was unpublished. In the housekeeping table in the database it always to be unpublished. Had to manually set this study to published:
update housekeeping
set housekeeping.unpublish_reason_id = NULL,
housekeeping.catalog_unpublish_date = NULL,
housekeeping.is_published = 1
where housekeeping.id = <housekeepingid>;
I don't know if this is still an issue