goci icon indicating copy to clipboard operation
goci copied to clipboard

Studies should not simultaneously be in published and unpublished states

Open jdhayhurst opened this issue 4 years ago • 18 comments

Curators can "unpublish" studies in order to edit them and then "republish" them. The housekeeping table stores values such as the date these event occurred as well as a boolean for "is published". Somehow it has ended up that some studies can have isPublished as True but also have catalogUnpublishDate != null. If these are both True the pruning process interprets it as "this study is unpublished".

Need to investigate why this happens and prevent future occurrences.

One current example of a study in this state is GCST90000461.

jdhayhurst avatar Aug 27 '20 14:08 jdhayhurst

More studies in this status: GCST90000295, GCST90000294. All studies are from author submissions, therefore it is critical that they are released to the public. They got blocked in between statuses when I unpublished them to add top associations -- and then tried to re-publish/ @jdhayhurst @ljwh2

buniello avatar Sep 09 '20 10:09 buniello

@buniello - are these part of the submissions we've worked on together last week?

tudorgroza avatar Sep 10 '20 12:09 tudorgroza

yes, @tudorgroza they are. But I don't believe the problem is linked with the import step. I think there is a problem at the backend of the study view in curation app - as these studies got stuck in between statuses when i tried to re-publish them through the study view functionality https://www.ebi.ac.uk/gwas/curation/publication/32814899

buniello avatar Sep 10 '20 13:09 buniello

@sprintell - adding you to the mix :) I've tried to get to the bottom of this but couldn't. Maybe a fresh pair of eyes will help ... i.e., yours :)

The code run by Annalisa to do the status update for the studies above is triggered in publication.html via:

<a role="button" type="submit" id="assignStatus"
                        class="btn btn-primary btn-sm">Save Status
                </a>

which is actually executed in goci-publication-page.js via $('#assignStatus').click(function ()

The endpoint called is /status_update in PublicationController (updateSelectedStatuses). After various steps - each study ends up being updated via updateStatus in StudyOperationService

Here's the interesting part: GCST90000295 and GCST90000294 mentioned above correspond to study IDs: 62304264 and 62304257. In the logs, we have the following sequence recorded:

2020-09-08 16:57:06.646  INFO 24045 --- [http-bio-8080-exec-937] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304271 status updated
2020-09-08 17:15:30.137  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304271 status updated
2020-09-08 17:15:30.209  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304264 status updated
2020-09-08 17:15:30.262  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304257 status updated
2020-09-08 17:15:30.310  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304250 status updated
2020-09-08 17:15:30.357  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304243 status updated
2020-09-08 17:15:30.403  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304236 status updated
2020-09-08 17:15:30.451  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304229 status updated
2020-09-08 17:15:30.502  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304222 status updated
2020-09-08 17:15:30.556  INFO 24045 --- [http-bio-8080-exec-934] u.a.e.s.g.c.s.StudyOperationsService     : Study 62304215 status updated

If you look closely - those two studies are right in the middle. Everything above and below encountered no issues when having the status updated. Just these two.

For each study, the sequence of steps in updateStatus is:

1. set published to TRUE
2. set published data if it doesn't exist
3. create note about republishing and add in the DB
4. set unpublish date to NULL
5. set unpublish reason to NULL
6. save housekeeping
7. log status change

These 2 failed studies seem to skip steps 4 and 5 above. Everything else is there ... the published flag set to TRUE, the note ... and the log status change (which is practically the line you see in the listing above).

Unfortunately ... I just can't understand why those 2 fields are not set to NULL appropriately.

tudorgroza avatar Sep 10 '20 15:09 tudorgroza

Hi @tudorgroza, I'm happy to take a look tomorrow

sprintell avatar Sep 10 '20 16:09 sprintell

In the meantime I've set the values for unpublished reason and unpublished date to NULL in curation db for these studies: GCST90000295, GCST90000294, GCST90000461. This should have been what steps 4 and 5 above did.

jdhayhurst avatar Sep 11 '20 11:09 jdhayhurst

Thank you @jdhayhurst

tudorgroza avatar Sep 11 '20 12:09 tudorgroza

@tudorgroza , @jdhayhurst from my finding, this publication still has one study with id 62304271 and GCST90000296 also still in that confused unpublished state , it was published on 07/09 and has since been unpublished since 08/09 while paradoxically its isPublished status is set as true

Screenshot 2020-09-11 at 14 13 46

sprintell avatar Sep 11 '20 13:09 sprintell

That's the first one on the list ... very odd that Annalisa has not picked it up.

tudorgroza avatar Sep 11 '20 13:09 tudorgroza

ok, @jdhayhurst please don't change the state please, lets leave it as it is for now, I need to use it as guinea-pig in my experiment to know what is wrong

sprintell avatar Sep 11 '20 13:09 sprintell

@tudorgroza , @JalMacArthur , @buniello

FROM MY FINDINGS: This problem was created by too many flags been monitored unnecessarily e.g isPublished, UnpublishDate, unPublishReason, currentCurationStatus, newStatus creating Code smell as they are all checking thesame thing (whether the study is published or not) in different places, making it problematic to move studies from one state to another, while they are good to be stored in the database but not all of them should be used as condition to determine what happens to a study

In this situation:

1.) When curator select "Publish Study" from drop down and click on "Save Status" button for this publication (https://www.ebi.ac.uk/gwas/curation/publication/32814899) on the user interface, there is nothing happening in the database for any of the studies

In the back end when the "assignStudyStatus" method is called on this line of code https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/service/StudyOperationsService.java#L116

All the studies to be updated fails the condition (newStatus != currentStudyStatus)

Why ? Because their currentStudyStatus is "Publish Study" and the newStatus is also "Publish Study" as selected by the curator from the drop down.

2.) Secondly, when this happens, the else statements here: https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/service/StudyOperationsService.java#L137

sets the message report to be message = "Current status and new status are the same, no change required";
The message is returned to the calling controller which initiated the back end operation on this line : https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/controller/PublicationController.java#L226

but funnily, the returned error string is never reported to the curator, but rather the "studyIds" that the curator sent initially was just aggregated into a map with a status "updated" as shown here: https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/controller/PublicationController.java#L227

and returned back to the front end as it is, hence the reason why a user interface pop up keeps saying all the studies are updated even though nothing happened .

3.) If Annalisa decides to change the curation status to something else other than "Publish Study" on a 1st pass, it will definitely escape that initial problem, but another conditional check waits in final step that does the database UPDATE ON THIS LINE: https://github.com/EBISPOT/goci/blob/07f0735ac6d44888665ed6029725b99046008518/goci-interfaces/goci-curation/src/main/java/uk/ac/ebi/spot/goci/curation/service/StudyOperationsService.java#L281

This condition check shows that for any study that has its curationStatus as "Publish Study" If it's Housekeeping isPublished status is "true", its UnpublishDate and UnpublishReason can never be changed i.e Step 4 and step 5 listed by Tudor will never be implemented.

I think this last conditional check is unnecessary and when removed. the update should go through.

sprintell avatar Sep 11 '20 15:09 sprintell

thank you, @sprintell

buniello avatar Sep 16 '20 15:09 buniello

A couple of known and unknown ORACLE TRIGGERS scripts controlling the houseKeeping table, interferring and overriding developer's solution to this problem, @jhayhurst will manually change the necessary attributes for the study with id: 62304271 as was done for the others, since its the only one left in that unwanted state. to avoid spending too much developer's time on it

sprintell avatar Sep 18 '20 08:09 sprintell

The two statements below yield exactly the same id's, yet the pruning step performs (2), which is more opaque. So it would seem that we can modify the pruning script to prune those that satisfy (1). This will be easier to read and allow us to circumvent the problem where some studies have an unpublished date even though they are published.

// 1. if is NOT published
select S.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published != 1; 

// 2. if unpublished date NOT null or published date IS null 
select s.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
and (hk.catalog_unpublish_date is not NULL
or hk.catalog_publish_date is NULL); 

@sprintell and @ljwh2 - I'll let you decide whether you want the script to be modified - it should be easy, but we may want to consider why it was written in that way in the first place (unforeseen consequences)?

jdhayhurst avatar Sep 23 '20 15:09 jdhayhurst

@sprintell To identify studies in the db that are in both published and unpublished states:

select * from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published = 1 
AND hk.catalog_unpublish_date is not NULL;

To correctly assign these studies as 'published':

update housekeeping
set housekeeping.unpublish_reason_id = NULL, housekeeping.catalog_unpublish_date = NULL
where id in (
select HK.id from study S, housekeeping HK
where S.HOUSEKEEPING_ID = HK.ID
AND hk.is_published = 1 
AND hk.catalog_unpublish_date is not NULL);

jdhayhurst avatar Jan 11 '21 10:01 jdhayhurst

found another study falling into this group: GCST90012102 @jdhayhurst @sprintell this one cannot be published. blocked in the "unpublished" status.

buniello avatar Jan 12 '21 13:01 buniello

For clarity, GCST90012102 is different in that it appears in the database as 'unpublished' - it's not in both states. However, it may be a victim of a related issue because it is not possible to publish this study from the curation app. When attempted, it appeared as though it was published from the 'publication view' but on the study view it was unpublished. In the housekeeping table in the database it always to be unpublished. Had to manually set this study to published:

update housekeeping
set housekeeping.unpublish_reason_id = NULL, 
    housekeeping.catalog_unpublish_date = NULL, 
    housekeeping.is_published = 1
where housekeeping.id = <housekeepingid>;

jdhayhurst avatar Jan 12 '21 15:01 jdhayhurst

I don't know if this is still an issue

jdhayhurst avatar Jul 26 '23 15:07 jdhayhurst