gatk icon indicating copy to clipboard operation
gatk copied to clipboard

Prevent users enabling annotations with mismatching data type (flow etc)

Open gokalpcelik opened this issue 1 year ago • 7 comments

Here is an user trying to enable flow based annotations for non-flow data.

https://gatk.broadinstitute.org/hc/en-us/community/posts/24596063911963-Error-while-running-Mutect2-java-lang-IllegalArgumentException-the-index-points-past-the-last-element-of-the-collection-or-array-334-333

What could be done to prevent this? Possible ideas

  • Check data and disable nonmatching annotations with a warning in the log
  • Prevent running the command at all with error messages in the log
  • Change the way annotations are named such as FB_myannotation...

gokalpcelik avatar Apr 22 '24 17:04 gokalpcelik

@ilyasoifer Looks like this user got tricked by some of the flow based annotations that don't work on their data. I would like to cut down on the risk that this happens for users. If we had more foresight I would advocate renaming all of the flow specific annotations to something like "flowbased_#####". How destructive would this be for your pipelines?

We have some appropriate checks in GATK for the flow-ness of the bam that give warnings more broadly about flow-based mode but we don't currently have any safeguards in the annotations. Thoughts?

jamesemery avatar Apr 23 '24 18:04 jamesemery

@jamesemery, @gokalpcelik - this looks like a bug actually, this annotation should work on any data IMO. Would it be possible to get data to reproduce this crash? We should fix this quickly.

ilyasoifer avatar Apr 23 '24 18:04 ilyasoifer

Looks like some of the code have changed since then line numbers don't match with the master branch. Should we ask user to try running this annotation with the latest GATK?

gokalpcelik avatar Apr 23 '24 19:04 gokalpcelik

@gokalpcelik I see that the bug exists in the updated code too. We can fix it, but would be good to have some dataset that can be used to validate. Any chance to ask the user to generate a small example?

ilyasoifer avatar Apr 23 '24 20:04 ilyasoifer

I asked the user to send us a piece of data. On the other hand the file name starts with SRR6127704. Is this something that we can get from SRA?

gokalpcelik avatar Apr 23 '24 21:04 gokalpcelik

I think even more important is that they share the reference genome that they are using.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Gökalp Çelik @.> Sent: Wednesday, April 24, 2024 12:28:39 AM To: broadinstitute/gatk @.> Cc: Ilya Soifer @.>; Assign @.> Subject: Re: [broadinstitute/gatk] Prevent users enabling annotations with mismatching data type (flow etc) (Issue #8788)

Assigned #8788https://github.com/broadinstitute/gatk/issues/8788 to @ilyasoiferhttps://github.com/ilyasoifer.

— Reply to this email directly, view it on GitHubhttps://github.com/broadinstitute/gatk/issues/8788#event-12581899218, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AGDPRCP66IKPOBF2GPENP6LY63HAPAVCNFSM6AAAAABGTGMBPWVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJSGU4DCOBZHEZDCOA. You are receiving this because you were assigned.Message ID: @.***>


CONFIDENTIALITY NOTICE: This message (including any attachments) should be presumed to contain confidential, proprietary, privileged and/or private information. Information contained in this message is intended only for the recipient(s) named above. Any disclosure, reproduction, distribution or other use of this message or any attachments by any unauthorized individual or entity is strictly prohibited. If you have received this message in error, please notify the sender immediately, and delete the message and any attachments. Learn more about Ultima Genomics’ Privacy Policyhttps://www.ultimagenomics.com/privacy-policy.

ilyasoifer avatar Apr 24 '24 04:04 ilyasoifer

@gokalpcelik, @jamesemery - I submitted a PR (#8810) that should be fixing this issue. Could you please take a look? Thanks!

ilyasoifer avatar May 10 '24 09:05 ilyasoifer