Foreground vs background in BUSTED

Open katherine-li07 opened this issue 2 years ago • 6 comments

Hello!

I have a question about BUSTED that I am hoping you can help with.

My dataset is a population of SARS-CoV-2 viruses that has been aligned to the original Wuhan reference sequence and trimmed to the individual genes.

When running BUSTED, would it be correct to set the Wuhan reference sequence as background while leaving the rest of the dataset as foreground branches for testing? I am unsure whether setting the reference sequence as background means it is completely ignored (and testing only involves the foreground), or whether it means that selection is determined for the foreground branches relative to the background.

Alternatively, would it be more appropriate to select the entire dataset as foreground and not specify any background branches if my potential background (the Wuhan reference) is only a single sequence?

Thank you!

katherine-li07 avatar Sep 14 '22 19:09 katherine-li07

Dear @katherine-li07,

My standard recommendation for this kind of data (viral isolates, where one sequence = one patient) is to run BUSTED on internal branches only (see https://pubmed.ncbi.nlm.nih.gov/26814962/).

Also, a single sequence should not affect such an analysis in most cases, and in any case, excluding all terminal branches also excludes the Wu-1 reference.
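For anyone running this locally rather than on Datamonkey, here is a minimal sketch of what an internal-branches-only call might look like when driven from Python. It assumes a HyPhy 2.5-style command line where `busted` accepts `--alignment`, `--tree`, and `--branches`, and that `Internal` is an accepted value for `--branches`; please check `hyphy busted --help` for your version. The helper names and the file names `spike.fasta` and `spike.nwk` are hypothetical.

```python
import subprocess
from typing import List


def busted_internal_command(alignment: str, tree: str, hyphy: str = "hyphy") -> List[str]:
    """Build an argument list that asks BUSTED to test internal branches only.

    The flag names and the "Internal" value are assumptions about the
    installed HyPhy version; verify them with `hyphy busted --help`.
    """
    return [
        hyphy, "busted",
        "--alignment", alignment,
        "--tree", tree,
        "--branches", "Internal",  # assumed keyword for "internal branches only"
    ]


def run_busted_internal(alignment: str, tree: str) -> int:
    """Run the command and return its exit code (requires HyPhy on PATH)."""
    return subprocess.run(busted_internal_command(alignment, tree)).returncode


# Hypothetical file names, for illustration only.
command = busted_internal_command("spike.fasta", "spike.nwk")
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to inspect or log the exact call before submitting a long job.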

Hope this helps, Sergei

spond avatar Sep 14 '22 19:09 spond

Great, thank you for your help!

katherine-li07 avatar Sep 15 '22 16:09 katherine-li07

Dear @katherine-li07,

Happy to help. Let me know how it goes. For SARS-CoV-2 data, you should also remove all identical sequences (HyPhy will warn you if you have any), because they make the analysis run slower and do not contribute any signal to the test.

See https://github.com/veg/hyphy-analyses/tree/master/remove-duplicates
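To illustrate the idea only (this is not the linked remove-duplicates script, which should be preferred for real analyses), here is a minimal sketch that keeps the first copy of each distinct sequence in a FASTA alignment. It assumes sequences are compared case-insensitively as exact strings, and it does not touch the tree, so any tree would still need to be pruned to match. The function names and the toy records are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA-formatted lines into (name, sequence) pairs."""
    records: List[Record] = []
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_identical(records: Iterable[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence, preserving order."""
    seen = set()
    kept: List[Record] = []
    for name, seq in records:
        key = seq.upper()  # assumption: case differences are not meaningful
        if key in seen:
            continue
        seen.add(key)
        kept.append((name, seq))
    return kept


# Toy alignment: iso_A is identical to Wu-1, iso_B differs at one codon.
example = """>Wu-1
ATGGCTAAA
>iso_A
ATGGCTAAA
>iso_B
ATGGCTAAG
"""
unique = drop_identical(read_fasta(example.splitlines()))
for name, seq in unique:
    print(f">{name}\n{seq}")
```

Putting the reference first in the file means it is the copy that survives when an isolate is identical to it.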

Best, Sergei

spond avatar Sep 15 '22 16:09 spond

Hi Sergei,

I am running on the Datamonkey web server, which I believe already removes duplicate sequences automatically.

I have been able to run all of my sequences without any trouble except for one, which seems to be stuck somewhere. It says "Could not contact the server for job status updates" and has disappeared from the queue, but still appears to be running. Ticket number 2332809.new-silverback. Do you know how I can resolve this?

Thanks! Katherine

katherine-li07 avatar Sep 21 '22 15:09 katherine-li07

Dear @katherine-li07,

The job completed successfully, but there must have been a communication error within our site. I can either email the results to you or post them here, whichever you prefer.

If you would like the results emailed, please contact me at [email protected].

Best, Steven

stevenweaver avatar Sep 21 '22 16:09 stevenweaver

Thank you Steven and Sergei for your help.

I am also wondering how BUSTED results relate to FUBAR results. I understand that FUBAR reports positive and negative selection at individual sites, whereas BUSTED reports positive selection across an entire gene, but should these results generally mimic each other for where selection is reported?

For example, I have run the same files through both BUSTED and FUBAR, and in some cases FUBAR reported multiple sites under positive selection, but BUSTED did not detect selection for the gene.

In this case, I am wondering if it is appropriate to compare the results of these two tools, or if I should be sticking to one over the other.

Thank you! Katherine

katherine-li07 avatar Sep 26 '22 15:09 katherine-li07

Stale issue message

github-actions[bot] avatar Nov 26 '22 00:11 github-actions[bot]