Ublastx_stageone
Can you use the 16S copy numbers in meta_data_online.txt to transform ARG-like abundances from another database?
I created a similar pipeline to get the abundance of other AMR genes, using some of your group's previous ublast and diamond blastx methodology, and obtained absolute counts. Since the sequencing dataset is the same, I thought I might be able to normalize the AMR abundance data by the 16S copy numbers identified by ARGS_OAPv2. So I essentially divided the absolute abundance matrix by the 16S copy numbers per sample to obtain a normalized count matrix. I thought this would work well. However, the counts from my custom database, while otherwise very similar to the ARGS_OAPv2 results, are two to three orders of magnitude off. Please help. Thanks!
For example, I used the meta_data_online.txt results, and for one sample they gave a 16S abundance of 289.7046089 copies from 13014586 reads.
I ran my custom database based on BacMet and for mdtB I got 2420 gene hits for this sample (combined from read 1 and 2).
BAC0646|mdtB|tr|D0ZND9|D0ZND9_SALT1 | BAC0646 | Antimicrobial Efflux | 2420
I divided 2420 by 289.70 to get 8.35 of this gene per 16S copy.
The ARGS_OAP data for mdtB shows an abundance of 0.082505 per 16S copy, which translates to an absolute abundance (0.082505 * 289.70) of 23.90 copies of mdtB for this particular sample. How can this be possible?
Am I reading the ARG_OAP output files incorrectly?
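For reference, the arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative and the numbers are the ones quoted above. Note that it divides a raw hit count by the 16S value, which is non-integer and so is presumably already a normalized copy estimate rather than a raw read count; the two may not be in the same units.

```python
# Values quoted in the post above (variable names are illustrative).
copies_16s = 289.7046089   # 16S copies reported in meta_data_online.txt
mdtb_hits = 2420           # raw mdtB hits from the custom BacMet search
oap_per_16s = 0.082505     # mdtB abundance per 16S copy reported by ARGs-OAP

# Raw hits divided by 16S copies (the calculation described in the post).
custom_per_16s = mdtb_hits / copies_16s

# ARGs-OAP value converted back to an absolute number of copies.
oap_copies = oap_per_16s * copies_16s

# How far apart the two per-16S values are.
fold_difference = custom_per_16s / oap_per_16s

print(f"custom hits per 16S copy: {custom_per_16s:.4f}")
print(f"ARGs-OAP copies of mdtB:  {oap_copies:.3f}")
print(f"fold difference:          {fold_difference:.1f}")
```

The fold difference comes out at roughly two orders of magnitude, which matches the gap described in the question.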
It all comes down to the cutoff you use to define a hit. Unless you run systematic simulations, you cannot know the best parameters. Secondly, the results output by ARGs OAP depend on default parameters that we chose from our simulation and that are over-optimistic. They are not necessarily the best for your samples. Regards,
Xiao-Tao Jiang, Ph.D. Postdoc Research Fellow Microbiome Research Centre St George and Sutherland Clinical School UNSW Sydney
Level 2, Clinical Sciences Building (WR Pitney) Short Street, St George Hospital KOGARAH NSW 2217 T: +61 402 943 681 Email: [email protected]/[email protected] MRC web: https://microbiome.org.au/
Do you mean my e-value? I used 1e-5 in diamond when I ran the blastx, while using the BacMET database as a reference.
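An e-value is often only one part of a hit definition; percent identity and alignment length are commonly filtered as well. The sketch below shows how such a combined cutoff could be applied to DIAMOND's default tabular output (the standard 12-column BLAST tabular layout). The function name and the threshold values are placeholders for illustration, not confirmed ARGs-OAP defaults, and the two example rows are made up.

```python
import csv
import io

# Column order of the default DIAMOND/BLAST tabular output (outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(handle, max_evalue, min_identity, min_aln_len):
    """Yield hits that pass the e-value, identity, and alignment-length cutoffs."""
    for row in csv.reader(handle, delimiter="\t"):
        hit = dict(zip(COLUMNS, row))
        if (float(hit["evalue"]) <= max_evalue
                and float(hit["pident"]) >= min_identity
                and int(hit["length"]) >= min_aln_len):
            yield hit

# Two made-up hits: one strong, one weak.
example = io.StringIO(
    "read1\tgeneA\t95.0\t48\t2\t0\t1\t144\t10\t57\t1e-20\t90.1\n"
    "read2\tgeneA\t45.0\t30\t16\t0\t1\t90\t10\t39\t1e-6\t40.2\n"
)

# E-value cutoff only (placeholder value).
loose = list(filter_hits(example, max_evalue=1e-5, min_identity=0.0, min_aln_len=0))

# Stricter combined cutoff (placeholder values).
example.seek(0)
strict = list(filter_hits(example, max_evalue=1e-7, min_identity=80.0, min_aln_len=25))

print(len(loose), len(strict))
```

Counting hits under both settings for the same sample gives a quick sense of how much of a discrepancy the hit definition alone could explain.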
I'm not understanding why the discrepancies are so large if my cutoff is the same as the ARGs OAP pipeline. Is there a way to extract the unnormalized data from ARGS OAP or clarify your equation?
My read length is 150 nt.
Length of mdtB is 3123 nt.
Average length of 16S rRNA is 1432 nt.
N ARG-like sequence is the number of the ARG-like sequence annotated as one specific ARG reference sequence (assuming this is around 3000 since the pipeline uses CARD and RESFINDER???).
n is the number of the mapped ARG reference sequence belonging to the ARG type or subtype. I am not sure what this is from ARGS OAP.
N16S sequence is the number of the 16S sequence identified from the metagenomic data...assuming this is the # of 16S reads (I have 289.7046089 copies from 13014586 reads).
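The symbols listed above appear to fit together as follows. This is a reconstruction from those symbol definitions, with subscript names chosen here for readability, and should be checked against the journal article:

```latex
\[
\text{Abundance}
  = \sum_{i=1}^{n}
    \frac{N_{\text{ARG-like sequence},\,i} \times L_{\text{reads}} \,/\, L_{\text{ARG reference sequence},\,i}}
         {N_{\text{16S sequence}} \times L_{\text{reads}} \,/\, L_{\text{16S sequence}}}
\]
```

Here $L_{\text{reads}}$ is the read length (150 nt), $L_{\text{ARG reference sequence},\,i}$ is the length of the $i$-th mapped reference (3123 nt for mdtB), and $L_{\text{16S sequence}}$ is the average 16S length (1432 nt). Both numerator and denominator are read counts scaled by length, so a raw hit count divided by a 16S copy number is not the same quantity.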
Please help. Thanks!
Hi, it seems that the difference comes from the number of reads that can be assigned to the genes you are interested in within the BacMET database. Has anybody else tested the code you developed? If you see this discrepancy, you need to check your code.
Another thing I need to mention: are the genes you are using the same as those in the SARG database? If the number of genes associated with the type you are interested in differs between the SARG database and BacMET, how can they be comparable?
Hi again. I understand that directly comparing the databases would be difficult. Still as I indicated in my last email...I'm having trouble understanding how ARGs_OAP is making the 16S normalized ARG abundance calculations. I have tried several times to use the formula in the journal article to back track and obtain the ARG abundance listed in my tables to no avail. Could you please provide an example of using the formula. I sent some files to you yesterday. Thanks!
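As a worked example of the kind requested here, the sketch below applies length normalization to the mdtB numbers quoted earlier in the thread. It rests on two assumptions that are not confirmed anywhere in this thread: that the 289.7046089 value from meta_data_online.txt is already a length-normalized 16S copy number (so it is used directly as the denominator), and that mdtB maps to a single reference of 3123 nt. The function names are hypothetical.

```python
def arg_copies(hits, read_len, ref_len):
    """Length-normalized copy estimate for one reference: hits * read_len / ref_len."""
    return hits * read_len / ref_len

def abundance_per_16s(hits_and_ref_lens, read_len, copies_16s):
    """Sum normalized copies over all mapped references, then divide by 16S copies.

    Assumes copies_16s is already length-normalized (an assumption, see above).
    """
    total = sum(arg_copies(h, read_len, l) for h, l in hits_and_ref_lens)
    return total / copies_16s

read_len = 150             # nt, from the thread
mdtb_ref_len = 3123        # nt, from the thread
mdtb_hits = 2420           # raw hits from the custom BacMet search
copies_16s = 289.7046089   # from meta_data_online.txt

mdtb_copies = arg_copies(mdtb_hits, read_len, mdtb_ref_len)
mdtb_per_16s = abundance_per_16s([(mdtb_hits, mdtb_ref_len)], read_len, copies_16s)

print(f"mdtB copies (length-normalized): {mdtb_copies:.2f}")
print(f"mdtB per 16S copy:               {mdtb_per_16s:.4f}")
```

Under these assumptions the length correction alone brings the custom value from about 8.35 down to about 0.40 per 16S copy. That is still not the 0.082505 reported by ARGs-OAP, so differences in hit cutoffs and reference databases would have to account for the rest.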
Which parameters do you not understand? Regards,
Hey, I did not receive your files. Can you send them to me again? I will take a look when I have time. Sorry for the late response; I was up against several deadlines. Regards,
@biofuture
I just sent you the files again to both your email addresses.
@slvrshot I had the same problem. Have you solved it?