methylpy

add-methylation level not working

Open knoedlerj opened this issue 4 years ago • 12 comments

When I run add-methylation-level with an input tsv (genes that are differentially expressed, for which I'm interested in getting total methylation level) methylpy generates a blank output file.

knoedlerj avatar Jan 14 '20 15:01 knoedlerj

Would you mind sharing the top few lines of your input file?

yupenghe avatar Jan 15 '20 08:01 yupenghe

Sure thing! The input tsv file starts with:

chr1 59764278 59878081 NM_007561 0 +
chr1 193301993 193343878 NM_008484 0 +
chr1 42695767 42700209 NM_008900 0 +
chr1 95587681 95667594 NM_009183 0 -
chr1 84036292 84284645 NM_001003948 0 -
chr1 93478992 93509732 NM_010891 0 +

knoedlerj avatar Jan 15 '20 15:01 knoedlerj

The format looks fine. There are a few possible causes.

  • I notice that you are using "chr1". Please make sure the chromosomes in the allc file are named the same way.
  • Please double check that the input file is tab-separated.
  • You will need to add a header to the file; otherwise the first line of the file will be treated as the header.
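
The three checks above can be sketched as a small shell function. This is only a rough sketch, not part of methylpy: `check_regions` is a hypothetical helper name, and it assumes the allc file is uncompressed and that the chromosome name is the first tab-separated column in both files.

```shell
# Hypothetical helper (not part of methylpy): sanity-check a region file
# against an uncompressed allc file. Decompress a gzipped allc file first.
check_regions() {
    regions=$1
    allc=$2

    # 1. Every line should have at least three tab-separated fields.
    bad=$(awk -F '\t' 'NF < 3 { n++ } END { print n + 0 }' "$regions")
    [ "$bad" -eq 0 ] || echo "not tab-separated: $bad line(s) with fewer than 3 fields"

    # 2. The first line should be a header, so its second field should not be a number.
    if head -n 1 "$regions" | awk -F '\t' '$2 ~ /^[0-9]+$/ { found = 1 } END { exit !found }'; then
        echo "no header: first line looks like a data row"
    fi

    # 3. Every chromosome name in the region file (header skipped) should occur in the allc file.
    awk -F '\t' '
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        FNR > 1 && !($1 in seen) && !reported[$1]++ { print "chromosome not in allc: " $1 }
    ' "$allc" "$regions"
}
```

Calling `check_regions regions.tsv allc_sample.tsv` (both file names are placeholders) prints one line per problem found and nothing if all three checks pass.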

yupenghe avatar Jan 16 '20 02:01 yupenghe

Thanks, that seems to have worked! However, now only some of the entries actually get their methylation levels calculated - currently trying to figure out why.

knoedlerj avatar Jan 20 '20 22:01 knoedlerj

Update - it's only calculating levels for about 10% of the intervals listed and nothing obvious seems different about those intervals (these samples have about 25x coverage so there should be information on most of them). Has this behavior been reported before?

knoedlerj avatar Jan 23 '20 22:01 knoedlerj

I don't think so. It would be a great help for debugging if you could share a subset of the data that reproduces this issue.

yupenghe avatar Jan 24 '20 02:01 yupenghe

Can do - I can supply a subset of one of the allc files and the tsv. Even the reduced allc is pretty big (414 MB compressed) - how would you like me to send it? Thank you very much for your assistance!

knoedlerj avatar Jan 24 '20 19:01 knoedlerj

If you can set up a link for me to download the data, that will be fine. FTP, Google Drive, etc. will all work for me.

yupenghe avatar Jan 25 '20 01:01 yupenghe

Thanks! Try this: https://drive.google.com/open?id=1bUIvMYR-aopcMwoQEttaYlx8UrRrOZ00

knoedlerj avatar Jan 28 '20 01:01 knoedlerj

Thanks. I think the problem is that the input tsv file is not sorted. You can use the commands below to sort the file.

# Save the header, sort the remaining rows by chromosome, then start, then end, and put the header back on top.
head -n 1 POA_allpairwisegenes.tsv > header
tail -n +2 POA_allpairwisegenes.tsv | sort -k 1,1 -k 2,2g -k 3,3g | cat header - > POA_allpairwisegenes.reformatted.tsv
rm header

The problem should be solved with the sorted file. Please let me know if it works.
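
To confirm whether a region file is already in this order before rerunning, something like the following may help. It is a minimal sketch rather than anything methylpy provides: `check_sorted` is a hypothetical name, and it assumes a tab-separated file with one header line, the chromosome in column 1 and the start position in column 2. It only checks that each chromosome forms one contiguous block and that starts never decrease within a block.

```shell
# Hypothetical helper (not part of methylpy): report the first row that breaks
# the ordering and return a non-zero exit status if one is found.
check_sorted() {
    awk -F '\t' '
        NR == 1 { next }                      # skip the header line
        $1 != chrom {                         # a new chromosome block starts here
            if ($1 in seen) {
                print "line " NR ": chromosome " $1 " is split into several blocks"
                bad = 1; exit
            }
            seen[$1] = 1; chrom = $1; prev = $2 + 0; next
        }
        $2 + 0 < prev {                       # start moved backwards within a block
            print "line " NR ": start " $2 " is smaller than previous start " prev
            bad = 1; exit
        }
        { prev = $2 + 0 }
        END { exit bad }
    ' "$1"
}
```

For example, `check_sorted POA_allpairwisegenes.reformatted.tsv && echo sorted` should print `sorted` for a file produced by the sort command above.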

yupenghe avatar Jan 28 '20 03:01 yupenghe

Looks like it worked, thank you!! Now to figure out what it all means . .

knoedlerj avatar Jan 29 '20 18:01 knoedlerj

I have a similar issue, but my output tsv file contains only the columns from the bed file; the methylation level columns are empty. This is the test bed file:

chromosome start end
2 1 40001
2 40001 80001
2 80001 120001
2 120001 160001

This is the output I get:

chromosome start end methylation_level_ACA methylation_level_ACB methylation_level_pCMT3-RNAiA methylation_level_pCMT3-RNAiB
2 1 40001
2 40001 80001
2 80001 120001
2 120001 160001

I checked that my files use tabs as the delimiter and that the allc files are not empty:

2 1647 - CAT 2 20 1
2 1649 + CGT 10 12 1
2 1650 - CGT 16 21 1
2 1653 + CCT 0 12 0

coralzhang avatar Apr 07 '20 02:04 coralzhang