TOBIAS
BINDetect does not give an error when the motif file is "deformed"
Might be a continuation of issue #78. When I tried to run BINDetect using a "pfm" motif file created by gimmemotifs, I ran into a problem with how the motifs are read.
The pfm file looks something like:
>GM.5.0.Sox.0001
0.7213 0.0793 0.1103 0.0891
0.9259 0.0072 0.0062 0.0607
0.0048 0.9203 0.0077 0.0672
0.9859 0.0030 0.0030 0.0081
0.9778 0.0043 0.0128 0.0051
0.1484 0.0050 0.0168 0.8299
>GM.5.0.Homeodomain.0001
0.8870 0.0000 0.0178 0.0951
0.1156 0.2033 0.6629 0.0181
0.0017 0.7452 0.0809 0.1722
0.0011 0.0003 0.0003 0.9983
0.0026 0.0141 0.9721 0.0111
0.0000 0.0189 0.0054 0.9758
0.0006 0.9983 0.0006 0.0006
0.9170 0.0140 0.0046 0.0644
0.2228 0.2421 0.3300 0.2051
0.3621 0.1054 0.2208 0.3116
0.5727 0.0104 0.1741 0.2428
For example, I have 1796 motifs in the pfm file, but BINDetect reports reading 5531 motifs and gives the following warnings:
2023-12-16 10:23:46 (1569572) [INFO] Reading motifs from file
2023-12-16 10:23:47 (1569572) [INFO] - Read 5531 motifs
2023-12-16 10:23:47 (1569572) [WARNING] The motif output names (as given by --naming) are not unique.
2023-12-16 10:23:47 (1569572) [WARNING] The following names occur more than once: ['_']
2023-12-16 10:23:47 (1569572) [WARNING] These motifs will be renamed with '_1', '_2' etc. To prevent this renaming, please make the names of the input --motifs unique
The results directories then end up with names like these:
__1 __1413 __1829 __2243 __2659 __3073 __3489 __541 __957
or
GM.5.0.Sox.0001_GM.5.0.Sox.0001
GM.5.0.Sox.0002_GM.5.0.Sox.0002
GM.5.0.Sox.0003_GM.5.0.Sox.0003
GM.5.0.Sox.0004_GM.5.0.Sox.0004
GM.5.0.Sox.0005_GM.5.0.Sox.0005
GM.5.0.Sox.0006_GM.5.0.Sox.0006
GM.5.0.Sox.0007_GM.5.0.Sox.0007
GM.5.0.Sox.0008_GM.5.0.Sox.0008
GM.5.0.Sox.0009_GM.5.0.Sox.0009
This pfm file may not be a standard pfm file, but it would be nice if BINDetect raised an error when the motif file is not in a format it can parse, instead of silently reading the wrong number of motifs.
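To illustrate the kind of check I mean, here is a minimal sketch in Python. It is not TOBIAS code and does not reflect how BINDetect parses motifs; the function name `parse_row_pfm` is made up, and it assumes the layout shown above (a `>name` header followed by one row of four numbers per motif position). It fails loudly on anything else, and the number of motifs it returns can be compared with the "Read N motifs" line in the log.

```python
from typing import Dict, List


def parse_row_pfm(text: str) -> Dict[str, List[List[float]]]:
    """Parse '>name' headers followed by rows of four numbers.

    Hypothetical helper: assumes one row per motif position, as in the
    gimmemotifs-style file shown in this issue. Raises ValueError on
    anything that does not fit that layout.
    """
    motifs: Dict[str, List[List[float]]] = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            if not current:
                raise ValueError(f"line {lineno}: empty motif name")
            if current in motifs:
                raise ValueError(f"line {lineno}: duplicate motif name {current!r}")
            motifs[current] = []
            continue
        if current is None:
            raise ValueError(f"line {lineno}: matrix row before any '>' header")
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
        try:
            row = [float(x) for x in fields]
        except ValueError:
            raise ValueError(f"line {lineno}: non-numeric value in {line!r}") from None
        motifs[current].append(row)
    empty = [name for name, rows in motifs.items() if not rows]
    if empty:
        raise ValueError(f"motifs without any matrix rows: {empty}")
    return motifs
```

With a check like this, a file with 1796 headers would either give exactly 1796 motifs or stop with a message pointing at the offending line.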
My current workaround is to run chen2meme, because the file seems to be in the chen motif format. After conversion, BINDetect seems to work fine.
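For anyone without chen2meme at hand, the conversion can be roughly sketched in Python as below. This is not what chen2meme does internally, only an approximation under stated assumptions: the function name `row_pfm_to_meme` is hypothetical, the four columns are assumed to be in A C G T order, and each row is rescaled to sum to 1 before being written as a minimal MEME motif.

```python
from typing import List, Tuple


def row_pfm_to_meme(text: str) -> str:
    """Convert '>name' + rows-of-four-numbers text to minimal MEME format.

    Hypothetical helper. Assumes columns are A, C, G, T (an assumption,
    not confirmed by the file itself) and one row per motif position.
    """
    motifs: List[Tuple[str, List[List[float]]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            motifs.append((line[1:].strip(), []))
            continue
        if not motifs:
            raise ValueError("matrix row found before any '>' header")
        row = [float(x) for x in line.split()]
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {line!r}")
        if sum(row) <= 0:
            raise ValueError(f"row does not have a positive sum: {line!r}")
        motifs[-1][1].append(row)

    out = ["MEME version 4", "", "ALPHABET= ACGT", "", "strands: + -", ""]
    for name, rows in motifs:
        if not rows:
            raise ValueError(f"motif {name!r} has no matrix rows")
        out.append(f"MOTIF {name}")
        out.append(f"letter-probability matrix: alength= 4 w= {len(rows)}")
        for row in rows:
            total = sum(row)
            # Rescale so each position is a probability distribution.
            out.append(" ".join(f"{v / total:.6f}" for v in row))
        out.append("")
    return "\n".join(out)
```

Writing the returned string to a file should give something that can be passed to `--motifs`, but I have only verified the chen2meme route myself.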