drawProteins

How to draw protein features not supported by any of the draw_* functions?

Open • janstrauss1 opened this issue 4 years ago • 1 comment

Hi @brennanpincardiff,

Very useful package!

I have a similar issue to the one described at https://github.com/brennanpincardiff/drawProteins/issues/13#issuecomment-444105210: I'm trying to find the best way to plot feature types that are not currently supported by any of the draw_* functions.

I'm trying to draw schematics for multiple proteins and I'm currently looking for the best way to draw coiled coil domains (prot_data$type == "COILED") and compositional bias regions (prot_data$type == "COMPBIAS").

My prot_data frame looks as follows:

> my.prot_data
       type   description begin  end length accession    entryName taxid order
1     CHAIN PF3D7_0530300     1 1446   1445    C0H4G8 C0H4G8_PLAF7 36329     1
2  TRANSMEM       Helical    20   39     19    C0H4G8 C0H4G8_PLAF7 36329     1
3  TRANSMEM       Helical    91  115     24    C0H4G8 C0H4G8_PLAF7 36329     1
4  TRANSMEM       Helical  1422 1441     19    C0H4G8 C0H4G8_PLAF7 36329     1
5    REGION    Disordered   568  599     31    C0H4G8 C0H4G8_PLAF7 36329     1
6    REGION    Disordered   611  648     37    C0H4G8 C0H4G8_PLAF7 36329     1
7    COILED          NONE   328  348     20    C0H4G8 C0H4G8_PLAF7 36329     1
8  TRANSMEM       Helical   779  805     26    C0H4G8 C0H4G8_PLAF7 36329     1
9  TRANSMEM       Helical   857  880     23    C0H4G8 C0H4G8_PLAF7 36329     1
10 TRANSMEM       Helical   886  906     20    C0H4G8 C0H4G8_PLAF7 36329     1
11 TRANSMEM       Helical  1252 1272     20    C0H4G8 C0H4G8_PLAF7 36329     1
12 TRANSMEM       Helical  1292 1314     22    C0H4G8 C0H4G8_PLAF7 36329     1
13 TRANSMEM       Helical  1326 1343     17    C0H4G8 C0H4G8_PLAF7 36329     1
14 TRANSMEM       Helical  1363 1381     18    C0H4G8 C0H4G8_PLAF7 36329     1
15 TRANSMEM       Helical  1393 1416     23    C0H4G8 C0H4G8_PLAF7 36329     1
16    CHAIN PF3D7_0415800     1  875    874    Q8I1S9 Q8I1S9_PLAF7 36329     2
17   REGION    Disordered   560  611     51    Q8I1S9 Q8I1S9_PLAF7 36329     2
18 COMPBIAS         Polar   560  599     39    Q8I1S9 Q8I1S9_PLAF7 36329     2
19   DOMAIN     RING-type    79  117     38    Q8I1S9 Q8I1S9_PLAF7 36329     2
20    CHAIN PF3D7_0508900     1 3134   3133    Q8I414 Q8I414_PLAF7 36329     3
21   COILED          NONE  3073 3093     20    Q8I414 Q8I414_PLAF7 36329     3
22 COMPBIAS         Polar   728  745     17    Q8I414 Q8I414_PLAF7 36329     3
23 COMPBIAS Polyampholyte   746  794     48    Q8I414 Q8I414_PLAF7 36329     3
24 COMPBIAS Polyampholyte   931  954     23    Q8I414 Q8I414_PLAF7 36329     3
25 COMPBIAS Polyampholyte  1739 1759     20    Q8I414 Q8I414_PLAF7 36329     3
26 COMPBIAS         Polar  1760 1799     39    Q8I414 Q8I414_PLAF7 36329     3
27 COMPBIAS        Acidic  2487 2771    284    Q8I414 Q8I414_PLAF7 36329     3
28   REGION    Disordered   817  844     27    Q8I414 Q8I414_PLAF7 36329     3
29   REGION    Disordered   931  965     34    Q8I414 Q8I414_PLAF7 36329     3
30   REGION    Disordered  1739 1801     62    Q8I414 Q8I414_PLAF7 36329     3
31   REGION    Disordered  2335 2371     36    Q8I414 Q8I414_PLAF7 36329     3
32   REGION    Disordered  2476 2771    295    Q8I414 Q8I414_PLAF7 36329     3
33   COILED          NONE   660  680     20    Q8I414 Q8I414_PLAF7 36329     3
34   COILED          NONE   862  882     20    Q8I414 Q8I414_PLAF7 36329     3
35   COILED          NONE  1520 1540     20    Q8I414 Q8I414_PLAF7 36329     3
36   COILED          NONE  2875 2895     20    Q8I414 Q8I414_PLAF7 36329     3
37   REGION    Disordered   714  797     83    Q8I414 Q8I414_PLAF7 36329     3
38    CHAIN PF3D7_1229300     1  990    989    Q8I5C6 Q8I5C6_PLAF7 36329     4
39   REGION    Disordered    83  106     23    Q8I5C6 Q8I5C6_PLAF7 36329     4
40   REGION    Disordered   333  355     22    Q8I5C6 Q8I5C6_PLAF7 36329     4
41   REGION    Disordered   429  453     24    Q8I5C6 Q8I5C6_PLAF7 36329     4
42   REGION    Disordered   751  771     20    Q8I5C6 Q8I5C6_PLAF7 36329     4
43 COMPBIAS Polyampholyte    38   58     20    Q8I5C6 Q8I5C6_PLAF7 36329     4
44 COMPBIAS Polyampholyte    86  105     19    Q8I5C6 Q8I5C6_PLAF7 36329     4
45   REGION    Disordered    38   71     33    Q8I5C6 Q8I5C6_PLAF7 36329     4
46    CHAIN PF3D7_0822900     1 1176   1175    Q8IB63 Q8IB63_PLAF7 36329     5
47 COMPBIAS        Acidic   266  372    106    Q8IB63 Q8IB63_PLAF7 36329     5
48 COMPBIAS         Polar   373  417     44    Q8IB63 Q8IB63_PLAF7 36329     5
49   REGION    Disordered   976  995     19    Q8IB63 Q8IB63_PLAF7 36329     5
50   REGION    Disordered  1010 1032     22    Q8IB63 Q8IB63_PLAF7 36329     5
51   COILED          NONE     7   30     23    Q8IB63 Q8IB63_PLAF7 36329     5
52 COMPBIAS         Basic    55   69     14    Q8IB63 Q8IB63_PLAF7 36329     5
53 COMPBIAS Polyampholyte    70   91     21    Q8IB63 Q8IB63_PLAF7 36329     5
54 COMPBIAS         Polar    92  173     81    Q8IB63 Q8IB63_PLAF7 36329     5
55 COMPBIAS Polyampholyte   175  196     21    Q8IB63 Q8IB63_PLAF7 36329     5
56 COMPBIAS         Basic   197  214     17    Q8IB63 Q8IB63_PLAF7 36329     5
57 COMPBIAS Polyampholyte   235  257     22    Q8IB63 Q8IB63_PLAF7 36329     5
58   REGION    Disordered    53  425    372    Q8IB63 Q8IB63_PLAF7 36329     5
59    CHAIN PF3D7_1318700     1  749    748    Q8IEC9 Q8IEC9_PLAF7 36329     6
60   REGION    Disordered   705  749     44    Q8IEC9 Q8IEC9_PLAF7 36329     6
61   COILED          NONE   232  259     27    Q8IEC9 Q8IEC9_PLAF7 36329     6
62   COILED          NONE   274  332     58    Q8IEC9 Q8IEC9_PLAF7 36329     6
63   COILED          NONE   432  466     34    Q8IEC9 Q8IEC9_PLAF7 36329     6
64   COILED          NONE   495  515     20    Q8IEC9 Q8IEC9_PLAF7 36329     6
65   COILED          NONE   562  600     38    Q8IEC9 Q8IEC9_PLAF7 36329     6
66 COMPBIAS         Polar   385  412     27    Q8IEC9 Q8IEC9_PLAF7 36329     6
67   REGION    Disordered   385  415     30    Q8IEC9 Q8IEC9_PLAF7 36329     6
68    CHAIN PF3D7_1312800     1 2361   2360    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
69   COILED          NONE  1001 1028     27    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
70   REGION    Disordered   148  195     47    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
71 COMPBIAS Polyampholyte    61   87     26    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
72 COMPBIAS Polyampholyte   148  185     37    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
73 COMPBIAS Polyampholyte  1242 1315     73    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
74 COMPBIAS Polyampholyte  1646 1685     39    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
75 COMPBIAS         Polar  1686 1718     32    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
76 COMPBIAS Polyampholyte  1719 1736     17    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
77 COMPBIAS Polyampholyte  1935 1969     34    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
78 COMPBIAS        Acidic  1970 2017     47    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
79 COMPBIAS Polyampholyte  2046 2064     18    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
80 COMPBIAS         Polar  2065 2109     44    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
81 COMPBIAS Polyampholyte  2110 2177     67    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
82 COMPBIAS         Polar  2178 2194     16    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
83 COMPBIAS Polyampholyte  2195 2245     50    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
84   REGION    Disordered  1229 1315     86    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
85   REGION    Disordered  1404 1436     32    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
86   REGION    Disordered  1638 1753    115    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
87   REGION    Disordered  1786 1813     27    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
88   REGION    Disordered  1935 2252    317    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
89   REGION    Disordered  2341 2361     20    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
90   COILED          NONE   282  302     20    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
91   COILED          NONE   433  453     20    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
92   REGION    Disordered    61   92     31    Q8IEJ4 Q8IEJ4_PLAF7 36329     7
93    CHAIN PF3D7_0308300     1  337    336    O77324 O77324_PLAF7 36329     8

I tried the following to draw coiled-coil domains, and it works:

```r
### add COILED block in blue
p <- p + ggplot2::geom_rect(data = my.prot_data[my.prot_data$type == "COILED", ],
                            mapping = ggplot2::aes(xmin = begin,
                                                   xmax = end,
                                                   ymin = order - 0.2,
                                                   ymax = order + 0.2),
                            fill = "blue")
p
```

However, I'm not sure what the best way is to add coiled coils to the legend.

Alternatively, I could (manually) redefine coiled coils as the DOMAIN type, and perhaps compositional bias regions as the REGION type?
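One way to get coiled coils (and compositional bias) into the legend is to map `fill` to a column inside `aes()` instead of setting `fill = "blue"` outside it: ggplot2 only builds legend entries for mapped aesthetics. Below is a minimal standalone sketch using four rows from the table above; the object names (`features`, `chains`, `extra`), colours and labels are illustrative choices, not part of drawProteins. If `p` already uses a fill scale (for example from `draw_domains()`), adding `scale_fill_manual()` replaces that scale, so its `values` would need to cover every fill level in the combined plot.

```r
library(ggplot2)

# Four rows taken from the prot_data table above (proteins 1 and 2 only)
features <- data.frame(
  type  = c("CHAIN", "COILED", "CHAIN", "COMPBIAS"),
  begin = c(1, 328, 1, 560),
  end   = c(1446, 348, 875, 599),
  order = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

chains <- features[features$type == "CHAIN", ]
extra  <- features[features$type %in% c("COILED", "COMPBIAS"), ]

p <- ggplot() +
  # Full-length chains: fill is a constant outside aes(), so no legend entry
  geom_rect(data = chains,
            mapping = aes(xmin = begin, xmax = end,
                          ymin = order - 0.1, ymax = order + 0.1),
            fill = "grey80") +
  # Mapping fill to `type` inside aes() is what creates the legend entries
  geom_rect(data = extra,
            mapping = aes(xmin = begin, xmax = end,
                          ymin = order - 0.2, ymax = order + 0.2,
                          fill = type)) +
  # Fixed colours and human-readable legend labels for the two feature types
  scale_fill_manual(name   = "Feature type",
                    values = c(COILED = "blue", COMPBIAS = "orange"),
                    breaks = c("COILED", "COMPBIAS"),
                    labels = c("Coiled coil", "Compositional bias"))
p
```

The same `aes(fill = type)` pattern can be applied to the `geom_rect()` call in the snippet above, using the full `my.prot_data` subset.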

I would be very grateful for any feedback and suggestions.

Many thanks in advance!

janstrauss1, Sep 15 '20 17:09

Hi Jan,

Thanks for your issue. I've spent some time playing with your data and have finally merged two pull requests supplied by @daniel-wells (hat tip and thanks to Daniel). One of them gives chains even when no chain is annotated in UniProt; the other makes it easier to add custom domains. I've had a play and written some code, which is below. It's not perfect, but I think it is going in the direction you want. Please have a look and see what you think. I'm happy to continue the discussion here.

Best wishes,
Paul

```r
library(devtools)
install_github("brennanpincardiff/drawProteins")
library(drawProteins)  # needed for the unqualified draw_* calls below

prot_1_json <- drawProteins::get_features("C0H4G8")
prot_1_data <- drawProteins::feature_to_dataframe(prot_1_json)

# make a protein schematic for a single protein...
p <- draw_canvas(prot_1_data)
p <- draw_chains(p, prot_1_data)
p <- draw_domains(p, prot_1_data, type = "TRANSMEM")
# add "COILED"
p <- draw_domains(p, prot_1_data, type = "COILED")
p
# mmm, no description, but there is a legend...


# try a protein schematic for multiple proteins and see what happens...
prot <- "C0H4G8 Q8I1S9 Q8I414 Q8I5C6 Q8IB63 Q8IEC9 Q8IEJ4"
prot_json <- drawProteins::get_features(prot)
prot_data <- drawProteins::feature_to_dataframe(prot_json)

p <- draw_canvas(prot_data)
p <- draw_chains(p, prot_data)
p <- draw_regions(p, prot_data)
# add "COILED"
p <- draw_domains(p, prot_data, label_domains = FALSE, type = "COILED")
# add "COMPBIAS"
p <- draw_domains(p, prot_data, label_domains = FALSE, type = "COMPBIAS")
p
```

This is the output of the second, multiple-protein example:

[Screenshot 2020-09-21 at 22:49:45: multiple-protein schematic with COILED and COMPBIAS features]
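Because COILED rows carry the description "NONE", their legend entry is not very informative. A minimal sketch of one workaround, assuming that `draw_domains()` takes its legend labels from the `description` column: rewrite the descriptions before drawing. The toy data frame reuses four Q8I414 rows from the table in the question (only some columns), and the "Coiled coil" and "Bias:" labels are arbitrary choices.

```r
# Four rows for Q8I414 taken from the table in the question (subset of columns)
prot_data <- data.frame(
  type        = c("CHAIN", "COILED", "COMPBIAS", "COMPBIAS"),
  description = c("PF3D7_0508900", "NONE", "Polar", "Acidic"),
  begin       = c(1, 3073, 728, 2487),
  end         = c(3134, 3093, 745, 2771),
  order       = c(3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Work on character data in case the column arrived as a factor
prot_data$description <- as.character(prot_data$description)

# Replace the placeholder "NONE" on coiled coils with a readable label
is_coiled <- prot_data$type == "COILED" & prot_data$description == "NONE"
prot_data$description[is_coiled] <- "Coiled coil"

# Prefix compositional-bias descriptions so they are recognisable in the legend
is_bias <- prot_data$type == "COMPBIAS"
prot_data$description[is_bias] <- paste("Bias:", prot_data$description[is_bias])

prot_data$description
# With the real data, the relabelled frame would then be passed on as before, e.g.
# p <- draw_domains(p, prot_data, label_domains = FALSE, type = "COILED")
```

The `as.character()` step matters on R versions before 4.0, where `data.frame()` defaulted to factors and assigning a new label to a factor column would produce `NA`.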

brennanpincardiff, Sep 21 '20 21:09