drawProteins
How to draw protein features not supported by any of the draw_* functions?
Hi @brennanpincardiff,
very useful package!
I have an issue similar to the one described at https://github.com/brennanpincardiff/drawProteins/issues/13#issuecomment-444105210: I'm trying to find the best way to plot feature types that are currently not supported by any of the draw_* functions.
I'm trying to draw schematics for multiple proteins, and I'm looking for the best way to draw coiled coil domains (prot_data$type == "COILED") and compositional bias regions (prot_data$type == "COMPBIAS").
My prot_data frame looks as follows:
> my.prot_data
type description begin end length accession entryName taxid order
1 CHAIN PF3D7_0530300 1 1446 1445 C0H4G8 C0H4G8_PLAF7 36329 1
2 TRANSMEM Helical 20 39 19 C0H4G8 C0H4G8_PLAF7 36329 1
3 TRANSMEM Helical 91 115 24 C0H4G8 C0H4G8_PLAF7 36329 1
4 TRANSMEM Helical 1422 1441 19 C0H4G8 C0H4G8_PLAF7 36329 1
5 REGION Disordered 568 599 31 C0H4G8 C0H4G8_PLAF7 36329 1
6 REGION Disordered 611 648 37 C0H4G8 C0H4G8_PLAF7 36329 1
7 COILED NONE 328 348 20 C0H4G8 C0H4G8_PLAF7 36329 1
8 TRANSMEM Helical 779 805 26 C0H4G8 C0H4G8_PLAF7 36329 1
9 TRANSMEM Helical 857 880 23 C0H4G8 C0H4G8_PLAF7 36329 1
10 TRANSMEM Helical 886 906 20 C0H4G8 C0H4G8_PLAF7 36329 1
11 TRANSMEM Helical 1252 1272 20 C0H4G8 C0H4G8_PLAF7 36329 1
12 TRANSMEM Helical 1292 1314 22 C0H4G8 C0H4G8_PLAF7 36329 1
13 TRANSMEM Helical 1326 1343 17 C0H4G8 C0H4G8_PLAF7 36329 1
14 TRANSMEM Helical 1363 1381 18 C0H4G8 C0H4G8_PLAF7 36329 1
15 TRANSMEM Helical 1393 1416 23 C0H4G8 C0H4G8_PLAF7 36329 1
16 CHAIN PF3D7_0415800 1 875 874 Q8I1S9 Q8I1S9_PLAF7 36329 2
17 REGION Disordered 560 611 51 Q8I1S9 Q8I1S9_PLAF7 36329 2
18 COMPBIAS Polar 560 599 39 Q8I1S9 Q8I1S9_PLAF7 36329 2
19 DOMAIN RING-type 79 117 38 Q8I1S9 Q8I1S9_PLAF7 36329 2
20 CHAIN PF3D7_0508900 1 3134 3133 Q8I414 Q8I414_PLAF7 36329 3
21 COILED NONE 3073 3093 20 Q8I414 Q8I414_PLAF7 36329 3
22 COMPBIAS Polar 728 745 17 Q8I414 Q8I414_PLAF7 36329 3
23 COMPBIAS Polyampholyte 746 794 48 Q8I414 Q8I414_PLAF7 36329 3
24 COMPBIAS Polyampholyte 931 954 23 Q8I414 Q8I414_PLAF7 36329 3
25 COMPBIAS Polyampholyte 1739 1759 20 Q8I414 Q8I414_PLAF7 36329 3
26 COMPBIAS Polar 1760 1799 39 Q8I414 Q8I414_PLAF7 36329 3
27 COMPBIAS Acidic 2487 2771 284 Q8I414 Q8I414_PLAF7 36329 3
28 REGION Disordered 817 844 27 Q8I414 Q8I414_PLAF7 36329 3
29 REGION Disordered 931 965 34 Q8I414 Q8I414_PLAF7 36329 3
30 REGION Disordered 1739 1801 62 Q8I414 Q8I414_PLAF7 36329 3
31 REGION Disordered 2335 2371 36 Q8I414 Q8I414_PLAF7 36329 3
32 REGION Disordered 2476 2771 295 Q8I414 Q8I414_PLAF7 36329 3
33 COILED NONE 660 680 20 Q8I414 Q8I414_PLAF7 36329 3
34 COILED NONE 862 882 20 Q8I414 Q8I414_PLAF7 36329 3
35 COILED NONE 1520 1540 20 Q8I414 Q8I414_PLAF7 36329 3
36 COILED NONE 2875 2895 20 Q8I414 Q8I414_PLAF7 36329 3
37 REGION Disordered 714 797 83 Q8I414 Q8I414_PLAF7 36329 3
38 CHAIN PF3D7_1229300 1 990 989 Q8I5C6 Q8I5C6_PLAF7 36329 4
39 REGION Disordered 83 106 23 Q8I5C6 Q8I5C6_PLAF7 36329 4
40 REGION Disordered 333 355 22 Q8I5C6 Q8I5C6_PLAF7 36329 4
41 REGION Disordered 429 453 24 Q8I5C6 Q8I5C6_PLAF7 36329 4
42 REGION Disordered 751 771 20 Q8I5C6 Q8I5C6_PLAF7 36329 4
43 COMPBIAS Polyampholyte 38 58 20 Q8I5C6 Q8I5C6_PLAF7 36329 4
44 COMPBIAS Polyampholyte 86 105 19 Q8I5C6 Q8I5C6_PLAF7 36329 4
45 REGION Disordered 38 71 33 Q8I5C6 Q8I5C6_PLAF7 36329 4
46 CHAIN PF3D7_0822900 1 1176 1175 Q8IB63 Q8IB63_PLAF7 36329 5
47 COMPBIAS Acidic 266 372 106 Q8IB63 Q8IB63_PLAF7 36329 5
48 COMPBIAS Polar 373 417 44 Q8IB63 Q8IB63_PLAF7 36329 5
49 REGION Disordered 976 995 19 Q8IB63 Q8IB63_PLAF7 36329 5
50 REGION Disordered 1010 1032 22 Q8IB63 Q8IB63_PLAF7 36329 5
51 COILED NONE 7 30 23 Q8IB63 Q8IB63_PLAF7 36329 5
52 COMPBIAS Basic 55 69 14 Q8IB63 Q8IB63_PLAF7 36329 5
53 COMPBIAS Polyampholyte 70 91 21 Q8IB63 Q8IB63_PLAF7 36329 5
54 COMPBIAS Polar 92 173 81 Q8IB63 Q8IB63_PLAF7 36329 5
55 COMPBIAS Polyampholyte 175 196 21 Q8IB63 Q8IB63_PLAF7 36329 5
56 COMPBIAS Basic 197 214 17 Q8IB63 Q8IB63_PLAF7 36329 5
57 COMPBIAS Polyampholyte 235 257 22 Q8IB63 Q8IB63_PLAF7 36329 5
58 REGION Disordered 53 425 372 Q8IB63 Q8IB63_PLAF7 36329 5
59 CHAIN PF3D7_1318700 1 749 748 Q8IEC9 Q8IEC9_PLAF7 36329 6
60 REGION Disordered 705 749 44 Q8IEC9 Q8IEC9_PLAF7 36329 6
61 COILED NONE 232 259 27 Q8IEC9 Q8IEC9_PLAF7 36329 6
62 COILED NONE 274 332 58 Q8IEC9 Q8IEC9_PLAF7 36329 6
63 COILED NONE 432 466 34 Q8IEC9 Q8IEC9_PLAF7 36329 6
64 COILED NONE 495 515 20 Q8IEC9 Q8IEC9_PLAF7 36329 6
65 COILED NONE 562 600 38 Q8IEC9 Q8IEC9_PLAF7 36329 6
66 COMPBIAS Polar 385 412 27 Q8IEC9 Q8IEC9_PLAF7 36329 6
67 REGION Disordered 385 415 30 Q8IEC9 Q8IEC9_PLAF7 36329 6
68 CHAIN PF3D7_1312800 1 2361 2360 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
69 COILED NONE 1001 1028 27 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
70 REGION Disordered 148 195 47 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
71 COMPBIAS Polyampholyte 61 87 26 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
72 COMPBIAS Polyampholyte 148 185 37 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
73 COMPBIAS Polyampholyte 1242 1315 73 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
74 COMPBIAS Polyampholyte 1646 1685 39 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
75 COMPBIAS Polar 1686 1718 32 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
76 COMPBIAS Polyampholyte 1719 1736 17 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
77 COMPBIAS Polyampholyte 1935 1969 34 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
78 COMPBIAS Acidic 1970 2017 47 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
79 COMPBIAS Polyampholyte 2046 2064 18 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
80 COMPBIAS Polar 2065 2109 44 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
81 COMPBIAS Polyampholyte 2110 2177 67 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
82 COMPBIAS Polar 2178 2194 16 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
83 COMPBIAS Polyampholyte 2195 2245 50 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
84 REGION Disordered 1229 1315 86 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
85 REGION Disordered 1404 1436 32 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
86 REGION Disordered 1638 1753 115 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
87 REGION Disordered 1786 1813 27 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
88 REGION Disordered 1935 2252 317 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
89 REGION Disordered 2341 2361 20 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
90 COILED NONE 282 302 20 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
91 COILED NONE 433 453 20 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
92 REGION Disordered 61 92 31 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
93 CHAIN PF3D7_0308300 1 337 336 O77324 O77324_PLAF7 36329 8
I tried to use the following to draw coiled coil domains (which works):
### add COILED block in blue
p <- p + ggplot2::geom_rect(
  data = my.prot_data[my.prot_data$type == "COILED", ],
  mapping = ggplot2::aes(xmin = begin,
                         xmax = end,
                         ymin = order - 0.2,
                         ymax = order + 0.2),
  fill = "blue")
p
However, I'm not sure what the best way is to add coiled coils to the legend.
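One possible route, sketched here on the assumption that plain ggplot2 behaviour applies: ggplot2 only builds a legend for aesthetics that are mapped inside aes(), so the constant fill = "blue" in the snippet above never reaches the legend. The data frame `features`, the plot name `p_sketch` and the colour choices below are illustrative only, and the plot is built standalone rather than on top of a drawProteins canvas.

```r
library(ggplot2)

# Three COILED/COMPBIAS rows copied from the data frame above
# (columns reduced to what geom_rect needs).
features <- data.frame(
  type  = c("COILED", "COMPBIAS", "COILED"),
  begin = c(328, 560, 3073),
  end   = c(348, 599, 3093),
  order = c(1, 2, 3)
)

# Mapping fill to `type` inside aes() lets ggplot2 create the legend entry;
# scale_fill_manual() then assigns the colours (arbitrary choices here).
p_sketch <- ggplot() +
  geom_rect(
    data = features,
    mapping = aes(xmin = begin, xmax = end,
                  ymin = order - 0.2, ymax = order + 0.2,
                  fill = type)
  ) +
  scale_fill_manual(
    name = "Feature",
    values = c(COILED = "blue", COMPBIAS = "orange")
  )
p_sketch
```

A caveat: a ggplot has a single fill scale, so if other layers already on p also map fill, the manual values would need to cover their categories as well.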
Alternatively, I think I could just (manually) relabel coiled coils as a domain type and compositional bias as a region type.
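The manual relabelling idea might look roughly like the following base R sketch. The example rows are copied from the data frame above with the columns reduced, and the name `relabelled` and the replacement descriptions ("Coiled coil", "Compositional bias: ...") are my own illustrative choices, not anything defined by drawProteins.

```r
# A few rows copied from the data frame above (subset of columns).
prot_data <- data.frame(
  type = c("CHAIN", "COILED", "COMPBIAS", "DOMAIN"),
  description = c("PF3D7_0415800", "NONE", "Polar", "RING-type"),
  begin = c(1, 328, 560, 79),
  end = c(875, 348, 599, 117),
  order = c(2, 1, 2, 2),
  stringsAsFactors = FALSE
)

relabelled <- prot_data
is_coiled <- relabelled$type == "COILED"
is_compbias <- relabelled$type == "COMPBIAS"

# Replace the "NONE" description with a readable label, then treat
# coiled coils as domains.
relabelled$description[is_coiled] <- "Coiled coil"
relabelled$type[is_coiled] <- "DOMAIN"

# Keep the bias kind (Polar, Acidic, ...) in the description, then treat
# compositional bias as a region.
relabelled$description[is_compbias] <-
  paste("Compositional bias:", relabelled$description[is_compbias])
relabelled$type[is_compbias] <- "REGION"

relabelled[, c("type", "description")]
```

The relabelled frame could then presumably be passed to the existing draw_* functions for domains and regions, at the cost of no longer being able to tell the original feature types apart by the type column.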
I would be very grateful for any feedback and suggestions.
Many thanks in advance!
Hi Jan,
Thanks for your issues. I've spent some time having a play with your data and have finally merged some pull requests that were supplied by @daniel-wells [hat tip and thanks to Daniel]. One of these two pull requests gives chains even when the chain is not in UniProt, and the other allows easier addition of custom domains.
I've had a bit of a play and written some code, which is below. It's not perfect, but I think it is going in the direction you want. Please have a look and see what you think. I'm happy to continue the discussion here.
Best wishes,
Paul
library(devtools)
install_github("brennanpincardiff/drawProteins")
# load the package so the unqualified draw_*() calls below are found
library(drawProteins)
prot_1_json <- drawProteins::get_features("C0H4G8")
prot_1_data <- drawProteins::feature_to_dataframe(prot_1_json)
# make protein schematic for single protein...
p <- draw_canvas(prot_1_data)
p <- draw_chains(p, prot_1_data)
p <- draw_domains(p, prot_1_data, type = "TRANSMEM")
# add "COILED"
p <- draw_domains(p, prot_1_data, type = "COILED")
p
# mmm, no description but there is a legend...
# try a protein schematic for multiple proteins and see what happens...
prot <- "C0H4G8 Q8I1S9 Q8I414 Q8I5C6 Q8IB63 Q8IEC9 Q8IEJ4"
prot_json <- drawProteins::get_features(prot)
prot_data <- drawProteins::feature_to_dataframe(prot_json)
p <- draw_canvas(prot_data)
p <- draw_chains(p, prot_data)
p <- draw_regions(p, prot_data)
# add "COILED"
p <- draw_domains(p, prot_data, label_domains = FALSE, type = "COILED")
# add "COMPBIAS"
p <- draw_domains(p, prot_data, label_domains = FALSE, type = "COMPBIAS")
p
This is the output for the second (multiple-protein) schematic...
