tskit

Plotting genome position in Kb and Mb

Open • hyanwong opened this issue 2 years ago • 4 comments

At the moment, genome positions in both text and SVG plots are listed in standard coordinates (interpreted as "base pairs"), but the sequences often run to thousands or millions of bases.

I wonder whether, by default, we should plot the numbers divided by 1000 if the largest plotted position is > 10,000 (adjusting the X axis label accordingly, perhaps to "Kb"), and similarly switch to Mb if the largest plotted position is > 10,000,000. The extra decimal point will take up more space, but I think it would make the positions more readable.
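A minimal sketch of the proposed default, assuming positions are in base pairs. The helpers `choose_position_scale` and `scale_positions` are hypothetical (not part of the tskit API), and the "bp" label for unscaled positions is an assumption; the thresholds are the ones suggested above.

```python
def choose_position_scale(max_position):
    """Return (divisor, unit) for the X axis.

    Hypothetical helper: thresholds follow the proposal
    (> 10,000 -> Kb, > 10,000,000 -> Mb). Assumes base-pair units.
    """
    if max_position > 10_000_000:
        return 1_000_000, "Mb"
    if max_position > 10_000:
        return 1_000, "Kb"
    return 1, "bp"  # assumption: unscaled positions labelled as bp


def scale_positions(positions):
    """Divide all positions by one shared scale and build an axis label."""
    divisor, unit = choose_position_scale(max(positions))
    scaled = [p / divisor for p in positions]
    return scaled, f"Genome position ({unit})"


if __name__ == "__main__":
    ticks = [0, 25_000, 50_000, 75_000]
    scaled, label = scale_positions(ticks)
    print(scaled, label)  # [0.0, 25.0, 50.0, 75.0] Genome position (Kb)
```

Because the scale is picked once from the largest position, every tick on the axis shares the same unit, so only the axis label needs to change.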

Alternatively, or as well, we could allow explicit labels to be passed for the X axis, as we already do for the Y axis.

hyanwong, Jun 16 '22 21:06

It is not always the case, though, that the unit of a tree sequence is a "base pair". While that is probably true for inferred tree sequences, it often isn't for simulated ones.

molpopgen, Jun 16 '22 22:06

Yes, I had forgotten to say that. There is a potential problem with labelling, although it could be solved the way matplotlib does, by showing a multiplier such as "×1e3" near the axis (a bit ugly, though).
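A rough sketch of the multiplier approach, loosely modelled on the scale text matplotlib can place at the end of an axis. Rounding down to a power of 1000 is an assumption made here, not matplotlib's exact rule, and `multiplier_exponent` / `scale_with_multiplier` are hypothetical names. Since it only reports a power of ten, it makes no claim about the underlying unit.

```python
def multiplier_exponent(max_position):
    """Largest multiple of 3 such that 10**exponent <= max_position (min 0)."""
    exponent = 0
    while max_position >= 10 ** (exponent + 3):
        exponent += 3
    return exponent


def scale_with_multiplier(ticks):
    """Scale ticks by a shared power of ten and return the axis note."""
    exponent = multiplier_exponent(max(ticks))
    factor = 10 ** exponent
    scaled = [t / factor for t in ticks]
    # Only show a multiplier when the ticks were actually rescaled.
    note = f"×1e{exponent}" if exponent else ""
    return scaled, note


if __name__ == "__main__":
    print(scale_with_multiplier([0, 250_000, 500_000]))
    print(scale_with_multiplier([0, 5, 10]))  # small coordinates untouched
```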

In principle, simulated ones that aren't in base pairs are likely to be from zero to one, I suspect?

hyanwong, Jun 16 '22 22:06

> In principle, simulated ones that aren't in base pairs are likely to be from zero to one, I suspect?

Not necessarily. I've done simulations that are 10 "units" long, where each unit is one of the [0, 1) intervals you are thinking of, etc.

molpopgen, Jun 16 '22 22:06

Using just "k" or "M" would avoid committing to a specific unit, but would still, I think, be easily understandable.
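A minimal sketch of unit-agnostic tick labels with a bare "k" or "M" suffix. `format_tick` is a hypothetical helper, not a tskit function; the default of 6 significant figures is an assumption chosen so that values such as 999,999 print as "999.999k" rather than switching to scientific notation.

```python
def format_tick(position, precision=6):
    """Format one position with an SI-style suffix and no unit name.

    Hypothetical helper: 'M' for millions, 'k' for thousands; values
    below 1000 (e.g. [0, 1) coordinates) are printed unchanged.
    """
    for factor, suffix in ((1_000_000, "M"), (1_000, "k")):
        if abs(position) >= factor:
            return f"{position / factor:.{precision}g}{suffix}"
    return f"{position:.{precision}g}"


if __name__ == "__main__":
    print([format_tick(p) for p in (0, 250_000, 500_000, 1_000_000)])
    # ['0', '250k', '500k', '1M']
```

This labels each tick independently, so one axis can mix "k" and "M". An alternative is to choose a single suffix for the whole axis from its largest tick.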

benjeffery, Jun 20 '22 13:06