
x axis ticks: wrong value and collision

Open agilly opened this issue 2 years ago • 13 comments

For some inputs, the forest function adds erroneous/duplicate values. See the MRE below:

in2=data.frame(effect=-0.08, lci_effect=-0.15, uci_effect=-0.03)
input=data.frame(Exposure="analysis name", Outcome="outcome name", `Beta (95% CI), Clinical Units`="confidence interval", `custom field`="custom field", prop="prop", ` `="                                                               ", P=0.3455)
forestploter::forest(input, est = in2$effect, lower = in2$lci_effect, upper = in2$uci_effect, sizes = 1, ci_column = 6, ref_line = 0, xlab = "x axis title, quite long")
image

This produces an x axis with a duplicated -0.1 value. This becomes obvious if you uncomment the check.overlap=TRUE line in make_xaxis: you then see just two -0.1 ticks. I think what is plotted above it is the correct "0" x coordinate of the ref_line.

This may be an entirely separate issue but I am also posting here for completeness: In some other cases (no MRE for this as I saw it only in passing), it seems the x axis is plotted twice, you can clearly see the characteristic thickened effect and slight offset of the two lines being drawn over each other.

agilly avatar May 05 '23 16:05 agilly

Hi, I think the problem is caused by the automatic calculation of the x-axis ticks and tick digits. You can solve this by providing the ticks as follows:

forestploter::forest(input, 
                     est = in2$effect, 
                     lower = in2$lci_effect,
                     upper = in2$uci_effect, 
                     sizes = 1,
                     ci_column = 6, 
                     xlim = c(-0.2, 0),
                     ticks_at = c(-0.15, -0.1, -0.05, 0),
                     ticks_digits = 2L, # integer 2L removes trailing zeros; use numeric 2 if you want to keep them
                     ref_line = 0,
                     xlab = "x axis title, quite long")

Hope this solves your problem.

adayim avatar May 06 '23 09:05 adayim

Thanks, I know I can specify the axis ticks, but this is part of a shiny app that calculates forest plots on demand for arbitrary data. I have no way of pre-determining the ticks.


agilly avatar May 06 '23 17:05 agilly

You can try to calculate the ticks_at and ticks_digits in your Shiny app, and pass these to the forest function. This is not an issue related to this package, I don't think I can help you with this.
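A minimal sketch of what calculating the ticks up front could look like. The helper auto_ticks is hypothetical and not part of forestploter; it assumes an untransformed (linear) axis and simply leans on base R's pretty() to pick the breaks.

```r
# Hypothetical helper (not part of forestploter): derive tick positions and
# the number of decimals from the interval bounds and the reference line.
auto_ticks <- function(lower, upper, ref_line = 0, n = 4) {
  rng <- range(c(lower, upper, ref_line), na.rm = TRUE)
  at <- pretty(rng, n = n)
  # pretty() uses steps of 1, 2 or 5 times a power of ten, so the power of
  # ten of the step gives the decimals needed to print every tick exactly.
  # The small offset guards against floating-point noise in diff().
  step <- min(diff(at))
  digits <- max(0L, as.integer(-floor(log10(step) + 1e-9)))
  list(ticks_at = at, ticks_digits = digits, xlim = range(at))
}

tk <- auto_ticks(lower = -0.15, upper = -0.03, ref_line = 0)
print(tk)

# The result could then be passed on, for example:
# forestploter::forest(input, est = in2$effect, lower = in2$lci_effect,
#                      upper = in2$uci_effect, sizes = 1, ci_column = 6,
#                      ref_line = 0, xlim = tk$xlim,
#                      ticks_at = tk$ticks_at, ticks_digits = tk$ticks_digits)
```

This does not address the log-transformed case, where pretty() breaks are less satisfying.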

adayim avatar May 08 '23 16:05 adayim

I think the fact that forestploter generates an axis containing two identical -0.1 values at two different ticks indicates there is a bug in the underlying make_ticks and/or make_xaxis. Similarly, the collision between the x axis ticks and the ref_line label also falls within the realm of "an issue with the package".

However, the documentation does state:

Although many efforts have been made to automatically get a pretty ticks break, it will not give a perfect solution, especially if 'log2' and 'log10' defined for x_trans. Please provide this value if possible.

Which indicates the authors of the package are not prioritizing these particular issues at the moment. Closing as this corresponds to a "will-not-fix" situation.

agilly avatar May 08 '23 18:05 agilly

Hi, I think your point makes sense. It is related to auto calculation of ticks_digits. I have pushed an improvement regarding this. Can you try the development version from Github? Thank you very much.

adayim avatar May 08 '23 21:05 adayim

Nice! This works using the github version, both on the MRE and on the real world example this was derived from. Thanks for the fix!

agilly avatar May 09 '23 14:05 agilly

Reopening this, as it may not be ideal to publish commit d389e56 to release as is. In some cases I now get a very crowded x axis:

image

It may be due to a rounding issue, as shown if I just plot 1 interval: image

If I open the first plot in a vector image editor and look at the axis label, it is actually trying to print 77.40000000000000 120.10000000000000 554.599999999999994. Maybe this could be fixed with a format or a round?

However, I cannot provide an MRE since I am unable to exactly reproduce the steps that lead to this new behavior. It seems to happen on some machines but not others, which is consistent with precision/rounding issues. If I figure it out, I'll let you know.
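As a rough illustration of the "format or round" idea, the sketch below formats tick values to a fixed number of decimals before they become labels. The values are the ones quoted from the labels above; the choice of one decimal and of formatC() are assumptions for illustration, not what forestploter does internally.

```r
# Tick values as they might come out of a floating-point calculation.
ticks <- c(77.4, 120.1, 554.6)

# Printing with many decimals exposes the long tails of the stored doubles.
raw_labels <- sprintf("%.15f", ticks)
print(raw_labels)

# Formatting to a fixed number of decimals gives clean labels.
clean_labels <- formatC(ticks, format = "f", digits = 1)
print(clean_labels)
```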

agilly avatar May 10 '23 15:05 agilly

Yes, it is a rounding issue. The difficult part is determining how many digits to keep. One may not want to keep the decimals if all values are integers. I see your reference line is 1, so I assume this is caused by the log transformation. I will look into this. Thank you for reporting this.

adayim avatar May 10 '23 15:05 adayim

Thanks for your reply. Yes, this is with the log transformation. The x axis in log scale is particularly difficult, as you say, on the positive side of the x axis the numbers start growing very fast and you may not want a digit. Maybe a rule of thumb could be: anything above 10 will get rounded to an integer?

But this is a wider issue that has always made x axis calculation very difficult. In general on an x axis you don't want exact values even if they are an integer. So if the tick is 476 you probably want 450 or 400 instead. So there is rounding taking into account the power of ten, like pw=floor(log10(x));round(x/10^pw)*10^pw...
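The power-of-ten rounding above can be written as a small function. The name round_pow10 is made up for this sketch, and it assumes finite, strictly positive inputs.

```r
# Round x to its leading digit, using the power of ten of x.
# Assumes x is finite and strictly positive.
round_pow10 <- function(x) {
  pw <- floor(log10(x))
  round(x / 10^pw) * 10^pw
}

rounded <- round_pow10(c(476, 14.3478, 0.0476))
print(rounded)
```

With this exact formula 476 becomes 500 rather than 450 or 400, so landing on a lower "nice" value would need floor() or a set of allowed leading digits instead of round(). Base R's signif(x, 1) does essentially the same rounding.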

Also, I have further narrowed down the issue. I was trying to do an MRE for this, but the following input was working fine:

in2=data.frame(effect=14.3478, lci_effect=2.953778, uci_effect=69.69405)

but the same values from a data.table or dataframe read with fread or read.table were giving me the wrong rounding behaviour. I think reading in from a file stores these values with a lot of precision, which then ends up in your x axis calculation. In the meantime I will add a forced rounding of the input to 5 digits, I think that may be a temporary fix:

cn=c("effect", "lci_effect", "uci_effect")
# round to 5 digits; in2 must be a data.table here (e.g. as returned by fread)
in2[,(cn):=lapply(.SD, function(x) round(x, 5)), .SDcols=cn]

Edit: I have deployed this change to production now and it seems to work. In my case we will never have estimator/CI values where the digits beyond the 5th decimal place matter, but I can see how this is not necessarily the case for all situations.

agilly avatar May 10 '23 15:05 agilly

Hi, do you have a minimal example that I can use to tweak the code? The rounding is fine and works as it is supposed to, but I think the problem is with ticks_digits. It is calculated from the number of decimal places in the values you pass. As your case shows, the function should decide this in a smarter way, without requiring a parameter value. I think the problem will disappear if you set ticks_digits=1L.

adayim avatar May 10 '23 21:05 adayim

Here is an MRE. I realised that the stored data was actually read in correctly from my source, with very high precision. It was then displayed on screen by print.data.frame at a lower precision of 4 digits after the decimal point, which I hadn't noticed. So the issue seems to simply be related to having a high precision number as input:

input=data.frame(Exposure="Exposure name", Outcome="Outcome Name", `OR (95% CI)`="46.3 (1.46-67.5)", `Custom field`="value", `Custom field 1`="value", `Custom field 2`="value", ` `="                                                               ", P=0.056)
e=data.frame(effect=14.3478,lci_effect=7.95737804000952,uci_effect=79.6640509172787)
forestploter::forest(input, est = e$effect, lower = e$lci_effect, upper = e$uci_effect, sizes = 1, ci_column = 7, ref_line = 1, xlab = "x axis label", x_trans="log")
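The display effect described above can be checked directly in base R. This is only a sketch of the printing behaviour and says nothing about how forestploter counts decimals: print.data.frame follows getOption("digits"), which is 7 significant digits unless changed.

```r
e <- data.frame(effect = 14.3478,
                lci_effect = 7.95737804000952,
                uci_effect = 79.6640509172787)

# What the default number of significant digits shows on screen.
shown <- format(e$lci_effect, digits = 7)
# Asking for more significant digits reveals the stored precision.
full <- format(e$lci_effect, digits = 15)

print(c(shown = shown, full = full))
```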

In light of this, I wouldn't try to approach this as solving the specific problem of handling very precise inputs, but rather, produce better axis ticks. For example, if I coerce a similar example to 5 digits, the axis displays as follows: image

If we had code that, instead of computing this axis from the input values, returned a more "natural" axis like [1, 5, 10, 50], we would circumvent the problem of rounding inputs entirely.
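One possible way to get such an axis, as a minimal sketch: take 1 and 5 times each power of ten and keep the values that fall inside the plotted range. The function name natural_log_ticks and the choice of 1 and 5 as leading digits are assumptions for illustration; it expects strictly positive bounds.

```r
# Hypothetical helper: candidate ticks at 1 and 5 times each power of ten,
# restricted to the range [lo, hi]. Assumes 0 < lo < hi.
natural_log_ticks <- function(lo, hi, leading = c(1, 5)) {
  exps <- floor(log10(lo)):ceiling(log10(hi))
  cand <- sort(as.vector(outer(leading, 10^exps)))
  cand[cand >= lo & cand <= hi]
}

# Range spanning the reference line (1) and the upper CI bound of the MRE.
ticks <- natural_log_ticks(1, 79.6640509172787)
print(ticks)
```

For the MRE above this gives 1, 5, 10, 50, which could then be supplied through ticks_at. Very narrow ranges may contain no candidate at all, so a fallback would be needed in practice.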

agilly avatar May 12 '23 14:05 agilly

Hi, it is difficult to calculate the axis in a natural way unless you can provide me with an algorithm for that. I am currently using the pretty function to produce the axis ticks. I made some changes to avoid decimals when all values are larger than 1; it is far from ideal, but it is the best I can do for the moment. The best solution is still to provide the ticks_at parameter. If you have an idea of how the algorithm should work, that would be very helpful.
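To illustrate why pretty() alone struggles on a log axis, here is a sketch that assumes the breaks are chosen on the log scale and then transformed back. That is a guess at the mechanism for the sake of the example, not a description of forestploter's actual code.

```r
lower <- 7.95737804000952
upper <- 79.6640509172787
ref_line <- 1

# Assumed approach: pick pretty breaks on the natural-log scale...
log_breaks <- pretty(log(c(lower, upper, ref_line)))
# ...then transform them back to the original scale for labelling.
ticks <- exp(log_breaks)
print(ticks)
```

The breaks are evenly spaced on the log scale, but the back-transformed values are not round numbers, so any label needs a rounding rule.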

adayim avatar May 13 '23 23:05 adayim

Creating correct ticks in the general case is not straightforward. I tried playing around with axisTicks but it is not convincing.

agilly avatar Jun 07 '23 18:06 agilly