orange3

Confidence intervals for regression lines

Open kaimikael opened this issue 3 years ago • 11 comments

What's your use case?

The Scatter Plot widget lets the user add a regression line to the scatter plot. However, there is no indication of the uncertainty of the displayed regression. I would like to be able to display x% confidence intervals.

What's your proposed solution?

A checkbox and number box to add confidence intervals for given percentages. (And the same for standard errors/deviations.)

Are there any alternative solutions?

No easy alternative comes to mind.

kaimikael avatar Dec 06 '21 00:12 kaimikael

We have discussed this at some length at today's meeting.

The Scatter Plot widget is already quite heavy. We hesitate to add more unless it really belongs there.

The problem is -- it doesn't. Scatter Plot basically shows two - in principle - independent variables. The line through the group could also be vertical. Currently, the widget has a checkbox for whether to treat the variables as independent or not, and computes the "regression" line accordingly. The default is to treat y as dependent, which is, imho, wrong, but it accommodates the user's (inappropriate) expectation.

Besides, it doesn't stop with confidence intervals. There's a bunch of other things one might want to show. But this would turn the Scatter Plot widget into Linear Regression widget.

Our decision was thus to not implement this in Scatter Plot. There is already a widget called Linear Regression which could have a plot with all the bells and whistles one can imagine, and which could also output all residuals and whatever other stuff.

P.S. In the spirit of what the scatter plot is, I would much prefer removing the lines and showing contours of a 2D Gaussian (hm?) distribution and principal components. This would be particularly nice when there are multiple (colored) groups.
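A minimal sketch of that idea, using only the Python standard library: it estimates the 2x2 covariance matrix of the two variables, takes its principal axes in closed form, and traces the ellipse that would contain a given probability mass if the data really were bivariate Gaussian (which is an assumption, as the "hm?" above suggests). The function names `covariance_2d`, `principal_axes` and `gaussian_contour` are hypothetical and not part of Orange.

```python
import math

def covariance_2d(xs, ys):
    """Return the mean point and the sample covariance terms (sxx, sxy, syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def principal_axes(sxx, sxy, syy):
    """Eigenvalues and major-axis angle of [[sxx, sxy], [sxy, syy]]."""
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    lam1 = half_trace + root            # variance along the first component
    lam2 = max(half_trace - root, 0.0)  # clamp tiny negative rounding errors
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return lam1, lam2, angle

def gaussian_contour(xs, ys, prob=0.95, n_points=100):
    """Points on the ellipse enclosing `prob` of a fitted 2D Gaussian."""
    (mx, my), (sxx, sxy, syy) = covariance_2d(xs, ys)
    lam1, lam2, angle = principal_axes(sxx, sxy, syy)
    # For two dimensions, the squared Mahalanobis radius follows a
    # chi-square distribution with 2 degrees of freedom.
    radius = math.sqrt(-2 * math.log(1 - prob))
    a, b = radius * math.sqrt(lam1), radius * math.sqrt(lam2)
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    points = []
    for k in range(n_points):
        t = 2 * math.pi * k / n_points
        u, v = a * math.cos(t), b * math.sin(t)
        points.append((mx + u * cos_t - v * sin_t, my + u * sin_t + v * cos_t))
    return points
```

With multiple colored groups, the same functions could be called once per group. Unlike a regression line, the first principal axis treats both variables symmetrically.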

janezd avatar Dec 10 '21 12:12 janezd

Instead of closing this, I would keep it and convert it to feature request: Linear Regression visualization. Or something.

ajdapretnar avatar Dec 10 '21 12:12 ajdapretnar

You're right. I opened a new issue, #5733, though, and referred to this one.

janezd avatar Dec 10 '21 14:12 janezd

Perhaps we could use bootstrap to show the uncertainty in the estimation of the regression lines. And because it is a visualization, it fits this widget perfectly. Like this:

[image: bootstrap]

This was an example in Štrumbelj's bootstrap lecture.
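A rough sketch of how such bootstrapped lines could be computed, using only the Python standard library: resample the points with replacement, refit an ordinary least-squares line to each resample, and either draw all the lines or summarise them as a percentile band at each x. This assumes y is treated as the dependent variable. The names `fit_line`, `bootstrap_lines` and `percentile_band` are hypothetical; this is not the widget's code or the lecture example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def bootstrap_lines(xs, ys, n_boot=200, seed=0):
    """Fit a line to each of n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    lines = []
    while len(lines) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if max(bx) == min(bx):
            continue  # degenerate resample: slope is undefined
        lines.append(fit_line(bx, by))
    return lines

def percentile_band(lines, x, level=0.95):
    """Simple percentile interval of the fitted values at x."""
    preds = sorted(a + b * x for a, b in lines)
    lo_idx = int((1 - level) / 2 * (len(preds) - 1))
    hi_idx = int((1 + level) / 2 * (len(preds) - 1))
    return preds[lo_idx], preds[hi_idx]
```

Drawing every pair in `lines` as a thin, semi-transparent line gives a plot of the kind shown above, while evaluating `percentile_band` over a grid of x values gives an approximate x% band of the kind requested in the original post.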

markotoplak avatar Dec 16 '21 13:12 markotoplak

I think the computation is not much of a problem here. The question is whether to add even more to this widget. Next month we add a non-linear regression curve, followed in February by a new method of variable selection (VizRank is more appropriate for independent variables, while for dependent ones we should sort variables by correlation coefficient) ...

janezd avatar Dec 16 '21 13:12 janezd

I wonder, could it be done so that there are specialised widgets that do one thing each and the output of these can be combined in one meta-visualiser?

kaimikael avatar Dec 16 '21 13:12 kaimikael

We could remove a feature to add one. I'd remove not-treating variables as independent (which does not fit into scatterplot) and add bootstrapped lines.

markotoplak avatar Dec 16 '21 13:12 markotoplak

Thinking more about it, wouldn’t a Combine Plot widget be a good thing? That would allow, e.g., combining a Line Plot with a Bar Plot, which is a not entirely uncommon combination:

[image]

kaimikael avatar Dec 16 '21 14:12 kaimikael

Treat variables should be indented.

janezd avatar Dec 17 '21 08:12 janezd

This should be solved within #5733, so I am closing this as a separate issue.

janezd avatar Mar 24 '22 20:03 janezd

@borondics just suggested he'd like something like the above bootstrapped version in a scatterplot.

markotoplak avatar Apr 13 '23 13:04 markotoplak