orange3
Confidence intervals for regression lines
What's your use case?
The Scatter Plot widget lets the user add a regression line to the scatter plot. However, there is no indication of the uncertainty of the displayed regression. I would like to be able to display x% confidence intervals.
What's your proposed solution?
A checkbox and number box to add confidence intervals for given percentages. (And the same for standard errors/deviations.)
Are there any alternative solutions?
No easy alternative comes to mind.
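For reference, the band being asked for could be computed roughly as follows. This is only a sketch and not code from Orange: the names `fit_line` and `confidence_band` are made up for illustration, it assumes an ordinary least squares fit with y as the dependent variable, and it uses a normal quantile where a Student's t quantile would be more accurate (and wider) for small samples.

```python
import math
from statistics import NormalDist


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def confidence_band(xs, ys, grid, level=0.95):
    """Lower and upper bounds for the mean response at each grid point."""
    n = len(xs)
    intercept, slope = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)
    # Assumption: normal quantile as a large-sample stand-in for Student's t.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = [], []
    for x0 in grid:
        fitted = intercept + slope * x0
        # Standard error of the fitted mean; smallest at x0 == mean_x.
        se = math.sqrt(sigma2 * (1 / n + (x0 - mean_x) ** 2 / sxx))
        lower.append(fitted - z * se)
        upper.append(fitted + z * se)
    return lower, upper


# Made-up data, only to show the call.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
lower, upper = confidence_band(xs, ys, grid=[1.0, 3.5, 6.0])
```

The band is narrowest at the mean of x and widens towards the edges, which is the shape a user would expect to see drawn around the line.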
We have discussed this at some length at today's meeting.
The Scatter Plot widget is already quite heavy. We hesitate to add more unless it really belongs there.
The problem is that it doesn't. Scatter Plot basically shows two (in principle) independent variables. The line through the group could also be vertical. Currently, the widget has a checkbox for whether to treat the variables as independent or not, and computes the "regression" line accordingly. The default is to treat y as dependent, which is, imho, wrong, but it accommodates the user's (inappropriate) expectation.
Besides, it doesn't stop with confidence intervals. There's a bunch of other things one might want to show. But this would turn the Scatter Plot widget into Linear Regression widget.
Our decision was thus to not implement this in Scatter Plot. There is already a widget called Linear Regression which could have a plot with all the bells and whistles one can imagine, and which could also output all residuals and whatever other stuff.
P.S. In the spirit of what the scatter plot is, I would much prefer removing the lines and showing contours of a 2D Gaussian (hm?) distribution and principal components. This would be particularly nice when there are multiple (colored) groups.
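A rough sketch of what such a contour would be computed from, assuming one ellipse per group taken from the sample covariance matrix. The function name `covariance_ellipse` and the `n_std` parameter are hypothetical, and nothing here reflects existing Orange code.

```python
import math


def covariance_ellipse(xs, ys, n_std=2.0):
    """Centre, semi-axes and orientation of a 2D Gaussian contour.

    The ellipse axes are the principal components of the data; the
    semi-axis lengths are n_std standard deviations along each of them.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    major_var = half_trace + root
    minor_var = max(half_trace - root, 0.0)  # guard against rounding below 0
    # Angle of the first principal component, measured from the x axis.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    semi_axes = (n_std * math.sqrt(major_var), n_std * math.sqrt(minor_var))
    return (mean_x, mean_y), semi_axes, angle


# Made-up data, only to show the call.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
centre, (major, minor), angle = covariance_ellipse(xs, ys)
```

Unlike a regression line, this treats x and y symmetrically, so swapping the axes only mirrors the picture.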
Instead of closing this, I would keep it and convert it to a feature request: Linear Regression visualization. Or something.
You're right. I opened a new issue, #5733, though, and referred to this one.
Perhaps we could use bootstrap to show the uncertainty in the estimation of the regression lines. And because it is a visualization, it fits this widget perfectly. Like this:
This was an example in Štrumbelj's bootstrap lecture.
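A minimal sketch of the bootstrap idea, assuming the (x, y) pairs are resampled with replacement and an ordinary least squares line is refitted each time. The helper names (`fit_line`, `bootstrap_lines`, `percentile_band`) and the defaults are illustrative only; this is not taken from the lecture or from Orange.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_lines(xs, ys, n_lines=200, seed=0):
    """Refit the line on resampled pairs; returns (intercept, slope) tuples."""
    rng = random.Random(seed)
    n = len(xs)
    lines = []
    while len(lines) < n_lines:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope is undefined
        lines.append(fit_line(bx, by))
    return lines


def percentile_band(lines, grid, level=0.95):
    """Pointwise percentile interval of the bootstrapped lines."""
    tail = (1 - level) / 2
    lower, upper = [], []
    for x0 in grid:
        values = sorted(a + b * x0 for a, b in lines)
        last = len(values) - 1
        lower.append(values[round(tail * last)])
        upper.append(values[round((1 - tail) * last)])
    return lower, upper


# Made-up data, only to show the call.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
lines = bootstrap_lines(xs, ys)
lower, upper = percentile_band(lines, grid=[1.0, 3.5, 6.0])
```

For the visualization, each entry in `lines` could be drawn as a thin semi-transparent line, or only the band from `percentile_band` could be shaded.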
I think the computation is not much of a problem here. The question is whether to add even more to this widget. Next month we add a non-linear regression curve, followed in February by a new method of variable selection (VizRank is more appropriate for independent variables, while for dependent ones we should sort variables by correlation coefficient) ...
I wonder, could it be done so that there are specialised widgets that do one thing each and the output of these can be combined in one meta-visualiser?
We could remove a feature to add one. I'd remove not-treating variables as independent (which does not fit into a scatter plot) and add bootstrapped lines.
Thinking more about it, wouldn't a Combine Plot widget be a good thing? That would allow, e.g., combining a Line Plot with a Bar Plot, which is a not entirely uncommon combination:
Treat variables should be indented.
This should be solved within #5733, so I am closing this as a separate issue.
@borondics just suggested he'd like something like the above bootstrapped version in a scatterplot.