p-value for estimator vs p-value for refutation of estimator with bootstrap
Hello,
Can you explain the difference between the two tests? They look very similar to me, and they give me the same results when I run the functions locally with the estimate_value hardcoded (in the causal estimator's case, I of course only run the code from # Processing the null hypothesis estimates onwards).
https://py-why.github.io/dowhy/v0.8/_modules/dowhy/causal_estimator.html#CausalEstimator
```python
def _test_significance_with_bootstrap(self, estimate_value, num_null_simulations=None):
    """Test statistical significance of an estimate using the bootstrap method.

    :param estimate_value: Obtained estimate's value
    :param num_null_simulations: Number of simulations for the null hypothesis
    :returns: p-value of the statistical significance test.
    """
    # Use existing params, if new user defined params are not present
    if num_null_simulations is None:
        num_null_simulations = self.num_null_simulations
    do_retest = self._bootstrap_null_estimates is None or CausalEstimator.is_bootstrap_parameter_changed(
        self._bootstrap_null_estimates.params, locals())
    if do_retest:
        null_estimates = np.zeros(num_null_simulations)
        for i in range(num_null_simulations):
            new_outcome = np.random.permutation(self._outcome)
            new_data = self._data.assign(dummy_outcome=new_outcome)
            # self._outcome = self._data["dummy_outcome"]
            new_estimator = type(self)(
                new_data,
                self._target_estimand,
                self._target_estimand.treatment_variable,
                ("dummy_outcome",),
                test_significance=False,
                evaluate_effect_strength=False,
                confidence_intervals=False,
                target_units=self._target_units,
                effect_modifiers=self._effect_modifier_names,
                **self.method_params
            )
            new_effect = new_estimator.estimate_effect()
            null_estimates[i] = new_effect.value
        self._bootstrap_null_estimates = CausalEstimator.BootstrapEstimates(
            null_estimates,
            {'num_null_simulations': num_null_simulations, 'sample_size_fraction': 1})

    # Processing the null hypothesis estimates
    sorted_null_estimates = np.sort(self._bootstrap_null_estimates.estimates)
    self.logger.debug("Null estimates: {0}".format(sorted_null_estimates))
    median_estimate = sorted_null_estimates[int(num_null_simulations / 2)]
    # Doing a two-sided test
    if estimate_value > median_estimate:
        # Being conservative with the p-value reported
        estimate_index = np.searchsorted(sorted_null_estimates, estimate_value, side="left")
        p_value = 1 - (estimate_index / num_null_simulations)
    if estimate_value <= median_estimate:
        # Being conservative with the p-value reported
        estimate_index = np.searchsorted(sorted_null_estimates, estimate_value, side="right")
        p_value = (estimate_index / num_null_simulations)
    # If the estimate_index is 0, it depends on the number of simulations
    if p_value == 0:
        p_value = (0, 1 / len(sorted_null_estimates))  # a tuple determining the range.
    elif p_value == 1:
        p_value = (1 - 1 / len(sorted_null_estimates), 1)
    signif_dict = {
        'p_value': p_value
    }
    return signif_dict
```
https://py-why.github.io/dowhy/v0.8/_modules/dowhy/causal_refuter.html#CausalRefuter.test_significance
```python
def perform_bootstrap_test(self, estimate, simulations):
    # Get the number of simulations
    num_simulations = len(simulations)
    # Sort the simulations
    simulations.sort()
    # Obtain the median value
    median_refute_values = simulations[int(num_simulations / 2)]
    # Performing a two sided test
    if estimate.value > median_refute_values:
        # np.searchsorted tells us the index if it were a part of the array
        # We select side to be left as we want to find the first value that matches
        estimate_index = np.searchsorted(simulations, estimate.value, side="left")
        # We subtract from 1 as we are finding the value from the right tail
        p_value = 1 - (estimate_index / num_simulations)
    else:
        # We take the side to be right as we want to find the last index that matches
        estimate_index = np.searchsorted(simulations, estimate.value, side="right")
        # We get the probability with respect to the left tail.
        p_value = estimate_index / num_simulations
    # return twice the determined quantile as this is a two sided test
    return 2 * p_value
```
They are the same :) Sometimes it is useful to differentiate them, though. For example, some estimators (e.g., the linear regression estimator) may use parametric confidence intervals, which is fast, and then we may want to refute the analysis using a bootstrap p-value, which makes no parametric assumptions.
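To make that concrete, here is a minimal standalone sketch (made-up data and hypothetical variable names, not the DoWhy API): the parametric p-value comes from an ordinary OLS t-test on the treatment coefficient, while the non-parametric one is built from permuted outcomes using the same median-split tail logic as the snippets above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Made-up data: treatment t with a true effect of 0.3 on outcome y.
n = 500
t = rng.normal(size=n)
y = 0.3 * t + rng.normal(size=n)
X = sm.add_constant(t)

# Parametric route: the usual t-test p-value on the treatment coefficient,
# which a linear regression estimator can report essentially for free.
ols_fit = sm.OLS(y, X).fit()
estimate = ols_fit.params[1]
parametric_p = ols_fit.pvalues[1]

# Non-parametric route: build a null distribution by re-estimating the
# effect against permuted outcomes, as in the quoted snippets.
num_simulations = 1000
null_effects = np.sort([
    sm.OLS(rng.permutation(y), X).fit().params[1]
    for _ in range(num_simulations)
])

# Same median-split, two-sided tail logic as perform_bootstrap_test.
if estimate > null_effects[num_simulations // 2]:
    idx = np.searchsorted(null_effects, estimate, side="left")
    tail = 1 - idx / num_simulations
else:
    idx = np.searchsorted(null_effects, estimate, side="right")
    tail = idx / num_simulations
bootstrap_p = 2 * tail

print("parametric p-value:", parametric_p)
print("permutation/bootstrap p-value:", bootstrap_p)
```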
But yeah, if you are already using the bootstrap method for testing significance, then the refutation by bootstrap is redundant.
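And to see why it is redundant: both quoted functions perform the same quantile computation on the sorted null estimates; the refuter only adds the factor of two for its two-sided correction. A minimal sketch with a made-up null distribution and a hardcoded estimate_value (again not the DoWhy API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up null distribution standing in for
# self._bootstrap_null_estimates.estimates / `simulations`.
null_estimates = np.sort(rng.normal(size=1000))
estimate_value = 1.8  # hardcoded estimate, as in the question
num_simulations = len(null_estimates)

# Shared core of both functions: where does the estimate fall in the
# sorted null distribution, relative to its median?
if estimate_value > null_estimates[num_simulations // 2]:
    idx = np.searchsorted(null_estimates, estimate_value, side="left")
    tail_prob = 1 - idx / num_simulations   # right tail
else:
    idx = np.searchsorted(null_estimates, estimate_value, side="right")
    tail_prob = idx / num_simulations       # left tail

print("estimator-style p-value:", tail_prob)      # reported directly (with the 0/1 range handling)
print("refuter-style p-value:  ", 2 * tail_prob)  # doubled for the two-sided test
```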
Closing this as the question seems to be answered. Please re-open if not. Thanks.