great_expectations
Disabling GE_USAGE_STATS Doesn't Prevent Internet Connection Attempts in Databricks
Describe the bug
Setting the environment variable GE_USAGE_STATS to "false" does not prevent Great Expectations from attempting to connect to the internet. Specifically, the library tries to reach posthog.greatexpectations.io to send logging and usage statistics when I run a "values not null" check on a column.
Expected behavior
Setting os.environ["GE_USAGE_STATS"] = "false" should completely disable any attempts to connect to external servers for usage statistics, especially when operating in isolated network environments.
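For reference, here is a minimal sketch of how the opt-out could be applied before the library is loaded. The helper name `disable_gx_telemetry` is hypothetical. `GX_ANALYTICS_ENABLED` is, as far as I can tell, the variable name the 1.x documentation uses for the analytics switch, so treat it as an assumption to verify against the installed version. The idea that the value must be set before the first import is also an assumption.

```python
import os

# GE_USAGE_STATS is the variable from this report; GX_ANALYTICS_ENABLED is
# assumed to be the 1.x name for the same opt-out (verify for your version).
TELEMETRY_VARS = ("GE_USAGE_STATS", "GX_ANALYTICS_ENABLED")


def disable_gx_telemetry(env=os.environ):
    """Set each opt-out variable to "false" in the given environment mapping."""
    for name in TELEMETRY_VARS:
        env[name] = "false"
    # Return the resulting values so they can be logged or checked.
    return {name: env[name] for name in TELEMETRY_VARS}


# Assumption: the library may read these values at import or context-creation
# time, so call this before the first `import great_expectations`.
settings = disable_gx_telemetry()
print(settings)
```

In a Databricks notebook this would go in the first cell, ahead of any cell that imports great_expectations.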
Environment (please complete the following information):
- Great Expectations Version: 1.0.6
- Platform: Databricks
- Network: Private network without internet access

I am using Great Expectations within Databricks on a private network, so the cluster has no internet access. Despite this, the library makes numerous connection attempts to posthog.greatexpectations.io, all of which time out. These repeated attempts result in:
- Unnecessary computation overhead due to retries and timeouts
- Log clutter from repeated timeout warnings
Additional context
Please investigate and resolve this issue so that setting GE_USAGE_STATS to "false" disables all external connections, in line with strict network access policies. A similar issue was reported before, but it does not appear to be resolved yet. More info can be found here: https://github.com/great-expectations/great_expectations/issues/9966
Thank you!