onyx
Web connector - documents deleted when no internet connection
Hello there,
I had an active web connector set up for recursive scraping of a website, and the knowledge base was ready. However, an update started while I had no internet connection, and all documents were deleted.
3/2/2024, 11:50:59 AM | Succeeded
New Doc Cnt: 0
(also removed 976 docs that were detected as deleted in the source)
Total Doc Cnt: 0
Hi,
While the recent fix (PR #1214) attempts to address the issue by checking for a TCP connection to Google DNS servers on port 53, it has unfortunately broken our deployment in certain environments.
The current implementation relies on direct access to external IP addresses and ports, which can be problematic in several scenarios:
- Restricted environments: In closed or controlled environments, internet access might be available but restricted through an HTTP proxy on a specific port. The current check wouldn't work in such cases as it bypasses the proxy configuration.
- Local network deployments: Some deployments might not have any outgoing internet connection and rely solely on local network resources. Checking for external DNS servers would incorrectly flag these deployments as offline.
Instead of relying on specific IP addresses and ports, I propose that we modify the check_internet_connection function to use a reachable URL. This approach would be more flexible and accommodate various network configurations, including those with proxy servers or limited internet access.
Here are some potential solutions:
- Use a dedicated internet connectivity checking service: Several online services like "https://www.google.com/generate_204" can be used to verify internet access. We can make a simple HTTP request to such a service and consider the connection successful if we receive a valid response.
- Check for a known internal resource: If applicable, we can attempt to connect to a known resource within the local network. This would ensure that the network is accessible without relying on external connections.
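To make the proposal concrete, here is a rough sketch of a URL-based check. It is only an illustration under some assumptions: the signature of `check_internet_connection` shown here and the `CONNECTIVITY_CHECK_URL` environment variable are hypothetical and not taken from the existing code. It uses only the Python standard library, whose `urllib.request.urlopen` picks up proxy settings from the environment, so the same function could be pointed at either the external endpoint or an internal resource.

```python
import os
import urllib.error
import urllib.request

# Hypothetical setting name (not part of the existing codebase). Deployments
# without outgoing internet access could point it at an internal URL instead.
DEFAULT_CHECK_URL = os.environ.get(
    "CONNECTIVITY_CHECK_URL", "https://www.google.com/generate_204"
)


def check_internet_connection(
    url: str = DEFAULT_CHECK_URL, timeout: float = 3.0
) -> bool:
    """Return True if `url` answers with an HTTP status below 400."""
    try:
        # urlopen goes through the default opener, which honors proxy
        # settings from the environment (e.g. HTTP_PROXY / HTTPS_PROXY).
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers DNS failures, refused connections and HTTP error
        # statuses (HTTPError is a subclass); OSError covers socket
        # timeouts; ValueError covers a malformed URL.
        return False
```

With something like this, the web connector could skip the update run when the check returns `False`, rather than treating every unreachable page as deleted in the source.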
Thomas