Unable to crawl the new release of the website

Open Prarthanav opened this issue 2 years ago • 2 comments

Description

We implemented DocSearch for our open-source website, hyperswitch.io/docs. The initial indexing used the default settings. We recently deployed a new version of the website with a new UI and some additional content, and we also customised the recordProps. When we tried to restart the crawler, no records were indexed. The error reported was: startUrl is being ignored. We are unable to crawl the new version of the website and cannot work out why the issue occurs in the first place.

Environment

[Screenshot, 2023-04-17 at 12:59:55 PM]
  • OS: [e.g. Windows / Linux / macOS / iOS / Android]
  • Browser: [e.g. Chrome, Safari]
  • DocSearch version: [e.g. 3.0.0]

Prarthanav avatar Apr 17 '23 07:04 Prarthanav

Hi there! We received your support request through email and will get back to you soon.

shaneafsar avatar Apr 17 '23 18:04 shaneafsar

Hi, your website is blocking the requests coming from some specific IPs. Here are the results when trying to access it with an OVH IP for example:

$ curl -I https://hyperswitch.io/docs/
HTTP/2 404 
content-type: text/html
content-length: 13951
date: Tue, 25 Apr 2023 08:11:06 GMT
last-modified: Fri, 21 Apr 2023 10:06:18 GMT
etag: "793d1556f4ef67145603870aefb1fca7"
x-amz-server-side-encryption: AES256
x-amz-meta-deployment-id: 2023-04-21-8ff5ce9f2b840649f248d405ab4157ff8cc9515e
accept-ranges: bytes
server: AmazonS3
vary: Accept-Encoding
x-cache: Error from cloudfront
via: 1.1 c2015c52d38ccde0fdca03737208f710.cloudfront.net (CloudFront)
x-amz-cf-pop: MXP64-C1
x-amz-cf-id: ICL4hwz3ORqKjVEwh4x6I9eM-dgG-LMDkJnqwhNKO3WexI6EWZkK5g==

You should allow the crawler IP to access your website.
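
For anyone who wants to repeat this check from a machine outside their own network, here is a minimal sketch. It assumes `curl` is installed; `classify_status` and `check_start_url` are hypothetical helper names, not part of DocSearch or the Algolia Crawler, and the labels are only a rough guess at how a crawler would treat each status class.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a rough label.
# The labels are illustrative assumptions, not the crawler's own terminology.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;                      # page is reachable
    3??) echo "redirect" ;;                # page has moved; follow the Location header
    403|404) echo "blocked-or-missing" ;;  # what a blocked IP saw in this thread
    *) echo "error" ;;                     # anything else, including 000 (no response)
  esac
}

# Hypothetical helper: send a HEAD request and report the status for one URL.
check_start_url() {
  # -s: silent, -o /dev/null: discard output, -I: HEAD request,
  # -w '%{http_code}': print only the numeric status code
  status=$(curl -s -o /dev/null -I -w '%{http_code}' "$1")
  echo "$1 -> $status ($(classify_status "$status"))"
}
```

Running `check_start_url https://hyperswitch.io/docs/` from a network you do not control (a cloud VM, for instance) and comparing it with the result from your office network can show whether the response depends on the source IP.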

sbellone avatar Apr 25 '23 08:04 sbellone

Closing this issue as the website in question no longer exists.

randombeeper avatar Jul 10 '24 22:07 randombeeper

It's been moved to a subdomain but it still exists:

$ curl https://hyperswitch.io/docs/ -I
HTTP/2 301 
server: CloudFront
date: Thu, 11 Jul 2024 07:20:58 GMT
content-length: 0
location: https://docs.hyperswitch.io/

But it's fine to close nevertheless, since there was never any follow-up.
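
As a small sketch of how to read the new location out of a response like the one above: the helper below takes `curl -I` output on stdin and prints the `location` header. `redirect_target` is a hypothetical name, and whether a crawler start URL should be updated to that target is an assumption to verify against your own configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the Location header value from HTTP response
# headers supplied on stdin (for example, the output of `curl -sI URL`).
redirect_target() {
  # Strip carriage returns (HTTP headers end in CRLF), then match the
  # header name case-insensitively and print its value.
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}
```

For example, `curl -sI https://hyperswitch.io/docs/ | redirect_target` would print the redirect target if the server returns one, and nothing otherwise.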

sbellone avatar Jul 11 '24 07:07 sbellone