"operation was canceled" during DNS lookups on AWS

Open • jaffee opened this issue 8 years ago • 6 comments

Expected behavior

No error.

Actual behavior

Errors like: `Error: Post http://node0.sandbox.pilosa.com:10101/index/myindex/query: dial tcp: lookup node0.sandbox.pilosa.com on 10.0.0.2:53: dial udp 10.0.0.2:53: operation was canceled`

Steps to reproduce the behavior

6-node r4.xlarge cluster on AWS using AMI ami-6e29e714, spun up with our infrastructure tooling.

A separate r4.xlarge agent node using the same AMI.

Use the `pi` command from the tools repo to spawn a random-set-bits benchmark:

`pi spawn --agent-hosts="agent0.pilosa.com" --goos=linux --output=s3 --pilosa-hosts="host1:10101,host2:10101..." --filename=randomsetbits.json --ssh-user=ubuntu`

jaffee • Oct 30 '17

Hi, have you resolved the issue or found the cause of this behavior on AWS? We're experiencing the same problems during load tests.

kgruszka • Apr 24 '18

@kgruszka unfortunately, this was never fully resolved. It's mainly an issue when you have to make LOTS of small queries; for data ingestion, that can be mitigated by using the `/import` endpoint. It's also possible to group multiple queries into the same HTTP request, which helps mitigate this issue as well.
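A minimal sketch of the batching idea in Go, assuming the server accepts several PQL calls as one newline-separated body POSTed to `/index/<index>/query` (the path that appears in the error message in this issue). `queryURL`, `buildBatchBody`, and `postBatch` are hypothetical helpers, and the `text/plain` content type is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

// queryURL builds the query endpoint for an index, e.g.
// http://node0.sandbox.pilosa.com:10101/index/myindex/query.
func queryURL(baseURL, index string) string {
	return fmt.Sprintf("%s/index/%s/query", strings.TrimRight(baseURL, "/"), index)
}

// buildBatchBody joins several PQL calls into one request body.
// Assumption: the server accepts multiple newline-separated calls per request.
func buildBatchBody(queries []string) string {
	return strings.Join(queries, "\n")
}

// postBatch sends every query in a single POST, so many small queries cost
// one HTTP round trip instead of one each. The caller must close resp.Body.
func postBatch(client *http.Client, baseURL, index string, queries []string) (*http.Response, error) {
	body := bytes.NewBufferString(buildBatchBody(queries))
	return client.Post(queryURL(baseURL, index), "text/plain", body) // content type is an assumption
}
```

Reusing one `http.Client` for all batches also matters: Go's transport keeps idle connections alive, and a hostname is only resolved when a new connection has to be dialed.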

I think you can also use IP addresses instead of hostnames in your cluster configuration and work around it that way.

My memory is a bit hazy, but this may be related to whether Go's pure-Go DNS resolver or the cgo (libc) resolver is used. Which one you get can depend on whether Pilosa was cross-compiled for Linux from another platform or built natively, since cgo is disabled by default when cross-compiling. There are debug environment variables that tell the runtime to print which resolver it chose (`GODEBUG=netdns=1`) or to force one of them (`GODEBUG=netdns=go` or `GODEBUG=netdns=cgo`).

Make sure that you are using an official release binary, or building Pilosa with the latest version of Go (1.10 at the time of writing). If none of the above workarounds are sufficient and you're still having difficulty with this, we will prioritize it and dive back in.

jaffee • Apr 24 '18

@jaffee We're experiencing this issue in a project not related to Pilosa, and in our case the problem is that the IP addresses of the ALB are not being cached, so each request to the services behind it requires a DNS lookup. That runs into a documented limit of the Amazon-provided DNS server (a maximum of 1024 packets per second per network interface). I hope it will be useful for you too: https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html#vpc-dns-limits

kgruszka • Apr 26 '18

That is very helpful; thank you for taking the time to post it. Good luck with your project!

jaffee • Apr 26 '18

https://github.com/golang/go/issues/22724

s7v7nislands • Jan 14 '21

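You might try putting a `.` at the end of your DNS name, e.g. `some.hostname.com.`. The trailing dot marks the name as fully qualified, so the resolver does not also try appending the search domains from `/etc/resolv.conf`, which can cut down the number of DNS queries per lookup.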
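A small helper for applying that suggestion to `host:port` strings before they go into a client or cluster configuration. This is a hypothetical sketch (`absoluteHostPort` is an illustrative name); it leaves IP literals and names that already end in a dot unchanged.

```go
package main

import (
	"net"
	"strings"
)

// absoluteHostPort appends a trailing dot to the host part of a "host:port"
// address so the resolver treats the name as fully qualified and skips the
// search-domain list. IP literals and already-rooted names pass through.
func absoluteHostPort(hostport string) (string, error) {
	host, port, err := net.SplitHostPort(hostport)
	if err != nil {
		return "", err
	}
	if net.ParseIP(host) != nil || strings.HasSuffix(host, ".") {
		return hostport, nil
	}
	return net.JoinHostPort(host+".", port), nil
}
```

One caveat: the trailing dot also ends up in the HTTP `Host` header, and servers or proxies that route on that header may not treat `name.` and `name` as the same host.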

jeffreydwalter • Dec 13 '21