"operation was canceled" during DNS lookups on AWS
Expected behavior
No error.
Actual behavior
Errors like: Error: Post http://node0.sandbox.pilosa.com:10101/index/myindex/query: dial tcp: lookup node0.sandbox.pilosa.com on 10.0.0.2:53: dial udp 10.0.0.2:53: operation was canceled
Steps to reproduce the behavior
A 6-node r4.xlarge cluster on AWS using AMI ami-6e29e714, spun up with our infrastructure tooling.
A separate r4.xlarge agent node using the same AMI.
Use the pi command from the tools repo to spawn a random-set-bits benchmark:
pi spawn --agent-hosts="agent0.pilosa.com" --goos=linux --output=s3 --pilosa-hosts="host1:10101,host2:10101..." --filename=randomsetbits.json --ssh-user=ubuntu
Hi, have you resolved the issue or found the cause of this behavior on AWS? We're experiencing the same problems during load tests.
@kgruszka unfortunately, this was never fully resolved. It's really only an issue if you have to make LOTS of small queries, and for data ingestion that can be mitigated by using the /import endpoint. It's also possible to group multiple queries in the same HTTP request, which would help mitigate the issue as well.
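To illustrate the batching idea, here is a minimal Go sketch that groups individual query strings into larger request bodies, so N queries cost roughly N/size HTTP requests (and DNS lookups) instead of N. It assumes the query endpoint accepts several calls in one body separated by newlines; batchQueries is a hypothetical helper and the Row(...) strings are placeholders, not real queries.

```go
package main

import (
	"fmt"
	"strings"
)

// batchQueries groups query strings into request bodies of at most size
// calls each. Assumption: the server accepts multiple newline-separated
// calls in a single query body.
func batchQueries(queries []string, size int) []string {
	if size < 1 {
		size = 1
	}
	var bodies []string
	for start := 0; start < len(queries); start += size {
		end := start + size
		if end > len(queries) {
			end = len(queries)
		}
		bodies = append(bodies, strings.Join(queries[start:end], "\n"))
	}
	return bodies
}

func main() {
	var queries []string
	for i := 0; i < 5; i++ {
		// Placeholder query text; substitute your own calls.
		queries = append(queries, fmt.Sprintf("Row(f=%d)", i))
	}
	for _, body := range batchQueries(queries, 2) {
		// Each body would be sent in one POST instead of one POST per query.
		fmt.Printf("%q\n", body)
	}
}
```

Larger batches cut the request rate further, at the cost of bigger request bodies and coarser error handling per batch.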
I think you can also use IP addresses instead of hostnames in your cluster configuration and work around it that way.
My memory is a bit hazy, but this may be related to whether the pure Go or the cgo DNS resolver is used, which can depend on whether Pilosa was cross-compiled for Linux from another platform or built natively. The GODEBUG environment variable can help here: GODEBUG=netdns=1 tells the runtime to print which resolver it chose, and GODEBUG=netdns=go or GODEBUG=netdns=cgo forces one or the other.
Make sure that you are using an official release binary, or building Pilosa with the latest version of Go (1.10). If none of the above workarounds are sufficient, and you're still experiencing difficulty with this, we will prioritize it and dive back in.
@jaffee We're experiencing this issue in a project not related to Pilosa, and in our case the problem is that the IPs of the ALB are not being cached, so each request to the services behind it requires a DNS lookup. The Amazon-provided DNS server limits how many packets per second each network interface can send to it. Here is the documented limitation; I hope it will be useful for you too: https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html#vpc-dns-limits
That is very helpful, thank you for taking the time to post it - good luck with your project!
https://github.com/golang/go/issues/22724
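You might try adding a trailing dot to your DNS names so they are treated as fully qualified, i.e., use "some.hostname.com." rather than "some.hostname.com". That stops the resolver from also trying the search-domain suffixes listed in resolv.conf.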
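A minimal Go sketch of applying the trailing-dot suggestion to "host:port" addresses follows. Whether it actually reduces query volume depends on the resolv.conf search and ndots settings, and some servers or TLS certificates may treat "name." differently from "name" in the Host header, so it is worth testing first. fullyQualify is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// fullyQualify appends a trailing dot to the host part of a "host:port"
// address so the resolver treats it as absolute and skips search-domain
// expansion. IP literals and already-qualified names are left unchanged.
func fullyQualify(addr string) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	if net.ParseIP(host) != nil || strings.HasSuffix(host, ".") {
		return addr, nil
	}
	return net.JoinHostPort(host+".", port), nil
}

func main() {
	for _, a := range []string{"node0.sandbox.pilosa.com:10101", "10.0.0.2:53"} {
		q, err := fullyQualify(a)
		fmt.Println(q, err)
	}
}
```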