MX lookups not making it out of Colima

Open hazardv opened this issue 1 year ago • 1 comments

Description

I am working on a project involving an app server and a postfix mail server that the app server uses to send mail. I have everything set up for development in Colima with Docker. When attempting to send a test message from the postfix server to mailinator.com, postfix uses the IP address from one of the A records for mailinator.com instead of one of the MX records.

I am using this basic postfix docker image but have gotten the same results with a couple of different postfix images (https://hub.docker.com/r/boky/postfix)

I did some testing running tcpdump on my mac.

This first screenshot shows the output from tcpdump when running the postfix docker image in Colima. Screen Shot 2022-09-12 at 11 12 18 AM

This screenshot shows the output from tcpdump when running the postfix docker image on my mac without Colima running. Screen Shot 2022-09-12 at 11 17 29 AM

As you can see, when not running in Colima, postfix is doing MX lookups and finding the IP addresses for the mailinator mail servers. When running inside Colima, either something is preventing the MX lookups from happening, or they are happening but not making it out of Colima. Postfix then does an A record lookup because the MX lookup came back not found.
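The fallback described above matches the "implicit MX" rule in RFC 5321: when a domain returns no MX records, the sender treats the domain name itself as the mail host and uses its address records. The following is a minimal sketch of that selection logic, not Postfix's actual code; the function name `delivery_targets` and the preference values in the usage example are hypothetical.

```python
from typing import List, Tuple


def delivery_targets(domain: str, mx_records: List[Tuple[int, str]]) -> List[str]:
    """Return the hosts to try, in order, for mail addressed to `domain`.

    `mx_records` is a list of (preference, exchange) pairs from an MX lookup.
    An empty list models a lookup that returned no MX records.
    """
    if not mx_records:
        # Implicit MX: with no MX answer, fall back to the domain itself,
        # which is then resolved through its A/AAAA records.
        return [domain]
    # A lower preference value is tried first; strip the trailing root dot.
    return [host.rstrip(".") for _, host in sorted(mx_records)]


if __name__ == "__main__":
    # Hypothetical preference values, for illustration only.
    print(delivery_targets("mailinator.com", [(20, "mail2.mailinator.com."),
                                              (10, "mail.mailinator.com.")]))
    # An MX lookup that never gets an answer looks like "no MX records".
    print(delivery_targets("mailinator.com", []))
```

If the MX answer is lost somewhere between the container and the upstream resolver, the second call is effectively what the mail server sees, which would explain connections to the A record address. This sketch ignores temporary lookup failures, which a real mail server would normally defer on rather than fall back.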

I have also tried sending mail to other mail providers (gmail and yahoo) with the same results.

Version

Colima Version: 0.4.4
Lima Version: 0.11.3
Qemu Version: 7.0.0

Operating System

  • [ ] macOS Intel
  • [X] macOS M1
  • [ ] Linux

Reproduction Steps

  1. Start Colima: colima start --arch x86_64 --memory 4
  2. Start Postfix container: docker run --rm --name postfix -e "ALLOWED_SENDER_DOMAINS=example.com" -p 1587:587 boky/postfix
  3. Enter postfix container: docker exec -it postfix bash
  4. Send test message:
sendmail [email protected]
From: [email protected]
To: [email protected]
Test Message 1
.

Expected behaviour

Postfix log output would show
connect to mail.mailinator.com[23.239.11.30]:25
connect to mail2.mailinator.com[45.33.83.75]:25

and output from tcpdump running on the host would look like

Screen Shot 2022-09-12 at 11 17 29 AM

Additional context

No response

hazardv avatar Sep 12 '22 15:09 hazardv

I found that setting the nameserver to 8.8.8.8 by either running colima start --dns 8.8.8.8 or setting it in the template makes MX lookups work properly.

hazardv avatar Sep 14 '22 16:09 hazardv

@hazardv if you would not mind assisting with troubleshooting, can you try the following?

  • Install the development version brew install --HEAD colima, you can revert afterwards.
  • Use the slirp network driver, colima start --network-driver slirp.

Does this change anything in the behaviour?

Note: you can use a separate profile for experiments to leave your current workload intact. e.g. colima start test will start with a separate profile named test.

abiosoft avatar Sep 18 '22 08:09 abiosoft

@abiosoft I did the following:

  • Install the dev version: brew unlink colima then brew install --HEAD colima
    • Installed version: colima version HEAD-b90b53e git commit: b90b53e8a9a0892001f11026368902181abe8fc6
  • Start colima with slirp network driver: colima start test --network-driver slirp --arch x86_64 --memory 4
  • Start container: docker run --rm --name postfix -e "ALLOWED_SENDER_DOMAINS=example.com" -p 1587:587 boky/postfix
  • Enter container: docker exec -it postfix bash
  • Send test message using commands in reproduction steps

Things appear to be working correctly now. Screen Shot 2022-09-19 at 9 56 27 AM

I did update my default template to use 8.8.8.8 as the DNS server. I am wondering if that had any effect on this since I am seeing 8.8.8.8 as the DNS server in the tcpdump output (could just be a coincidence). I am going to remove that from my default template and try again with a new profile just to be safe.

hazardv avatar Sep 19 '22 14:09 hazardv

After removing 8.8.8.8 from my default template and starting a new colima instance with colima start test2 --network-driver slirp --arch x86_64 --memory 4, the tcpdump output shows the MX lookups making it out of the container, but pointed at my local network DNS, as you would expect.

hazardv avatar Sep 19 '22 14:09 hazardv

Thanks for the update.

Slirp was actually the default on Colima, as well as the default user-mode network on Qemu. However, it was erratic with DNS resolution on macOS, and that led to the switch to gvproxy.

There have been fewer complaints from users about network performance since the switch to gvproxy. Nonetheless, switching back to slirp as the default is being evaluated.

@hazardv is it fine to close the issue? The ability to toggle between both network drivers will be part of the next release.

abiosoft avatar Sep 19 '22 15:09 abiosoft

Thank you for looking into this.

hazardv avatar Sep 22 '22 16:09 hazardv

We just got a bug fix for the internal DNS server for Lima: https://github.com/lima-vm/lima/pull/1079

The bug is that the DNS server would never set the truncate flag on UDP responses, even if they did not include all the answers. So the client would never know to re-query using TCP to get the full answer.
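To illustrate the mechanism, here is a minimal sketch of how a client can detect truncation: the TC bit lives in the flags field of the 12-byte DNS header (RFC 1035), and a client that sees it set on a UDP response is expected to repeat the query over TCP. The helper names `is_truncated` and `choose_transport` are hypothetical, and this is not Lima's or any resolver's actual code.

```python
import struct

# Truncation (TC) bit within the 16-bit flags field of the DNS header.
TC_FLAG = 0x0200


def is_truncated(response: bytes) -> bool:
    """Return True if a raw DNS response has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    # The header starts with a 16-bit ID followed by the 16-bit flags field.
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def choose_transport(udp_response: bytes) -> str:
    """Decide whether the query must be repeated over TCP."""
    return "tcp" if is_truncated(udp_response) else "udp"


if __name__ == "__main__":
    # Header-only sample messages: ID, flags, then four zeroed section counts.
    complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 0, 0, 0, 0)
    truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 0, 0, 0, 0)
    print(choose_transport(complete))
    print(choose_transport(truncated))
```

A server that drops answers to fit a UDP packet but leaves this bit clear gives the client no signal to retry, so the client works with an incomplete answer.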

It is possible that this bug is also responsible for the errors you have seen.

It would be great if you could re-test your issue when the next version of Lima is released, to let us know if it fixes the MX record getting lost as well!

jandubois avatar Oct 03 '22 19:10 jandubois

Is https://github.com/lima-vm/lima/pull/1079 likely to fix the persistent problems with lots of nameservers (that cause everybody to use --dns 1.1.1.1)?

rfay avatar Oct 03 '22 19:10 rfay

Is lima-vm/lima#1079 likely to fix the persistent problems with lots of nameservers (that cause everybody to use --dns 1.1.1.1)?

I don't know. I would hope so, but my gut feeling says it probably won't.

If you have a reproducible test case, please file a bug against Lima. I've never seen a test case that I could reproduce myself, which makes it hard to investigate.

jandubois avatar Oct 03 '22 21:10 jandubois

It seems to be something about DNS servers in local routers. There have been so many cases that we just tell people to install colima that way, and it seems to fix everything. There was one particular problem that did have to do with accessing Google's APIs; that one might be fixed. They were able to look up everything but the one thing in Google's storageapi.

rfay avatar Oct 03 '22 22:10 rfay

It seems to be something about DNS servers in local routers.

Unfortunately that doesn't narrow it down at all. Anything these routers have in common?

we just tell people to install colima that way, and it seems to fix everything.

That sounds good, but I suspect it doesn't help when you need e.g. split-DNS over VPN, or want to resolve local names from mDNS. I guess it is not a requirement for most.

jandubois avatar Oct 04 '22 06:10 jandubois

@jandubois IDK if it helps at all but my local network is an Eero mesh setup using a first-gen Eero Pro. Also, have you tried following the steps in this issue to see if you can replicate it on your system?

hazardv avatar Oct 04 '22 13:10 hazardv