weave icon indicating copy to clipboard operation
weave copied to clipboard

dns-remove fails if container is not running

Open paulbellamy opened this issue 9 years ago • 14 comments

paulbellamy avatar Feb 16 '16 13:02 paulbellamy

This is not a problem anybody should ever run into. If weave fails to remove entries for dead containers, we have a problem of much greater magnitude.

rade avatar Feb 16 '16 13:02 rade

But there's also no reason for a remove operation to not be idempotent. Either it's there and it removes the entry, or it isn't there and there's nothing to do.

paulbellamy avatar Feb 16 '16 13:02 paulbellamy

But there's also no reason for a remove operation to not be idempotent.

Sure. I didn't close this issue ;)

Are all the other remove variants idempotent?

rade avatar Feb 16 '16 13:02 rade

Noting that this came up in the context that WeaveDNS did hold (through some other issue) entries for a dead container, so it was frustrating that dns-remove refused to remove them.

bboreham avatar Feb 19 '16 11:02 bboreham

We do not know whether the container in question was gone completely or just not running.

Are we saying that weave dns-remove [<ip_address> ...] <container_id> [-h <fqdn>] should remove all DNS records pertaining to <container_id> when <container_id> isn't running or doesn't exist, regardless of the other options specified? Or should it only remove the records matching the args, even though no DNS records should be existing at all for a non-running/gone container?

And what should we do if the user specifies a container name but that name does not exist? Should we do nothing and a) succeed, or b) fail? What if it's a partial container id? In theory we could attempt to weed out DNS entries matching that.

Also, should the same logic apply to weave detach?

IMO this is an insane amount of fiddly work to put into coping with a situation that should not exist.

rade avatar Feb 19 '16 11:02 rade

strawman:

  • it should only remove the records matching the args, even though no DNS records should be existing at all for a non-running/gone container
  • dns records should be matched by container id, short-id, or name.

re: weave detach probably, yes.

"should not" != "does not"

paulbellamy avatar Feb 19 '16 13:02 paulbellamy

dns records should be matched by container id, short-id, or name.

We don't record container names in DNS.

rade avatar Feb 19 '16 14:02 rade

"should not" != "does not"

I still think this is complete overkill. Don't forget that the user already has a way to clear any garbage: bounce the router. That will not only address this very particular problem but a range of other ills.

rade avatar Feb 19 '16 14:02 rade

I got the same problem with my dns entries

dtk-0        192.166.0.1     64063d7ed392 3a:ca:39:55:88:70
es-data-1    192.164.179.54  cb54a9a230f8 1a:ef:98:09:64:c5
es-data-1    192.172.51.54   d7fc0e5da735 2e:6a:39:aa:a5:11
es-data-2    192.166.51.53   77d2ee8d4882 da:1f:2b:60:11:02
es-data-3    192.172.51.57   c39a3b3cba68 2e:6a:39:aa:a5:11
es-data-4    192.171.153.156 4a454e3348c9 7a:36:bb:b9:17:23
jupyter      192.171.179.58  b68fcdfed791 46:06:c5:de:a7:46
kafka-1      192.171.179.56  d2c2dda6494d 46:06:c5:de:a7:46
kafka-1      192.171.179.57  d2c2dda6494d 46:06:c5:de:a7:46
kafka-2      192.172.76.208  52a3bfe2e71b 7a:81:96:fb:3e:a1
kafka-2      192.172.76.209  52a3bfe2e71b 7a:81:96:fb:3e:a1
kafka-3      192.166.25.157  91b39285489d de:58:9b:22:2d:b4
kafka-3      192.166.25.158  91b39285489d de:58:9b:22:2d:b4
nosql-0      192.166.0.2     78580407e182 3a:ca:39:55:88:70
redis        192.171.153.157 4fb09654acea 7a:36:bb:b9:17:23
spark-worker-6 192.172.51.58   ca35af387291 2e:6a:39:aa:a5:11
traverser    192.171.153.158 1e00ca4189f4 7a:36:bb:b9:17:23
trifecta     192.166.0.3     c92069d9fcb5 3a:ca:39:55:88:70
zeppelin     192.164.179.56  564eb27826b2 1a:ef:98:09:64:c5
zoo-1        192.171.179.55  9ceb98df24e2 46:06:c5:de:a7:46
zoo-2        192.172.76.207  2e6b1b51ccac 7a:81:96:fb:3e:a1
zoo-3        192.165.230.104 993873d447c8 e6:1c:c0:92:67:5e

You can see that there are several containers with replicate names that do not exists. It would be useful to remove DNS entries without checking if the container is running. Probably it is possible to add a -f parameter to the command. If you give to me a pointer to the class to check I will do a pull request.

fabiofumarola avatar Mar 06 '16 20:03 fabiofumarola

Hi @fabiofumarola, it would be even better if we can get to the bottom of why you have stale entries and fix that.

Could you please run:

docker kill -s QUIT weave docker logs weave

and post the result at https://github.com/weaveworks/weave/issues/2023

bboreham avatar Mar 06 '16 21:03 bboreham

We think the problem that gave rise to this was fixed in 1.4.6 (via #2023).

I suggest this issue should make weave dns-remove idempotent in the same way as we made weave rmpeer idempotent, i.e. it should report on what it did. I suggest listing the name, IP and container, in the same way as weave status dns does (no need to list the peer since that is always our peer, since weave dns-remove only ever acts on our records).

I don't think we should be attempting to work around DNS breakages here.

rade avatar Apr 21 '16 18:04 rade

Man, its suuuuuper disappointing that weave dns-remove does not do what it says it does. Please rename to weave remove-running-container-record so that people dont expect this tool to do what the function currently labeled as being able to do.

I cant believe Im about to have to flush routing for hundreds of services when all that needs to happen is the removal of a single DNS record.

This is the kind of pain that's made weave an unacceptable service discovery mechanism for our business needs.

RoryKiefer avatar Apr 29 '20 16:04 RoryKiefer

Would you like to purchase business support?

bboreham avatar Apr 29 '20 16:04 bboreham

Actually, yeah I kinda do. I'll reach out.

RoryKiefer avatar Feb 23 '21 16:02 RoryKiefer