anax

Agent does not clean up Docker networks in some situations.

Open TheMosquito opened this issue 4 years ago • 2 comments

These errors:

2020-03-25 17:26:33: Error starting containers: API error (404): could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network
2020-03-25 17:27:02: Error starting containers: API error (404): could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

They are seen when Docker has already created the maximum number of virtual networks and then tries to create another, which fails with the message above.
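For a rough sense of where that ceiling comes from, here is a minimal sketch that counts how many subnets Docker's built-in local address pools can hand out. The pool list is an assumption based on the defaults commonly documented for Docker releases of this period, not something read from this machine; a daemon configured with `default-address-pools` would differ.

```python
import ipaddress

# ASSUMPTION: (base network, subnet prefix length) pairs commonly cited as
# Docker's built-in defaults for local-scope networks. Not taken from this issue.
ASSUMED_DEFAULT_POOLS = [
    ("172.17.0.0/16", 16),
    ("172.18.0.0/16", 16),
    ("172.19.0.0/16", 16),
    ("172.20.0.0/14", 16),
    ("172.24.0.0/14", 16),
    ("172.28.0.0/14", 16),
    ("192.168.0.0/16", 20),
]


def count_subnets(pools):
    """Count how many subnets of the given size fit into each base network."""
    total = 0
    for base, size in pools:
        net = ipaddress.ip_network(base)
        # A /14 split into /16 subnets yields 2**(16 - 14) = 4 subnets, and so on.
        total += 2 ** (size - net.prefixlen)
    return total


print(count_subnets(ASSUMED_DEFAULT_POOLS))  # prints 31
```

Under these assumed defaults the count comes to 31, and one of those subnets is normally taken by the default bridge network, so a few dozen leftover networks are enough to trigger the error. The prune output in this issue lists 29 deleted networks, which is in that range.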

The user had already run docker network prune, so I could not get the docker network inspect info on the networks, but they did have the docker network prune output, which lists the deleted networks:

pierrefeillet@MacBook-Pro-108:~/Projects/Edge/examples/edge/services/helloworld$ docker network prune
WARNING! This will remove all networks not used by at least one container.
Are you sure you want to continue? [y/N] y
Deleted Networks:
2763bcc53dc4cb036cc56c3bc1ed135b2b3dad022bbf35a8ddd20c13000e8486
eba8dd900d2005cf848d9efefc2fcffc89a0011b32ed6272409ef9043acfdb0f
ad057867959b091e72a19b97ffc42a9c1df3f24a1cebea854e1ff091cf99b8e3
97548aa0effb9b1f5e0c0f13ae99ae21e487c1343e65aed062a9bd196ff17c2f
056b7d1744788722b17f0518e70f3001d9e8b49bf5e63096f7bae6fe46516dad
811ee3a957d56c25d49fcc06d8086cad18d620d2f72239b1876faa7c623ad204
d2b13f321f8f23957a069488f56728216ba0da7ee5232f413374d5932c39869a
26fd92d17d131a6e723e3227ed9ddebbfd2c913ee58a251f67400979debbb2b1
3b1d8f8f48c0074c97e23f0a3783bb49b003d801839b4a4c4a43b4b6707566dc
28f473ea659c872be350d53b27997dbe0ed1eeaecac05b5ab1ba15ff8a3aaf63
ea9abec2b61a8ff79c4a2cb6d2ae98d5c65527e639c5c8bb8d844b1a0ae40bd1
1b92eda122617c0cf25a8928c92d8f4d5edef9910c19811b03884d1414e50e84
d0df7fff813491b873461ce40dfffbc2c83de76f8fd26bd57734eae89703d81c
92e670b7ce3dff3afadac8aa398b955d3cc0e5e92dda59c669bf9d9b2563c858
059156072462131abcdae9f7eebd9522bcfcba3042c8dfe0bf3fca9687300ace
a224bf858d1f9ec462064a553af04c8da74a9d48e502119c827b5d0f0a2a8dab
2db6aed83cc437ce22c8e744212c035c050bd03b04c53e0ac9ec4f887e9207bf
3fc7ee14fa9da3f9fefe52a466792ffcc04f244b375ff7842b49dfc3a423657d
1388b0b8997332fc7fe416bf847ff942fd46b3194d23d49f41b2fbe52a90b1d7
65ef3c1673db1b5b5d2598ba03e64c6c0d287dc8f54c73274e53abe534cdf4e0
13f39cab06e112710a616b6d8cb2f1624309eff530c0cbc20a0e4336b75f9e3c
dfbb56e6a7c32fd6442a3971d06cd87630748d88c88209d28c89caed3716e1bf
45d648cedf469ed13859bb5f2aee9426fb4dcc44c9cd7f222b6e795d0b985e57
c964d676cc34363fd52b90755a8f5016aadb65761f08b7d3c46cd4964f84201a
a54e16c15a3e2640ce1623eea7a1ec07fd35c07e5ffebe5f3e55ddb469a915ad
226289cb08c241366bf9634496efd5fe74502c8b28de87353cfaaf154fd86f24
ad4e9aa175a428ff720dc51a987c100e1f47a7375235680411928af4c99d37e5
a14a95b0a5b034e75f84ddc6aba32af057349b7704ac89d95ea5f5c35fc9ee1b
97c46ae55c73bdbab0ec712e6061ccd64cba90dbe0cf84d7fba5bfe6bb5a2c5c

So I asked him to provide the hzn agreement list -r output, which is below. As you can see, many of the deleted network names above correspond to Horizon agreement IDs, so I think it is certain that the agent has failed to clean up after itself on this Mac. For example, the first deleted network is 2763bcc53dc4cb036cc56c3bc1ed135b2b3dad022bbf35a8ddd20c13000e8486 and here is the corresponding cancelled agreement with that agreement ID:

  {
    "name": "pattern-ibm.helloworld_ibm.helloworld_IBM_amd64 merged with pattern-ibm.helloworld_ibm.helloworld_IBM_amd64",
    "current_agreement_id": "2763bcc53dc4cb036cc56c3bc1ed135b2b3dad022bbf35a8ddd20c13000e8486",
    "consumer_id": "IBM/mycluster-agbot",
    "agreement_creation_time": "2020-03-25 08:25:02 +0100 CET",
    "agreement_accepted_time": "2020-03-25 08:25:09 +0100 CET",
    "agreement_finalized_time": "2020-03-25 08:25:23 +0100 CET",
    "agreement_execution_start_time": "",
    "agreement_data_received_time": "",
    "agreement_protocol": "Basic",
    "workload_to_run": {
      "url": "ibm.helloworld",
      "org": "IBM",
      "version": "1.0.0",
      "arch": "amd64"
    },
    "agreement_terminated_time": "2020-03-25 08:26:29 +0100 CET",
    "terminated_reason": 103,
    "terminated_description": "service terminated"
  },
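The comparison described above can be automated. This is a minimal sketch, assuming the `hzn agreement list -r` output is a JSON array of objects shaped like the excerpt above (with `current_agreement_id` and `agreement_terminated_time` fields) and that the `docker network prune` text is saved as-is. The function names are hypothetical helpers, not part of Horizon or Docker.

```python
import json
import re

# Network names and agreement IDs in this issue are 64 lowercase hex characters.
HEX64 = re.compile(r"^[0-9a-f]{64}$")


def parse_pruned_networks(prune_output):
    """Return the names listed after 'Deleted Networks:' in the prune output."""
    names = []
    in_list = False
    for line in prune_output.splitlines():
        line = line.strip()
        if line == "Deleted Networks:":
            in_list = True
            continue
        if in_list and HEX64.match(line):
            names.append(line)
    return names


def leaked_agreement_networks(prune_output, agreements_json):
    """Return pruned network names that match a terminated agreement ID.

    ASSUMPTION: agreements_json is a JSON array of agreement objects and a
    non-empty 'agreement_terminated_time' marks a terminated agreement.
    """
    agreements = json.loads(agreements_json)
    terminated = {
        a["current_agreement_id"]
        for a in agreements
        if a.get("agreement_terminated_time")
    }
    return [n for n in parse_pruned_networks(prune_output) if n in terminated]
```

Any name this returns was a network that outlived its terminated agreement, which is the leak this issue describes.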

TheMosquito, Mar 27 '20 17:03

The full output is too large for an issue. I'll send it to you in Slack.

TheMosquito, Mar 27 '20 18:03

This issue could be related to https://github.com/open-horizon/anax/issues/1622 because this was the same user, same machine, same task, and the permission errors identified in that issue may have been the cause of the service terminations that led to these networks not being cleaned up. (Ling asked me to add this).

TheMosquito, Mar 27 '20 18:03