
Go IPFS Docker uses 127.0.0.1 instead of 0.0.0.0

Open yznima opened this issue 3 years ago • 1 comment


Installation method

built from source

Version

go-ipfs version: 0.12.2-0e8b121
Repo version: 12
System version: amd64/linux
Golang version: go1.16.15

Config

{
  "Identity": {
    "PeerID": "XXXX",
    "PrivKey": "XXXXX" },
  "Datastore": {
    "StorageMax": "10GB",
    "StorageGCWatermark": 90,
    "GCPeriod": "1h",
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "HashOnRead": false,
    "BloomFilterSize": 0
  },
  "Addresses": {
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ],
    "Announce": [],
    "AppendAnnounce": [],
    "NoAnnounce": [],
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Gateway": "/ip4/127.0.0.1/tcp/8080"
  },
  "Mounts": {
    "IPFS": "/ipfs",
    "IPNS": "/ipns",
    "FuseAllowOther": false
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Routing": {
    "Type": "dht"
  },
  "Ipns": {
    "RepublishPeriod": "",
    "RecordLifetime": "",
    "ResolveCacheSize": 128
  },
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "Gateway": {
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "RootRedirect": "",
    "Writable": false,
    "PathPrefixes": [],
    "APICommands": [],
    "NoFetch": false,
    "NoDNSLink": false,
    "PublicGateways": null
  },
  "API": {
    "HTTPHeaders": {}
  },
  "Swarm": {
    "AddrFilters": null,
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "Transports": {
      "Network": {},
      "Security": {},
      "Multiplexers": {}
    },
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 600,
      "HighWater": 900,
      "GracePeriod": "20s"
    }
  },
  "AutoNAT": {},
  "Pubsub": {
    "Router": "",
    "DisableSigning": false
  },
  "Peering": {
    "Peers": null
  },
  "DNS": {
    "Resolvers": {}
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Provider": {
    "Strategy": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "UrlstoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "AcceleratedDHTClient": false
  },
  "Plugins": {
    "Plugins": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Internal": {}
}

Description

We have IPFS deployed on Kubernetes. An HPA autoscales the IPFS pods, and the cluster autoscaler scales the cluster nodes up and down to match the number of IPFS pods.

Most of the time everything works fine. However, sometimes a pod gets into a bad state where it restarts continuously because the following liveness probe fails:

NAME                       READY   STATUS             RESTARTS   AGE
ipfs-ipfs-ipfs-cluster-0   0/1     Running            539        26h
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/v0/version
        port: gateway
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
  Warning  Unhealthy  20m (x2395 over 26h)  kubelet  Readiness probe failed: Get "http://172.31.15.52:8080/api/v0/version": dial tcp 172.31.15.52:8080: connect: connection refused
  Warning  BackOff    18s (x6502 over 26h)  kubelet  Back-off restarting failed container
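
The refusal can be reproduced from inside the container: loopback answers while the pod IP is refused, which points at the bind address rather than at the daemon being down. Something like the following should show it (pod name taken from the status above; wget is assumed to be available in the image's busybox):

# Loopback answers, because the daemon is bound to 127.0.0.1 only:
kubectl exec ipfs-ipfs-ipfs-cluster-0 -- \
  wget -qO- http://127.0.0.1:8080/api/v0/version

# The pod IP is refused, which is exactly what the kubelet probe sees:
kubectl exec ipfs-ipfs-ipfs-cluster-0 -- \
  wget -qO- http://172.31.15.52:8080/api/v0/version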

Comparing the logs from a bad pod and a good pod, I see that the good pod binds the API and Gateway to 0.0.0.0 while the bad pod binds them to 127.0.0.1.

Bad Pod
========
Changing user to ipfs
ipfs version 0.12.2
Found IPFS fs-repo at /data/ipfs
Initializing daemon...
go-ipfs version: 0.12.2-0e8b121
Repo version: 12
System version: amd64/linux
Golang version: go1.16.15
2022/05/25 17:32:24 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Swarm listening on /ip4/127.0.0.1/tcp/4001
Swarm listening on /ip4/127.0.0.1/udp/4001/quic
Swarm listening on /ip4/172.31.15.52/tcp/4001
Swarm listening on /ip4/172.31.15.52/udp/4001/quic
Swarm listening on /p2p-circuit
Swarm announcing /ip4/127.0.0.1/tcp/4001
Swarm announcing /ip4/127.0.0.1/udp/4001/quic
Swarm announcing /ip4/172.31.15.52/tcp/4001
Swarm announcing /ip4/172.31.15.52/udp/4001/quic
API server listening on /ip4/127.0.0.1/tcp/5001
WebUI: http://127.0.0.1:5001/webui
Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Daemon is ready

Received interrupt signal, shutting down...
(Hit ctrl-c again to force-shutdown the daemon.)
Good Pod
========
Changing user to ipfs
ipfs version 0.12.2
Found IPFS fs-repo at /data/ipfs
Initializing daemon...
go-ipfs version: 0.12.2-0e8b121
Repo version: 12
System version: amd64/linux
Golang version: go1.16.15
2022/05/25 03:25:06 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Swarm listening on /ip4/127.0.0.1/tcp/4001
Swarm listening on /ip4/127.0.0.1/udp/4001/quic
Swarm listening on /ip4/172.31.22.219/tcp/4001
Swarm listening on /ip4/172.31.22.219/udp/4001/quic
Swarm listening on /p2p-circuit
Swarm announcing /ip4/127.0.0.1/tcp/4001
Swarm announcing /ip4/127.0.0.1/udp/4001/quic
Swarm announcing /ip4/172.31.22.219/tcp/4001
Swarm announcing /ip4/172.31.22.219/udp/4001/quic
API server listening on /ip4/0.0.0.0/tcp/5001
WebUI: http://0.0.0.0:5001/webui
Gateway (readonly) server listening on /ip4/0.0.0.0/tcp/8080
Daemon is ready

Looking at their configurations, I can confirm that the bad pod's configuration is wrong:

Bad Pod
========
  "Addresses": {
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ],
    "Announce": [],
    "AppendAnnounce": [],
    "NoAnnounce": [],
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Gateway": "/ip4/127.0.0.1/tcp/8080"
  },

Good Pod
========
  "Addresses": {
    "API": "/ip4/0.0.0.0/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },

From my research, the API and Gateway addresses are rewritten as part of the Docker start script, but that step only runs the first time, when the data directory is empty. I think that is the issue: if the pod is killed in the middle of initialization and never gets past it, the configuration is left in a broken state. Should the script instead be changed to something like the following, so the addresses are set on every start?

if [ -e "$repo/config" ]; then
  echo "Found IPFS fs-repo at $repo"
else
  ipfs init ${IPFS_PROFILE:+"--profile=$IPFS_PROFILE"}

  # Set up the swarm key, if provided
  SWARM_KEY_FILE="$repo/swarm.key"
  SWARM_KEY_PERM=0400

  # Create a swarm key from a given environment variable
  if [ -n "$IPFS_SWARM_KEY" ] ; then
    echo "Copying swarm key from variable..."
    printf "%s\n" "$IPFS_SWARM_KEY" >"$SWARM_KEY_FILE" || exit 1
    chmod $SWARM_KEY_PERM "$SWARM_KEY_FILE"
  fi

  # Unset the swarm key variable
  unset IPFS_SWARM_KEY

  # Check during initialization if a swarm key was provided and
  # copy it to the ipfs directory with the right permissions
  # WARNING: This will replace the swarm key if it exists
  if [ -n "$IPFS_SWARM_KEY_FILE" ] ; then
    echo "Copying swarm key from file..."
    install -m $SWARM_KEY_PERM "$IPFS_SWARM_KEY_FILE" "$SWARM_KEY_FILE" || exit 1
  fi

  # Unset the swarm key file variable
  unset IPFS_SWARM_KEY_FILE
fi

# Always ensure the addresses are set correctly 
ipfs config Addresses.API /ip4/0.0.0.0/tcp/5001
ipfs config Addresses.Gateway /ip4/0.0.0.0/tcp/8080

...
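
For context, the stock start script applies those two ipfs config calls only inside the first-run branch, roughly like this (a paraphrase of the behavior described above, not the verbatim upstream script):

if [ -e "$repo/config" ]; then
  echo "Found IPFS fs-repo at $repo"
else
  ipfs init ${IPFS_PROFILE:+"--profile=$IPFS_PROFILE"}
  # If the container is killed between init and the two calls below,
  # the repo exists with the default 127.0.0.1 addresses and this
  # branch is never entered again on later starts.
  ipfs config Addresses.API /ip4/0.0.0.0/tcp/5001
  ipfs config Addresses.Gateway /ip4/0.0.0.0/tcp/8080
fi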

yznima · May 25 '22 22:05

If you need a quick fix, Kubo 0.13 shipped with support for customizing docker init via user scripts placed in /container-init.d – see https://docs.ipfs.tech/how-to/run-ipfs-inside-docker/#customizing-your-node
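
A minimal sketch of such a script (the filename is illustrative; the two ipfs config keys are the ones discussed in this issue):

#!/bin/sh
# Hypothetical /container-init.d/010-bind-all.sh
# Scripts placed in /container-init.d run inside the container before
# the daemon starts, so the override is reapplied on every boot rather
# than only on first init.
set -e
ipfs config Addresses.API /ip4/0.0.0.0/tcp/5001
ipfs config Addresses.Gateway /ip4/0.0.0.0/tcp/8080

On Kubernetes this could be mounted into the pod at /container-init.d via a ConfigMap volume.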

Unsure if we should be overwriting user config every time a node starts. In my mind it should happen only on init, and every subsequent config update should be an explicit opt-in made by the user.

In other words, is /container-init.d enough? Can we close this?

lidel · Aug 02 '22 17:08