
Logical bug - missing machine network

Open gtomilko opened this issue 11 months ago • 28 comments

When, on the second step ("Static network configurations"), you switch from form view to YAML view, there is no place to enter machine network information. As a result, you cannot finish the cluster configuration and get stuck on Networking with:

Image

Just a suggestion: add a "Machine Network" field to "Use advanced networking"

Image

gtomilko avatar Mar 27 '25 17:03 gtomilko

Anybody alive on this project??

gtomilko avatar Apr 17 '25 18:04 gtomilko

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Jul 17 '25 01:07 openshift-bot

Sorry we have missed this issue. Where do you encounter this issue? How did you deploy assisted installer and its UI?

rccrdpccl avatar Jul 17 '25 08:07 rccrdpccl

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot avatar Aug 16 '25 08:08 openshift-bot

@rccrdpccl It was deployed in podman following the instructions for http. When you configure the static network using YAML, there is no place to specify machineNetwork

Image

As a result, when you get to the Networking page in the workflow:

Image

gtomilko avatar Aug 28 '25 00:08 gtomilko

/remove-lifecycle rotten

gtomilko avatar Aug 28 '25 00:08 gtomilko

Anybody alive?

gtomilko avatar Sep 08 '25 18:09 gtomilko

@gtomilko Is this while running the latest version? Is it the upstream software or the RH distribution?

@rawagner is this issue something we are aware of?

rccrdpccl avatar Sep 09 '25 11:09 rccrdpccl

The machine network dropdown is populated based on the values of the cluster's machineNetworks. Maybe none are reported?

rawagner avatar Sep 09 '25 11:09 rawagner

@gtomilko please provide the following:

  • Type of deployment you are running:

    • SaaS on console.redhat.com
    • on-prem ACM/ZTP
    • podman
  • Version of the software (if you can try latest version it would be great)

Once you can provide this information we can choose the best path to solve your issue

rccrdpccl avatar Sep 09 '25 12:09 rccrdpccl

@rccrdpccl My installation is on-prem in podman. Here is the About screen:

Image

gtomilko avatar Sep 10 '25 17:09 gtomilko

Here is my configmap.yml

apiVersion: v1
kind: ConfigMap
metadata:
  name: config
data:
  ASSISTED_SERVICE_HOST: 127.0.0.1:8090
  ASSISTED_SERVICE_SCHEME: http
  AUTH_TYPE: none
  DB_HOST: 127.0.0.1
  DB_NAME: installer
  DB_PASS: admin
  DB_PORT: "5432"
  DB_USER: admin
  DEPLOY_TARGET: onprem
  DISK_ENCRYPTION_SUPPORT: "false"
  DUMMY_IGNITION: "false"
  ENABLE_SINGLE_NODE_DNSMASQ: "false"
  HW_VALIDATOR_REQUIREMENTS: '[{"version":"default","master":{"cpu_cores":4,"ram_mib":16384,"disk_size_gb":100,"installation_disk_speed_threshold_ms":10,"network_latency_threshold_ms":100,"packet_loss_percentage":0},"arbiter":{"cpu_cores":2,"ram_mib":8192,"disk_size_gb":100,"installation_disk_speed_threshold_ms":10,"network_latency_threshold_ms":1000,"packet_loss_percentage":0},"worker":{"cpu_cores":2,"ram_mib":8192,"disk_size_gb":100,"installation_disk_speed_threshold_ms":10,"network_latency_threshold_ms":1000,"packet_loss_percentage":10},"sno":{"cpu_cores":8,"ram_mib":16384,"disk_size_gb":100,"installation_disk_speed_threshold_ms":10},"edge-worker":{"cpu_cores":2,"ram_mib":8192,"disk_size_gb":15,"installation_disk_speed_threshold_ms":10}}]'
  IMAGE_SERVICE_BASE_URL: http://192.168.94.28:8888
  IPV6_SUPPORT: "true"
  ISO_IMAGE_TYPE: "full-iso"
  LISTEN_PORT: "8888"
  NTP_DEFAULT_SERVER: ""
  POSTGRESQL_DATABASE: installer
  POSTGRESQL_PASSWORD: admin
  POSTGRESQL_USER: admin
  PUBLIC_CONTAINER_REGISTRIES: 'quay.io,registry.ci.openshift.org'
  SERVICE_BASE_URL: http://192.168.94.28:8090
  STORAGE: filesystem
  OS_IMAGES: '[{"openshift_version":"4.16","cpu_architecture":"x86_64","url":"https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/39.20240322.3.1/x86_64/fedora-coreos-39.20240322.3.1-live.x86_64.iso","version":"39.20240322.3.1"},{"openshift_version":"4.19","cpu_architecture":"x86_64","url":"https://rhcos.mirror.openshift.com/art/storage/prod/streams/c9s/builds/9.0.20250510-0/x86_64/scos-9.0.20250510-0-live-iso.x86_64.iso","version":"9.0.20250510-0"}]'
  RELEASE_IMAGES: '[{"openshift_version":"4.16","cpu_architecture":"x86_64","cpu_architectures":["x86_64"],"url":"registry.ci.openshift.org/origin/release:4.16","version":"4.16.0-0.okd","default":false,"support_level":"beta"},{"openshift_version":"4.19","cpu_architecture":"x86_64","cpu_architectures":["x86_64"],"url":"quay.io/okd/scos-release:4.19.0-okd-scos.16","version":"4.19.0-okd-scos.16","default":true,"support_level":"beta"}]'
  ENABLE_UPGRADE_AGENT: "false"
  ENABLE_OKD_SUPPORT: "true"

gtomilko avatar Sep 10 '25 17:09 gtomilko

Here is the podman version:

Client:       Podman Engine
Version:      4.9.4-rhel
API Version:  4.9.4-rhel
Go Version:   go1.23.9 (Red Hat 1.23.9-1.module+el8.10.0+23162+9223a61a)
Built:        Wed Jun 25 05:17:23 2025
OS/Arch:      linux/amd64

gtomilko avatar Sep 10 '25 17:09 gtomilko

Thanks for this info. Can you please share the actual sha used instead of the latest tag, for the assisted components? Also, could you please share the static network configuration you have been using?

rccrdpccl avatar Sep 12 '25 10:09 rccrdpccl

Here is my static network config for one of the nodes:

interfaces:
  - name: eth0
    type: ethernet
    state: up
    mtu: 9000

  - name: eth1
    type: ethernet
    state: up
    mtu: 9000

  - name: bond0
    type: bond
    state: up
    mtu: 9000
    link-aggregation:
      mode: balance-xor
      options:
        miimon: '100'
      ports:
        - eth0
        - eth1
    ipv4:
      enabled: false
    ipv6:
      enabled: false

  - name: bond0.94
    type: vlan
    state: up
    mtu: 9000
    vlan:
      base-iface: bond0
      id: 94
    ipv4:
      enabled: false
    ipv6:
      enabled: false

  - name: br94
    type: linux-bridge
    state: up
    mtu: 9000
    bridge:
      options:
        stp:
          enabled: true
      port:
        - name: bond0.94
    ipv4:
      enabled: true
      address:
        - ip: 192.168.94.43
          prefix-length: 24
      auto-dns: false
      auto-gateway: false
    ipv6:
      enabled: false
dns-resolver:
  config:
    server:
      - 192.168.96.5
      - 192.168.98.6
    search:
      - our.tld

routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 192.168.94.1
      next-hop-interface: br94

gtomilko avatar Sep 15 '25 18:09 gtomilko

Here is the info for the UI container:

"build-date"="2025-08-20T20:17:12"
"architecture"="x86_64"
"vcs-type"="git"
"vcs-ref"="e52167a8a2facee6f75c9f6dbd8d406b720e3f60"
"release"="1755720999"

gtomilko avatar Sep 15 '25 20:09 gtomilko

Thanks for the info. The UI component is not the problem here; please share the image sha (you can find it in the manifests or via podman inspect).

rccrdpccl avatar Sep 17 '25 16:09 rccrdpccl

cc @linoyaslan @AlonaKaplan FYI

rccrdpccl avatar Sep 17 '25 16:09 rccrdpccl

Hi @gtomilko, you don't actually need to enter the "machine network" in the YAML view. It is calculated from the hosts' interfaces, roughly as follows: extract all addresses from all interfaces -> convert the addresses to CIDR networks -> return the unique network CIDRs found across all hosts.
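For readers following along, the calculation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the assisted-service implementation: the function names and the input shape (a mapping from host name to "address/prefix" strings) are assumptions made for the example, and the `10.0.0.5/16` address is hypothetical.

```python
import ipaddress


def host_networks(addresses):
    """Convert 'address/prefix' strings into the set of networks they belong to."""
    return {ipaddress.ip_interface(addr).network for addr in addresses}


def _sorted(networks):
    # Stable ordering that also works when IPv4 and IPv6 networks are mixed.
    return sorted(networks, key=lambda n: (n.version, int(n.network_address), n.prefixlen))


def machine_network_candidates(hosts):
    """Unique networks found across all hosts (hypothetical helper)."""
    found = set()
    for addresses in hosts.values():
        found |= host_networks(addresses)
    return _sorted(found)


def shared_networks(hosts):
    """Networks that every host has an address in (hypothetical helper)."""
    per_host = [host_networks(addresses) for addresses in hosts.values()]
    if not per_host:
        return []
    return _sorted(set.intersection(*per_host))


if __name__ == "__main__":
    hosts = {
        "host-a": ["192.168.94.43/24"],
        # The second address is made up to show a network only one host sees.
        "host-b": ["192.168.94.44/24", "10.0.0.5/16"],
    }
    print(machine_network_candidates(hosts))
    print(shared_networks(hosts))
```

If no host reports an address at all (for example because the address sits on an interface that is not included in the inventory), both lists come back empty, which would match an empty machine network picker.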

What I’m wondering is: do all your hosts share a common network?

linoyaslan avatar Sep 18 '25 07:09 linoyaslan

BTW, we're aware of the confusion caused by the "machine network" field in the form view. We therefore renamed it to "subnet", as it has nothing to do with the machine network that is not working for you - https://issues.redhat.com/browse/MGMT-20867

AlonaKaplan avatar Sep 18 '25 08:09 AlonaKaplan

@linoyaslan All hosts are on the same /24 subnet. The IP addresses in the YAML are 192.168.94.[40-45].

gtomilko avatar Sep 18 '25 20:09 gtomilko

@rccrdpccl
UI container: "Digest": "sha256:2889b78cf9c2363b448d04b9340e83d4cf255db9daa8865c440b3c61378cac5f"
Service container: "Digest": "sha256:4cad9402b95162aa222f526b37f99f4d91c8411469a653f0c2bcbc1a54e64635"
Image service container: "Digest": "sha256:403c040964f8ce591504b67d084acf2eec555fc1fd9949f712bba35aa9f65ae4"

gtomilko avatar Sep 18 '25 20:09 gtomilko

@gtomilko So, I spoke with @CrystalChun , and she shared with me the following: you should add ENABLE_VIRTUAL_INTERFACES: "true" to the config map you shared above, and it should work.

linoyaslan avatar Sep 18 '25 21:09 linoyaslan

Thank you @linoyaslan and @CrystalChun, that brought me one step further. Now the machine network shows up in the picker, and the API and Ingress IP checks have no complaints.

Now I'm stuck on the next step:

Image

In the agent log I got:

Sep 22 18:55:49 k8s-okd0.our.tld next_step_runne[3377]: time="22-09-2025 18:55:49" level=info msg="Result for inventory already exists in assisted service" file="step_processor.go:64" request_id=1379ad98-ef9e-4e52-b278-aacd8c1b92d6
Sep 22 18:55:57 k8s-okd0.our.tld next_step_runne[3377]: time="22-09-2025 18:55:57" level=error msg="failed to ping address fde1:53ba:e9a0:de11:58ac:464:56cd:c69a" file="ping_checker.go:29" error="exit status 1"
Sep 22 18:55:57 k8s-okd0.our.tld next_step_runne[3377]: time="22-09-2025 18:55:57" level=error msg="failed to ping address fde1:53ba:e9a0:de11:acff:a808:af81:24cd" file="ping_checker.go:29" error="exit status 1"
Sep 22 18:55:57 k8s-okd0.our.tld next_step_runne[3377]: time="22-09-2025 18:55:57" level=error msg="failed to ping address fde1:53ba:e9a0:de11:60ec:b323:5aca:1e0e" file="ping_checker.go:29" error="exit status 1"
Sep 22 18:55:57 k8s-okd0.our.tld next_step_runne[3377]: time="22-09-2025 18:55:57" level=error msg="failed to ping address fde1:53ba:e9a0:de11:b3ac:518d:ae33:8559" file="ping_checker.go:29" error="exit status 1"
Sep 22 18:55:57 k8s-okd0.our.tld next_step_runne[3377]: time="22-09-2025 18:55:57" level=info msg="Sending step <connectivity-check-8fed726e> reply output <{\"remote_hosts\":[{\"host_id\":\"131caee0-daa3-7eb9-357a-d8f7ac879bf6\",\"l2_connectivity\":null,\"l3_connectivity\":[{\"average_rtt_ms\":0.043,\"remote_ip_address\":\"127.0.0.1\",\"successful\":true},{\"average_rtt_ms\":0.106,\"remote_ip_address\":\"192.168.94.47\",\"successful\":true},{\"average_rtt_ms\":0.041,\"remote_ip_address\":\"::1\",\"successful\":true},{\"remote_ip_address\":\"fde1:53ba:e9a0:de11:60ec:b323:5aca:1e0e\"}],\"mtu_report\":null},{\"host_id\":\"5913f30e-0069-eab7-7ce9-f4f5cdb301e7\",\"l2_connectivity\":null,\"l3_connectivity\":[{\"average_rtt_ms\":0.046,\"remote_ip_address\":\"127.0.0.1\",\"successful\":true},{\"average_rtt_ms\":0.1,\"remote_ip_address\":\"192.168.94.46\",\"successful\":true},{\"average_rtt_ms\":0.039,\"remote_ip_address\":\"::1\",\"successful\":true},{\"remote_ip_address\":\"fde1:53ba:e9a0:de11:58ac:464:56cd:c69a\"}],\"mtu_report\":null},{\"host_id\":\"798a4c68-8fd7-f117-810e-956a3ad26e61\",\"l2_connectivity\":null,\"l3_connectivity\":[{\"average_rtt_ms\":0.046,\"remote_ip_address\":\"127.0.0.1\",\"successful\":true},{\"average_rtt_ms\":0.182,\"remote_ip_address\":\"192.168.94.45\",\"successful\":true},{\"average_rtt_ms\":0.038,\"remote_ip_address\":\"::1\",\"successful\":true},{\"remote_ip_address\":\"fde1:53ba:e9a0:de11:b3ac:518d:ae33:8559\"}],\"mtu_report\":null},{\"host_id\":\"a066982c-cd87-38f8-1443-794963015eb9\",\"l2_connectivity\":null,\"l3_connectivity\":[{\"average_rtt_ms\":0.044,\"remote_ip_address\":\"127.0.0.1\",\"successful\":true},{\"average_rtt_ms\":0.22,\"remote_ip_address\":\"192.168.94.44\",\"successful\":true},{\"average_rtt_ms\":0.04,\"remote_ip_address\":\"::1\",\"successful\":true},{\"remote_ip_address\":\"fde1:53ba:e9a0:de11:acff:a808:af81:24cd\"}],\"mtu_report\":null}]}> error <> exit-code <0>" file="step_processor.go:76" 
request_id=1379ad98-ef9e-4e52-b278-aacd8c1b92d6

Current images:
assisted-installer-ui - sha256:c9a9655867d84d5db19109af0a036a5a1cb53e25b8b6ab16af9089ecedb9a4df
assisted-image-service - sha256:a761cbbb4d67dc6ecdcba0049302793540e31d9b8d3115e537770a1fd2bfa554
assisted-service - sha256:a757d4ddc46bc4104deb64a100b2d78a2ff30bfb2a538bcdcfcf18f31d873ab8

gtomilko avatar Sep 22 '25 19:09 gtomilko

@gtomilko It seems like there are connectivity issues between your hosts, as shown in the agent logs. Can you try pinging from one host to another and see if that works?

linoyaslan avatar Sep 25 '25 06:09 linoyaslan

@linoyaslan Yes, I can ping between nodes. The addresses that fail the ping belong to iDRAC interfaces on Dell servers, so it seems the ping test uses the wrong interfaces. As you can see in the log, pings to IPv4 addresses are successful.

gtomilko avatar Sep 25 '25 17:09 gtomilko

@gtomilko The agent collects all interfaces and their respective IP addresses, then attempts to ping every other host across all interfaces. You could disable the specific host validations that are failing for you (though this is not recommended), or alternatively, try bringing down the unused interfaces during the installation phase. @AlonaKaplan what do you think?
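To see which targets are failing without reading the raw reply by eye, a small Python sketch along these lines could be used. It is not part of the agent: the function names are made up for the example, the sample is one host entry trimmed from the connectivity-check reply in the log above, and treating an entry without a `successful` field as failed is an assumption about how the report is serialized.

```python
import ipaddress
import json

# IPv6 unique-local range; the fde1:... addresses in the log fall inside it.
UNIQUE_LOCAL_V6 = ipaddress.ip_network("fc00::/7")


def failed_targets(report_json):
    """Return (host_id, remote_ip_address) pairs whose L3 check did not succeed."""
    report = json.loads(report_json)
    failures = []
    for host in report.get("remote_hosts", []):
        # l3_connectivity may be null in the reply, hence the "or []".
        for check in host.get("l3_connectivity") or []:
            # Assumption: a missing "successful" field means the ping failed.
            if not check.get("successful", False):
                failures.append((host["host_id"], check["remote_ip_address"]))
    return failures


def describe(address):
    """Label an address by IP version, flagging IPv6 unique-local addresses."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6 and ip in UNIQUE_LOCAL_V6:
        return "IPv6 unique-local"
    return f"IPv{ip.version}"


if __name__ == "__main__":
    # One host entry trimmed from the reply shown in the agent log.
    sample = json.dumps({
        "remote_hosts": [{
            "host_id": "131caee0-daa3-7eb9-357a-d8f7ac879bf6",
            "l2_connectivity": None,
            "l3_connectivity": [
                {"average_rtt_ms": 0.106, "remote_ip_address": "192.168.94.47", "successful": True},
                {"remote_ip_address": "fde1:53ba:e9a0:de11:60ec:b323:5aca:1e0e"},
            ],
            "mtu_report": None,
        }]
    })
    for host_id, address in failed_targets(sample):
        print(host_id, address, describe(address))
```

On the trimmed sample, the only failing target is the `fde1:...` address, which is consistent with the observation that IPv4 pings succeed and only the IPv6 addresses on the extra interfaces fail.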

linoyaslan avatar Sep 25 '25 19:09 linoyaslan

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Dec 25 '25 01:12 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot avatar Jan 24 '26 08:01 openshift-bot