terraform-provider-equinix

404 when creating spot market request suddenly?

Open colemickens opened this issue 3 years ago • 6 comments

I'm getting: GET https://api.equinix.com/metal/v1/devices?include=facility: 404 Not found.

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # metal_spot_market_request.pktspotamd0 will be created
  + resource "metal_spot_market_request" "pktspotamd0" {
      + devices_max      = 1
      + devices_min      = 1
      + facilities       = (known after apply)
      + id               = (known after apply)
      + max_bid_price    = 0.5
      + metro            = "sv"
      + project_id       = "afc67974-ff22-41fd-9346-5b2c8d51e3a9"
      + wait_for_devices = true

      + instance_parameters {
          + always_pxe        = false
          + billing_cycle     = "hourly"
          + hostname          = "pktspotamd0"
          + operating_system  = "ubuntu_18_04"
          + plan              = "c3.medium.x86"
          + termintation_time = (known after apply)
          + userdata          = <<-EOT
                #!/usr/bin/env bash
                set -xeuo pipefail
                
                
                ##
                ##
                # ./userdata/install-nix.sh
                #!/usr/bin/env bash
                set -euo pipefail
                set -x
                
                USERNAME="cole"
                NIX_INSTALL_URL="https://github.com/numtide/nix-unstable-installer/releases/download/nix-2.5pre20211026_5667822/install"
                
                # TODO: support re-exec as root if we're not
                # check if we're not "cole" and if so, make it and then re-exec *again*
                
                if [[ "${1:-""}" != "stage2" ]]; then
                  if [[ "$(whoami)" != "${USERNAME}}" ]]; then
                    sudo adduser --gecos "" --disabled-password "${USERNAME}"
                    mkdir -p /home/"${USERNAME}"/.ssh
                    curl -L "https://github.com/colemickens.keys" > /home/cole/.ssh/authorized_keys
                    sudo chown -R cole /home/"${USERNAME}"/.ssh
                    sudo chmod -R ugo-w /home/"${USERNAME}"/.ssh
                    sudo chmod -R ugo+rx /home/"${USERNAME}"/.ssh
                    sudo chmod -R ugo-w /home/"${USERNAME}"/.ssh
                    sudo chmod -R u+rw /home/"${USERNAME}"/.ssh
                    sudo chmod u+x /home/"${USERNAME}"/.ssh
                    sudo usermod -aG sudo "${USERNAME}"
                    echo "%sudo   ALL=(ALL:ALL) NOPASSWD:ALL" | sudo tee -a /etc/sudoers
                    sudo cp "${0}" "/tmp/nix-unstable.sh"
                    sudo chmod ugo+rx "/tmp/nix-unstable.sh"
                    sudo -u "${USERNAME}" "/tmp/nix-unstable.sh" stage2
                  fi
                  exit 0
                fi
                
                # TODO: pull out extra subs/keys to TF var?
                # TODO: keep in sync: commbox.sh/install-nix.sh
                curl -L "${NIX_INSTALL_URL}" > /tmp/install
                sudo chmod +x /tmp/install
                /tmp/install --daemon &> /tmp/nix-install.log
                
                sudo mkdir -p "/etc/nix"
                cat <<EOF | sudo tee -a "/etc/nix/nix.conf"
                experimental-features = nix-command flakes ca-references
                extra-substituters = https://colemickens.cachix.org https://nixpkgs-wayland.cachix.org https://arm.cachix.org https://thefloweringash-armv7.cachix.org
                extra-trusted-public-keys = colemickens.cachix.org-1:bNrJ6FfMREB4bd4BOjEN85Niu8VcPdQe4F4KxVsb/I4= nixpkgs-wayland.cachix.org-1:3lwxaILxMRkVhehr5StQprHdEo4IrE8sRho9R9HOLYA= arm.cachix.org-1:5BZ2kjoL1q6nWhlnrbAl+G7ThY7+HaBRD9PZzqZkbnM= thefloweringash-armv7.cachix.org-1:v+5yzBD2odFKeXbmC+OPWVqx4WVoIVO6UXgnSAWFtso=
                trusted-users = root @sudo
                cores = 0
                max-jobs = auto
                EOF
                
                sudo systemctl restart nix-daemon
                
                BASHRC="$(cat "/etc/bash.bashrc")"
                NIXSNIPPET="$(cat "/etc/profile.d/nix.sh")"
                printf '%s\n#####\n%s' \
                  "${NIXSNIPPET}" \
                  "${BASHRC}" | sudo tee "/etc/bash.bashrc"
                
                source "/etc/profile.d/nix.sh"
                nix --version
                
                echo "install-nix: all done!"
                
            EOT
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

metal_spot_market_request.pktspotamd0: Creating...
╷
│ Error: Failed to fetch Device with following error: GET https://api.equinix.com/metal/v1/devices?include=facility: 404 Not found 
│ 
│   with metal_spot_market_request.pktspotamd0,
│   on config.tf.json line 1, in resource.metal_spot_market_request.pktspotamd0:
│    1: {"data":{"oci_identity_availability_domain":{"default_ad":[{"ad_number":1,"compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta"}]}},"provider":{"metal":[null],"oci":[{"fingerprint":"d4:d8:ce:6c:c4:ca:b9:ab:11:ac:2a:1f:1b:e7:70:71","private_key_path":"/run/secrets/oraclecloud_colemickens_privkey","region":"us-phoenix-1","tenancy_ocid":"ocid1.tenancy.oc1..aaaaaaaafyqmgtgi5nwkolwjujayjrx5qw2qmzpbp7wzche2kgmdrlptnj4q","user_ocid":"ocid1.user.oc1..aaaaaaaah76dpd2bz6pqmy53t2p7mxy3wieydldjxshmnpe6nsoensqieulq"}]},"resource":{"metal_spot_market_request":{"pktspotamd0":{"devices_max":1,"devices_min":1,"instance_parameters":{"billing_cycle":"hourly","hostname":"pktspotamd0","operating_system":"ubuntu_18_04","plan":"c3.medium.x86","userdata":"${templatefile(\"/nix/store/78znmjqb5cnmgnv2i6yswjfgybv5qb4m-bootstrap.sh.tmpl\", { TF_NIXOS_LUSTRATE = \"false\", TF_NIX_INSTALL_URL = \"https://github.com/numtide/nix-unstable-installer/releases/download/nix-2.5pre20211026_5667822/install\", TF_USERNAME = \"cole\" 
})}"},"max_bid_price":"0.50","metro":"sv","project_id":"afc67974-ff22-41fd-9346-5b2c8d51e3a9","wait_for_devices":true}},"oci_core_default_route_table":{"default_route_table":[{"display_name":"DefaultRouteTable","manage_default_resource_id":"${oci_core_vcn.default_vcn.default_route_table_id}","route_rules":[{"destination":"0.0.0.0/0","destination_type":"CIDR_BLOCK","network_entity_id":"${oci_core_internet_gateway.default_internet_gateway.id}"}]}]},"oci_core_internet_gateway":{"default_internet_gateway":[{"compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta","display_name":"DefaultInternetGateway","vcn_id":"${oci_core_vcn.default_vcn.id}"}]},"oci_core_subnet":{"default_subnet":[{"availability_domain":"${data.oci_identity_availability_domain.default_ad.name}","cidr_block":"10.0.1.0/24","compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta","dhcp_options_id":"${oci_core_vcn.default_vcn.default_dhcp_options_id}","display_name":"DefaultSubnet","dns_label":"default","route_table_id":"${oci_core_vcn.default_vcn.default_route_table_id}","security_list_ids":["${oci_core_vcn.default_vcn.default_security_list_id}"],"vcn_id":"${oci_core_vcn.default_vcn.id}"}]},"oci_core_vcn":{"default_vcn":[{"cidr_block":"10.0.0.0/16","compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta","display_name":"DefaultVcn","dns_label":"default"}]}},"terraform":{"required_providers":{"metal":{"source":"equinix/metal","version":"3.2.0"}}}}
│ 
╵
+ tixe

colemickens avatar Nov 10 '21 21:11 colemickens

It looks like the wait_for_devices is triggering a device fetch when the device id is not available: https://github.com/equinix/terraform-provider-metal/blob/ec6ec6f5daa5161cd2650e4b334bbc16f9653427/metal/resource_metal_spot_market_request.go#L429-L431
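
The failing URL in the report has no device ID in its path, which is what a fetch with an empty ID would look like. Here is a minimal sketch of that effect, assuming the request path is built by joining a base path with the ID. `deviceURL` is a hypothetical helper written for illustration, not the provider's or packngo's actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// deviceURL is a hypothetical helper (an assumption for illustration).
// It joins the devices base path with the given ID the way a typical
// client might, then appends the include=facility query seen in the error.
func deviceURL(deviceID string) string {
	u := url.URL{
		Scheme:   "https",
		Host:     "api.equinix.com",
		Path:     path.Join("/metal/v1/devices", deviceID), // path.Join ignores empty elements
		RawQuery: "include=facility",
	}
	return u.String()
}

func main() {
	// With an empty ID, the ID segment vanishes and the URL matches the one in the error.
	fmt.Println(deviceURL(""))
	// With a real ID (placeholder value here), the ID becomes the last path segment.
	fmt.Println(deviceURL("some-device-id"))
}
```

If the path is built this way, an empty ID silently turns a single-device lookup into a request against the collection path, which would explain the URL in the 404.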

displague avatar Nov 11 '21 01:11 displague

I forgot, I actually have the logging infra in place from previous issues, I'll attach it. log.txt

colemickens avatar Nov 11 '21 02:11 colemickens

@colemickens thanks for supplying the debug log, and great that you redacted your API token!

There is a GET for a spot market request:

2021-11-10T13:06:34.597-0800 [INFO]  provider.terraform-provider-metal_v3.2.0: 2021/11/10 13:06:34 [DEBUG] Equinix Metal API Request Details:
---[ REQUEST ]---------------------------------------
GET /metal/v1/spot-market-requests/4c603e72-e6cd-4c78-837e-5e69b88c7665?include=project%2Cdevices%2Cfacilities%2Cmetro HTTP/1.1
...

and the reply is

2021-11-10T13:06:35.227-0800 [INFO]  provider.terraform-provider-metal_v3.2.0: 2021/11/10 13:06:35 [DEBUG] Equinix Metal API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 3218
Cache-Control: max-age=0, private, must-revalidate
Content-Type: application/json; charset=utf-8
Date: Wed, 10 Nov 2021 21:06:35 GMT
Etag: W/"596f0204f08af0c34122fa6053f512a5"
Last-Modified: Wed, 10 Nov 2021 21:06:31 GMT
Strict-Transport-Security: max-age=15724800; includeSubDomains
X-Request-Id: d34bf9012c5cae0460084953175c7eb1

{
 "id": "4c603e72-e6cd-4c78-837e-5e69b88c7665",
 "created_at": "2021-11-10T21:06:31Z",
 "devices_min": 1,
 "devices_max": 1,
 "max_bid_price": 0.5,
 [...],
 "devices": [
  {}                       <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Not great
 ], 
 "href": "/metal/v1/spot-market-requests/4c603e72-e6cd-4c78-837e-5e69b88c7665"
}

There is an empty dict in the devices array. It's an API bug, and neither packngo nor the provider code is ready for this. @displague, will you please bring this up with the API devs?
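
To illustrate why the empty dict is a problem for a Go client, here is a minimal sketch that decodes a trimmed-down version of the response above. The `device` and `spotMarketRequest` types are simplified stand-ins assumed for this example, not packngo's real structs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// device and spotMarketRequest are simplified, hypothetical types that
// keep only the fields needed to show the effect.
type device struct {
	ID string `json:"id"`
}

type spotMarketRequest struct {
	ID      string   `json:"id"`
	Devices []device `json:"devices"`
}

// decodeSMR unmarshals a spot market request response body.
func decodeSMR(body []byte) (spotMarketRequest, error) {
	var smr spotMarketRequest
	err := json.Unmarshal(body, &smr)
	return smr, err
}

func main() {
	// Trimmed-down response: the devices array holds one empty object.
	body := []byte(`{"id": "4c603e72-e6cd-4c78-837e-5e69b88c7665", "devices": [{}]}`)

	smr, err := decodeSMR(body)
	if err != nil {
		panic(err)
	}
	// The slice is not empty, but its only element has a zero-value ID.
	fmt.Printf("devices=%d first id=%q\n", len(smr.Devices), smr.Devices[0].ID)
}
```

The decoded slice has length 1, so a check like "are there any devices yet?" passes even though there is no usable ID to fetch.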

If we assume that the empty device dict is just a temporary nuisance and the device will appear, then this crash could be fixed in the SMR waiting code: we could assume that if the ID is empty, we need to wait longer. @displague should I implement this?

t0mk avatar Nov 11 '21 13:11 t0mk

@t0mk I'm raising this to the API team. I don't think it is reasonable for packngo to try to work around this (at least until the nature of this bug is determined).

displague avatar Nov 15 '21 14:11 displague

Now I got a 500? But it's weird, the log shows it happening and the deployment continuing? But the 500 didn't pop up until the end when terraform gave up?

Log here: log.txt

colemickens avatar Nov 18 '21 23:11 colemickens

And then I tried again to finish my plan by deploying the second spot market request and now it immediately throws back 500 and fails.

:(

colemickens avatar Nov 18 '21 23:11 colemickens