terraform-provider-iterative

fail to run cml-launch

Open yili-han-86 opened this issue 4 months ago • 0 comments

I have a GitHub workflow; part of it looks like this:

deploy-runner:
    runs-on: ubuntu-22.04
    container:
      image: docker://iterativeai/cml:0-dvc3-base1
    steps:
      - uses: actions/checkout@v4
      - name: Deploy runner on cloud service
        shell: bash {0}
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS_DATA }}
        run: |
          IFS=',' read -r -a REGION_ARRAY <<< "$(echo $CLOUD_REGIONS | sed 's/ *, */,/g' )"
          SUCCESS=false
          for REGION in "${REGION_ARRAY[@]}"; do
            echo "Trying region $REGION"
            cml runner launch \
              $(if $CLOUD_SPOT; then echo "--cloud-spot"; fi) \
              --cloud=$CLOUD_SERVICE \
              --cloud-permission-set=$CLOUD_SERVICE_ACCOUNT,scopes=cloud-platform \
              --cloud-region=$REGION \
              --cloud-type=$CLOUD_TYPE \
              --cloud-hdd-size=$CLOUD_HDD_SIZE \
              --labels=eval-and-test-${{ inputs.branch }}-${{ inputs.model_env }} \
              --idle-timeout=$CLOUD_IDLE_TIMEOUT
            if [ $? -eq 0 ]; then
              echo "Successful in $REGION"
              SUCCESS=true
              break
            else
              echo "$REGION failed"
            fi
          done
          if [ "$SUCCESS" = false ]; then
            echo "All regions failed"
            exit 1
          fi
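
The region-splitting step from the script above can be exercised on its own, outside the workflow. This is a minimal sketch, assuming CLOUD_REGIONS is a comma-separated string; the sample value below is hypothetical and not taken from the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical sample value; in the workflow, CLOUD_REGIONS comes from the environment.
CLOUD_REGIONS="us-west1 , us-east1,europe-west4"

# Remove spaces around each comma, then split on commas into an array.
IFS=',' read -r -a REGION_ARRAY <<< "$(echo "$CLOUD_REGIONS" | sed 's/ *, */,/g')"

# Print each region in the order the launch loop would try it.
for REGION in "${REGION_ARRAY[@]}"; do
  echo "Trying region $REGION"
done
```

Running it locally shows which regions the loop will attempt, and in what order, before any cloud resources are requested.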

I keep getting this error:

***"level":"info","message":"iterative_cml_runner.runner: Creation errored after 19m53s"***
***"level":"error","message":"terraform error: Error: Error checking the runner status"***
2025-06-10T01:32:08.074Z [INFO]  provider: configuring client automatic mTLS
2025-06-10T01:32:08.089Z [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative args=[".terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative"]
2025-06-10T01:32:08.089Z [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative pid=166
2025-06-10T01:32:08.089Z [DEBUG] provider: waiting for RPC address: plugin=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative
2025-06-10T01:32:08.113Z [INFO]  provider.terraform-provider-iterative: configuring server automatic mTLS: timestamp=2025-06-10T01:32:08.113Z
2025-06-10T01:32:08.135Z [DEBUG] provider.terraform-provider-iterative: plugin address: address=/tmp/plugin2905256425 network=unix timestamp=2025-06-10T01:32:08.135Z
2025-06-10T01:32:08.135Z [DEBUG] provider: using plugin: version=5
2025-06-10T01:32:08.150Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2025-06-10T01:32:08.152Z [INFO]  provider: plugin process exited: plugin=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative id=166
2025-06-10T01:32:08.152Z [DEBUG] provider: plugin exited
2025-06-10T01:32:08.152Z [INFO]  provider: configuring client automatic mTLS
2025-06-10T01:32:08.159Z [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative args=[".terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative"]
2025-06-10T01:32:08.159Z [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative pid=174
2025-06-10T01:32:08.159Z [DEBUG] provider: waiting for RPC address: plugin=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative
2025-06-10T01:32:08.186Z [INFO]  provider.terraform-provider-iterative: configuring server automatic mTLS: timestamp=2025-06-10T01:32:08.186Z
2025-06-10T01:32:08.206Z [DEBUG] provider.terraform-provider-iterative: plugin address: address=/tmp/plugin363149033 network=unix timestamp=2025-06-10T01:32:08.206Z
2025-06-10T01:32:08.206Z [DEBUG] provider: using plugin: version=5
2025-06-10T01:32:08.222Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"

The dependency versions are: Terraform v1.9.8, cml 0.20.6, Terraform provider 0.11.20.

I followed the suggestion in https://github.com/iterative/cml/issues/1479 and changed the runner from ubuntu-latest to ubuntu-22.04, but the error still occurs.

How can I fix this issue?

yili-han-86, Jun 12 '25 06:06