Node.js proxy connections time out.

Open jlillywhite opened this issue 7 months ago • 12 comments

Description

Our tests use a proxy server to connect to a site on the public internet. Since 2025-05-14T19:23:15, those tests have been failing with network connection errors. The error messages look like this:

AggregateError [ETIMEDOUT]: 
    at internalConnectMultiple (node:net:1139:18)
    at internalConnectMultiple (node:net:1139:18)
    at internalConnectMultiple (node:net:1215:5)
    at Timeout.internalConnectMultipleTimeout (node:net:1741:5)
    at listOnTimeout (node:internal/timers:590:11)
    at process.processTimers (node:internal/timers:523:7)
    at internalConnectMultiple (node:net:1215:5)
    at Timeout.internalConnectMultipleTimeout (node:net:1741:5)
    at listOnTimeout (node:internal/timers:590:11)
    at process.processTimers (node:internal/timers:523:7) {
  code: 'ETIMEDOUT',
  [errors]: [
    Error: connect ETIMEDOUT XXX.XXX.XXX.XXX:443
        at createConnectionError (node:net:1677:14)
        at Timeout.internalConnectMultipleTimeout (node:net:1736:38)
        at listOnTimeout (node:internal/timers:590:11)
        at process.processTimers (node:internal/timers:523:7) {
      errno: -110,
      code: 'ETIMEDOUT',
      syscall: 'connect',
      address: 'XXX.XXX.XXX.XXX',
      port: 443
    },
    Error: connect ENETUNREACH 260X:XXXX::XXXX:XXXX:443 - Local (:::0)
        at internalConnectMultiple (node:net:1211:16)
        at Timeout.internalConnectMultipleTimeout (node:net:1741:5)
        at listOnTimeout (node:internal/timers:590:11)
        at process.processTimers (node:internal/timers:523:7) {
      errno: -101,
      code: 'ENETUNREACH',
      syscall: 'connect',
      address: 'XXXX:XXXXX::XXXX:XXX',
      port: 443
    },
    Error: connect ETIMEDOUT XXX.XXX.XXX.XXX:443
        at createConnectionError (node:net:1677:14)
        at Timeout.internalConnectMultipleTimeout (node:net:1736:38)
        at listOnTimeout (node:internal/timers:590:11)
        at process.processTimers (node:internal/timers:523:7) {
      errno: -110,
      code: 'ETIMEDOUT',
      syscall: 'connect',
      address: 'XXX.XXX.XXX.XXX',
      port: 443
    },
    Error: connect ENETUNREACH XXXX:XXXX::XXXX:XXX:443 - Local (:::0)
        at internalConnectMultiple (node:net:1211:16)
        at Timeout.internalConnectMultipleTimeout (node:net:1741:5)
        at listOnTimeout (node:internal/timers:590:11)
        at process.processTimers (node:internal/timers:523:7) {
      errno: -101,
      code: 'ENETUNREACH',
      syscall: 'connect',
      address: 'XXXX:XXXX::XXXX:297',
      port: 443
    }
  ]
}
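
The AggregateError above carries one nested error per address that Node.js tried: the IPv4 attempts end in ETIMEDOUT and the IPv6 attempts in ENETUNREACH. A minimal sketch for condensing such an error into one entry per attempt when logging follows. `summarizeConnectError` is a hypothetical helper (not part of Node.js or Vite), and the sample addresses are documentation-only placeholders, not the redacted ones from the log.

```javascript
const net = require('node:net');

// Hypothetical helper: flatten a connect error into one entry per attempted address.
// Handles both a plain connect error and an AggregateError carrying `errors`.
function summarizeConnectError(err) {
  const attempts = Array.isArray(err.errors) ? err.errors : [err];
  return attempts.map((e) => ({
    code: e.code,
    address: e.address,
    port: e.port,
    // net.isIP returns 4 or 6 for IP address strings, and 0 otherwise.
    family: net.isIP(String(e.address ?? '')),
  }));
}

// Sample error built by hand for illustration (placeholder addresses).
const sample = new AggregateError([
  Object.assign(new Error('connect ETIMEDOUT 192.0.2.10:443'), {
    code: 'ETIMEDOUT', address: '192.0.2.10', port: 443,
  }),
  Object.assign(new Error('connect ENETUNREACH 2001:db8::1:443'), {
    code: 'ENETUNREACH', address: '2001:db8::1', port: 443,
  }),
]);

console.log(summarizeConnectError(sample));
```

Grouping the entries by `family` makes it easier to see whether only one address family is failing.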

Platforms affected

  • [ ] Azure DevOps
  • [x] GitHub Actions - Standard Runners
  • [ ] GitHub Actions - Larger Runners

Runner images affected

  • [x] Ubuntu 22.04
  • [x] Ubuntu 24.04
  • [ ] macOS 13
  • [ ] macOS 13 Arm64
  • [ ] macOS 14
  • [ ] macOS 14 Arm64
  • [ ] macOS 15
  • [ ] macOS 15 Arm64
  • [ ] Windows Server 2019
  • [ ] Windows Server 2022
  • [ ] Windows Server 2025

Image version and build link

Image: ubuntu-24.04 Version: 20250511.1.0

We have also tried this on

Image: ubuntu-22.04 Version: 20250511.1.0

This is happening on a private repository

Is it regression?

No

Expected behavior

Attempts to proxy requests to a public website should succeed.

Actual behavior

Attempts to proxy requests to a public website time out.

Repro steps

  1. Run a GitHub Actions workflow that uses vite-proxy to access public resources.
  2. Proxy requests time out.

jlillywhite avatar May 16 '25 22:05 jlillywhite

This also failed with Image: ubuntu-24.04 Version: 20250427.1.0

jlillywhite avatar May 16 '25 22:05 jlillywhite

Hi @jlillywhite, thank you for bringing this issue to our attention. We will investigate and update you with our findings.

vidyasagarnimmagaddi avatar May 19 '25 04:05 vidyasagarnimmagaddi

Hi @jlillywhite, could you please share the repro steps or the workflow in which you are seeing the issue? That will help us debug the problem more effectively. Thanks.

RaviAkshintala avatar May 19 '25 10:05 RaviAkshintala

Here is the workflow where we're seeing the issue. The failing step is "Integration Tests". It starts a web server and a proxy server using vite preview, and the proxy server fails to fetch assets from our public web server.

name: "Build and Test Web Application"

env:
  NODE_VERSION: 22

on:
  workflow_call:
    inputs:
      application-name:
        description: Name of the application, must match the Sonar project key
        type: string
        required: true
      application-path:
        type: string
        required: true
      event-name:
        type: string
        required: true
      testing-shards:
        type: string
        required: false
      publish-image:
        type: boolean
        required: true
      github-token:
        type: string
        description: "The access token used by the current workflow"
      parent-workflow-run-id:
        type: string
      deploy-to-dev:
        type: boolean
        required: true
      sonar-scan:
        description: Whether Sonar Scan is enabled for the application
        type: boolean
        required: false
      self-hosted-runner-type:
        type: string
        required: false
        default: "md-arm64"
      slack-owner:
        description: The slack group to notify
        type: string
        required: false

jobs:
  static-checks:
    name: Static Checks
    runs-on:
      group: ${{ inputs.self-hosted-runner-type }}
    timeout-minutes: 15
    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4

      - name: Static Checks
        uses: ./.github/actions/static-checks
        with:
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          app-path: ${{ inputs.application-path }}
          node-version: ${{ env.NODE_VERSION }}
          github-token: ${{ inputs.github-token }}

  type-checks:
    name: Type Checks
    runs-on:
      group: ${{ inputs.self-hosted-runner-type }}
    timeout-minutes: 15
    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4
      - name: Type Checks
        uses: ./.github/actions/type-checks
        with:
          app-path: ${{ inputs.application-path }}
          node-version: ${{ env.NODE_VERSION }}
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          github-token: ${{ inputs.github-token }}

  test-unit:
    name: Unit Test
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4
      - name: Unit Test
        uses: ./.github/actions/unit-test
        with:
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          app-path: ${{ inputs.application-path }}
          app-name: ${{ inputs.application-name }}
          node-version: ${{ env.NODE_VERSION }}
          github-token: ${{ inputs.github-token }}
          coverage: ${{ inputs.sonar-scan }}
      - name: Upload JUnit report to GitHub Actions Artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ${{ inputs.application-name }}-ui-unit-test-results
          overwrite: true
          include-hidden-files: true
          path: "${{ inputs.application-path }}/out/playwright-report"

      - name: Publish JUnit Test Report
        uses: ./.github/actions/junit-test-report
        if: always() # always run even if the previous step fails
        with:
          report_path: "${{ inputs.application-path }}/out/junit-report.xml"
          include_passed: true

  test-integration-build:
    name: Build for Integration Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4
        with:
          lfs: true

      - name: Install dependencies
        uses: ./.github/actions/install-dependencies
        id: install-deps
        with:
          node-version: ${{ env.NODE_VERSION }}
          app-path: ${{ inputs.application-path }}
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          github-token: ${{ secrets.GITHUB_TOKEN }}

      - name: Build for Integration Testing
        run: pnpm build ${{ inputs.application-name }} --required --verbose --mode=integration

      - name: Upload Integration Testing Build artifact
        if: inputs.testing-shards
        uses: actions/upload-artifact@v4
        with:
          name: ${{ inputs.application-name }}-web-build-integration-test-${{ github.run_id }}
          overwrite: true
          # Upload all web application build outputs
          # NOTE: Exclude Apryse as it's HUGE - we may need some special handling
          # for tests that require Apryse
          path: |
            apps/*/web/dist
            support/*/web/dist
            !apps/*/web/dist/apryse-webviewer
          retention-days: 3

  test-integration:
    if: inputs.testing-shards
    name: Integration Tests
    needs: test-integration-build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: ${{ fromJson(inputs.testing-shards) }}
    timeout-minutes: 60
    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4
        with:
          lfs: true

      - name: Install dependencies
        uses: ./.github/actions/install-dependencies
        id: install-deps
        with:
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          node-version: ${{ env.NODE_VERSION }}
          app-path: ${{ inputs.application-path }}

      - name: Download Integration Testing Build artifact
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.application-name }}-web-build-integration-test-${{ github.run_id }}
          path: temp-full-build-output

      - name: Merge build output into existing files
        run: cp -r temp-full-build-output/apps/. apps/

      - name: Move Apryse static assets into adapt-host copied build output
        run: cp -r apps/adapt-host/web/public/apryse-webviewer/. apps/adapt-host/web/dist/apryse-webviewer

      - name: Install Playwright
        if: steps.install-deps.outputs.cache-hit != 'true' # assumes install-dependencies exposes a cache-hit output
        run: |
          cd "${{ inputs.application-path }}"
          pnpm exec playwright install --with-deps chromium

      - name: Integration Tests
        run: pnpm test:integration ${{ inputs.application-name }} --shard="${{ matrix.shard }}" --verbose --ignore

      # Coverage report will be used by the Sonar Scan workflow
      - name: "Upload Coverage Report: Integration"
        if: inputs.sonar-scan
        uses: actions/upload-artifact@v4
        with:
          # Use a unique artifact name for each shard
          name: coverage-report-integration-${{ inputs.application-name }}-${{ strategy.job-index }}
          path: ${{ inputs.application-path }}/out/coverage/integration/lcov.info
          overwrite: true

      - uses: ./.github/actions/get-clean-blob-name
        id: clean-blob-name
        if: always()
        with:
          blob-name: playwright-${{inputs.application-name}}-blob-report-${{ matrix.shard }}

      - name: Upload JUnit report to GitHub Actions Artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ${{inputs.application-name}}-integration-test-results-${{ strategy.job-index }}
          overwrite: true
          include-hidden-files: true
          path: "${{ inputs.application-path }}/out/playwright-report"

      - name: Publish JUnit Test Report
        uses: ./.github/actions/junit-test-report
        if: always() # always run even if the previous step fails
        with:
          report_path: "${{ inputs.application-path }}/out/playwright-report/*.xml"

      - name: Upload blob report to GitHub Actions Artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ${{ steps.clean-blob-name.outputs.clean-blob-name }}
          overwrite: true
          include-hidden-files: true
          path: "${{ inputs.application-path }}/out/playwright-report/blob-report"
          retention-days: 1

      - name: Publish CTRF Test Report
        uses: ctrf-io/github-test-reporter@v1
        with:
          report-path: "${{ inputs.application-path }}/out/playwright-report/ctrf-report/*.json"
        if: always()

      - name: Upload CTRF reports to GitHub Actions Artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ${{inputs.application-name}}-ctrf-test-results-${{ strategy.job-index }}
          overwrite: true
          include-hidden-files: true
          path: "${{ inputs.application-path }}/out/playwright-report/ctrf-report"

  merge-reports:
    # Merge reports after playwright tests, even if some shards have failed
    if: always() && inputs.testing-shards
    name: Merge Integration Test Reports
    needs: test-integration
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4
      - uses: ./.github/actions/merge-reports
        with:
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          app-path: ${{ inputs.application-path }}
          app-name: ${{ inputs.application-name }}
          blob-pattern: "playwright-${{inputs.application-name}}-blob-report-*"
          node-version: ${{ env.NODE_VERSION }}
          test-type: "integration"
          github-token: "${{ inputs.github-token }}"

  sonar-scan:
    name: Sonar Scan
    if: inputs.sonar-scan
    needs: [test-unit, test-integration]
    secrets: inherit
    uses: ./.github/workflows/sonar-scan.yml
    with:
      project-root: ${{ inputs.application-path }}
      project-key: ${{ inputs.application-name }}

  build-and-publish-image:
    if: inputs.publish-image
    name: Build Image
    runs-on: ubuntu-latest
    concurrency:
      group: build-and-publish-image-${{ github.workflow }}-${{ github.ref }}-${{ inputs.application-name }}
      cancel-in-progress: true
    timeout-minutes: 15
    outputs:
      version_tag: ${{steps.build-image.outputs.version}}

    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4
        with:
          fetch-tags: true
      - uses: ./.github/actions/build-app-image
        id: build-image
        with:
          app-path: ${{ inputs.application-path }}
          app-name: ${{ inputs.application-name }}
          node-version: ${{ env.NODE_VERSION }}
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          github-token: "${{ inputs.github-token }}"

  get_elevated_github_token:
    if: (inputs.publish-image &&  inputs.deploy-to-dev)
    name: GitHub | Get Elevated Token
    uses: Company/github-actions-shared-workflows/.github/workflows/github-sudo.yml@main
    secrets: inherit
    with:
      app_name: workflow

  trigger-dev-release:
    if: (inputs.publish-image &&  inputs.deploy-to-dev)
    name: Trigger Dev Release
    runs-on: ubuntu-latest
    needs: [get_elevated_github_token, build-and-publish-image]
    steps:
      - name: Fetch Secrets from Vault
        id: secrets
        uses: hashicorp/[email protected]
        with:
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          method: approle
          secrets: |
            github/data/secrets/pgp_passphrase passphrase | PGP_PASSPHRASE ;

      - uses: Company/applications/.github/actions/trigger-dev-release-frontend@main
        with:
          app-name: "${{ inputs.application-name }}"
          user: ${{ needs.get_elevated_github_token.outputs.user }}
          token: ${{ needs.get_elevated_github_token.outputs.token }}
          version-tag: ${{ needs.build-and-publish-image.outputs.version_tag }}
          pgp-passphrase: ${{ steps.secrets.outputs.PGP_PASSPHRASE }}

  notify-slack-on-failure:
    name: Notify Slack on Failure
    if: (github.ref == 'refs/heads/main' && failure())
    runs-on: ubuntu-latest
    needs: [static-checks, type-checks, test-unit, test-integration, wiz-checks]
    steps:
      - name: Checkout Latest Code
        uses: actions/checkout@v4
      - name: Get Slack Channel Webhook URL
        id: vault
        uses: hashicorp/[email protected]
        with:
          roleId: ${{ secrets.VAULT_ROLE_ID }}
          secretId: ${{ secrets.VAULT_SECRET_ID }}
          url: ${{ secrets.VAULT_ADDR }}
          method: approle
          secrets: github/data/service/slack FRONTEND_ENGINEERING_WEBHOOK | WEBHOOK_URL ;
      - name: Notify slack of failures in main
        uses: ./.github/actions/slack-frontend-notifications
        with:
          appName: ${{ inputs.application-name }}
          mentionGroups: ${{ inputs.slack-owner }}
          webhook_url: ${{ steps.vault.outputs.WEBHOOK_URL }}
          jobStatus: failure

jlillywhite avatar May 19 '25 14:05 jlillywhite

@jlillywhite Could you please share both a successful and an unsuccessful build attempt so that we can investigate this issue further? Thanks.

RaviAkshintala avatar May 21 '25 05:05 RaviAkshintala

I'm attaching the logs.

FailingTestLogs.txt SuccessfulTestLogs.txt

jlillywhite avatar May 21 '25 15:05 jlillywhite

@jlillywhite We attempted to reproduce the issue using the steps you provided, but the run passed. The run and the Vite config we used are linked below. Thanks.

https://github.com/RaviAkshintala/runner-images-AR/actions/runs/15185097640/job/42703555384
https://github.com/RaviAkshintala/runner-images-AR/blob/nodejs/vite.config.js

RaviAkshintala avatar May 22 '25 11:05 RaviAkshintala

Hi @jlillywhite, could you please review the above and confirm whether your issue has been resolved? Thanks.

RaviAkshintala avatar May 27 '25 09:05 RaviAkshintala

@RaviAkshintala, I'm not seeing any improvement on our side. We're still getting the same errors. The connections don't fail every time, but they do continue to fail. I'm attaching logs from my most recent run.

FailingLogs.txt
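
Because the failures are intermittent, per-address-family connection timings recorded on the runner itself may help narrow things down. A minimal sketch, assuming Node.js is available in the job: `probe` is a hypothetical diagnostic helper, not something used in the workflow above, and the host to test would be the public server the proxy targets.

```javascript
const net = require('node:net');

// Hypothetical diagnostic: attempt one TCP connection restricted to a single
// address family (4 or 6) and report whether it succeeded and how long it took.
function probe(host, port, family, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const started = process.hrtime.bigint();
    const socket = net.connect({ host, port, family });
    const finish = (result) => {
      socket.destroy();
      const elapsedMs = Number(process.hrtime.bigint() - started) / 1e6;
      resolve({ host, port, family, ...result, elapsedMs });
    };
    // The socket timeout fires if the connection has not been established in time.
    socket.setTimeout(timeoutMs, () => finish({ ok: false, code: 'PROBE_TIMEOUT' }));
    socket.once('connect', () => finish({ ok: true }));
    socket.once('error', (err) => finish({ ok: false, code: err.code }));
  });
}
```

Awaiting `probe(host, 443, 4)` and `probe(host, 443, 6)` in a step just before the integration tests would show whether IPv4 connects are slow and whether IPv6 is reachable at all from that runner.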

jlillywhite avatar May 27 '25 16:05 jlillywhite

@jlillywhite A new Ubuntu image version, 20250527.1, has been rolled out. Could you please rerun your workflows and let us know the result? Thanks.

RaviAkshintala avatar May 30 '25 10:05 RaviAkshintala

I'm still getting the 20250511.1.0 image version for my workflows. I'll try running them again on Monday and let you know.

jlillywhite avatar May 30 '25 20:05 jlillywhite

The 20250511.1.0 image is still being used on workflows. Is there something I should be doing to pick up the new image?

jlillywhite avatar Jun 02 '25 15:06 jlillywhite

Hi @jlillywhite - Could you please check if the issue is resolved?

subir0071 avatar Jun 10 '25 15:06 subir0071

Hey @subir0071, I think this is related to https://github.com/nodejs/undici/issues/2777. I ran the tests against the newest version and saw no improvement, so I tried disabling IPv6 as described in https://github.com/nodejs/undici/issues/2777#issuecomment-2680103085, which fixed nearly all of the errors. Maybe this is all caused by slow DNS resolution?
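
For anyone hitting the same errors, a related mitigation can be sketched inside Node.js itself. This is not the change described in the linked comment and it does not disable IPv6 at the OS level. It only makes lookups return IPv4 addresses first and gives each connection attempt more time. The `internalConnectMultipleTimeout` frames in the stack trace suggest that the per-attempt timer is what fired. `preferIPv4` and its 1000 ms default are assumptions chosen for illustration.

```javascript
const dns = require('node:dns');
const net = require('node:net');

// Hypothetical helper: prefer IPv4 for outgoing connections and give each
// address attempt more time before Node.js moves on to the next address.
function preferIPv4({ attemptTimeoutMs = 1000 } = {}) {
  // Make dns.lookup() return IPv4 addresses before IPv6 ones.
  dns.setDefaultResultOrder('ipv4first');
  // Per-attempt timeout used when Node.js tries several addresses in turn
  // (250 ms by default in Node.js 22, per the net documentation).
  net.setDefaultAutoSelectFamilyAttemptTimeout(attemptTimeoutMs);
  return {
    resultOrder: dns.getDefaultResultOrder(),
    attemptTimeoutMs: net.getDefaultAutoSelectFamilyAttemptTimeout(),
  };
}
```

Calling `preferIPv4()` once, before the proxy server starts, applies both settings process-wide. Whether it removes the timeouts depends on why the IPv4 attempts are slow in the first place.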

jlillywhite avatar Jun 10 '25 19:06 jlillywhite

We have changed the way that we handle proxy connections, so this is no longer an issue.

jlillywhite avatar Jun 27 '25 15:06 jlillywhite