Configuration Resets & Invalid Broadcast Address After Upgrading from v0.54.8 to v1 on K8s

Open nmetaintro opened this issue 9 months ago • 2 comments

Issue Description
After upgrading from v0.54.8 to v1 on Kubernetes (via helm upgrade), several configuration values were reset or changed unexpectedly:

  1. Default Token Reset
    • The default token was cleared, requiring reconfiguration.

  2. SERVER_URL in hatchet-shared-config
    • Reset to the default instead of preserving the prior value.

  3. SERVER_AUTH_COOKIE_DOMAIN
    • Was reset, impacting authentication flows.

  4. SERVER_GRPC_BROADCAST_ADDRESS
    • Changed from the expected service:7070 to localhost:7070, causing gRPC communication issues.

Because of these resets, authentication via both username/password and Google SSO was temporarily disrupted. After manually restoring each configuration variable to its intended value, the system now works correctly and I can see past workflow runs as expected.


Steps to Reproduce

  1. Run Helm upgrade from v0.54.8 to v1 on Kubernetes.
  2. Check the hatchet-shared-config and observe that SERVER_URL, SERVER_AUTH_COOKIE_DOMAIN, and other variables are reset.
  3. Notice that the default token is missing and SERVER_GRPC_BROADCAST_ADDRESS is set to localhost:7070.
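
To make step 2 less manual, here is a minimal sketch that compares two snapshots of the config taken before and after the upgrade. It assumes hatchet-shared-config is a ConfigMap and that each snapshot was saved with something like `kubectl get configmap hatchet-shared-config -o json` and parsed with `json.load`; the function name `changed_keys` and the watched-key list are illustrative, not part of Hatchet.

```python
# Keys reported as reset in this issue; extend the tuple as needed.
WATCHED_KEYS = (
    "SERVER_URL",
    "SERVER_AUTH_COOKIE_DOMAIN",
    "SERVER_GRPC_BROADCAST_ADDRESS",
)


def changed_keys(before: dict, after: dict, keys=WATCHED_KEYS) -> dict:
    """Return {key: (old, new)} for every watched key whose value differs.

    `before` and `after` are parsed ConfigMap objects, i.e. dicts with a
    top-level "data" mapping. A key missing on one side shows up as None.
    """
    before_data = before.get("data") or {}
    after_data = after.get("data") or {}
    return {
        key: (before_data.get(key), after_data.get(key))
        for key in keys
        if before_data.get(key) != after_data.get(key)
    }
```

An empty result means none of the watched keys changed across the upgrade; any entry shows the old and new value side by side.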

Expected Behavior

  • Existing configuration values should be preserved across upgrades.
  • SERVER_GRPC_BROADCAST_ADDRESS should keep pointing at the actual service address, e.g., service:7070.
  • Authentication tokens and cookie domains should be kept intact without manual intervention.
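
The broadcast-address expectation can be checked mechanically. The sketch below flags a value that points at the local machine, which is the failure seen here (localhost:7070 instead of service:7070). It assumes the value has the form host:port; `is_loopback_address` is a hypothetical helper name, not a Hatchet function.

```python
import ipaddress


def is_loopback_address(address: str) -> bool:
    """Return True if a host:port address points at the local machine."""
    # Drop the port, then any brackets around an IPv6 literal.
    host = address.rsplit(":", 1)[0].strip("[]").lower()
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a Kubernetes service name): not loopback.
        return False
```

A check like this could run after an upgrade, against the value read from hatchet-shared-config, to catch the reset before workers fail to connect.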

Actual Behavior

  • Several critical config values (token, SERVER_URL, SERVER_AUTH_COOKIE_DOMAIN, and SERVER_GRPC_BROADCAST_ADDRESS) were reset to blank or incorrect defaults.

Workaround

  • Manually update the Helm chart (or directly update hatchet-shared-config) with the correct values:
    • Restore the default token.
    • Set SERVER_URL properly.
    • Set SERVER_AUTH_COOKIE_DOMAIN to the intended domain.
    • Change SERVER_GRPC_BROADCAST_ADDRESS back to service:7070.
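
For the "directly update hatchet-shared-config" route, a rough sketch of building a merge patch is shown below. It assumes hatchet-shared-config is a ConfigMap; the SERVER_URL and cookie-domain values are placeholders, and `build_configmap_patch` is a hypothetical helper. The default token is left out because the issue does not say where it is stored.

```python
import json


def build_configmap_patch(values: dict) -> str:
    """Build a JSON merge patch that sets the given ConfigMap data keys."""
    empty = sorted(key for key, value in values.items() if not value)
    if empty:
        # Patching in a blank value would reproduce the original problem.
        raise ValueError(f"refusing to patch empty values: {', '.join(empty)}")
    return json.dumps({"data": values}, sort_keys=True)


# Placeholder values; replace with the ones used before the upgrade.
restored = {
    "SERVER_URL": "https://hatchet.example.com",
    "SERVER_AUTH_COOKIE_DOMAIN": "hatchet.example.com",
    "SERVER_GRPC_BROADCAST_ADDRESS": "service:7070",
}
patch = build_configmap_patch(restored)
print(patch)
```

The printed JSON can be passed to `kubectl patch configmap hatchet-shared-config --type merge -p '<json>'`. If the chart manages this object, a later helm upgrade will likely overwrite the patch, so setting the same values through the chart is the more durable fix.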

Environment

  • Helm Chart Version: upgraded from v0.54.8 to v1
  • Kubernetes Version: (please specify)
  • Authentication: Username/Password & Google SSO

Additional Context

After restoring the environment variables to their previous values, everything is functioning correctly, including viewing past workflow runs. The issue seems specifically tied to default configuration values overriding the previous ones during the upgrade.

Please investigate whether the Helm chart upgrade path or config migration scripts might be resetting these variables unintentionally.

nmetaintro avatar Mar 21 '25 14:03 nmetaintro

For posterity, other config loading ~breakage between 0.54.14 and 0.55.21 discussed in discord: https://discord.com/channels/1088927970518909068/1213612885499052073/1352723656282869971

knksmith57 avatar Mar 21 '25 20:03 knksmith57

I wonder if https://github.com/hatchet-dev/hatchet/pull/1385 helped here

knksmith57 avatar Mar 31 '25 05:03 knksmith57

This issue has been stale for 30 days. Please update the issue or comment to keep it active. Otherwise, it will be closed in 5 days.

github-actions[bot] avatar Sep 08 '25 08:09 github-actions[bot]