ray icon indicating copy to clipboard operation
ray copied to clipboard

[Serve] log deployment config in controller logs

Open KeeProMise opened this issue 3 weeks ago • 3 comments

Description

[Serve] log deployment config in controller logs

Related issues

https://github.com/ray-project/ray/issues/59167

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

KeeProMise avatar Dec 06 '25 15:12 KeeProMise

These changes seem overly complex for a simple task like logging config. Is there a architectural reason why logging just the deployment config is difficult?

Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields: serialized_deployment_def: bytes - serialized deployment code serialized_init_args: bytes - serialized initialization arguments serialized_init_kwargs: bytes - serialized initialization keyword arguments These fields are used for transmitting code in Ray clusters and cannot be directly serialized to JSON for logging. Why DeploymentInfo cannot be serialized directly: If calling dict() or similar methods directly on DeploymentInfo, these byte fields in ReplicaConfig would cause JSON serialization to fail, or produce large amounts of meaningless binary data. Current approach: We only extract and log serializable configuration fields: deployment_config (Pydantic model, directly serializable) Serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.) route_prefix and version

KeeProMise avatar Dec 09 '25 03:12 KeeProMise

These changes seem overly complex for a simple task like logging config. Is there a architectural reason why logging just the deployment config is difficult?

Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields: serialized_deployment_def: bytes - serialized deployment code serialized_init_args: bytes - serialized initialization arguments serialized_init_kwargs: bytes - serialized initialization keyword arguments These fields are used for transmitting code in Ray clusters and cannot be directly serialized to JSON for logging. Why DeploymentInfo cannot be serialized directly: If calling dict() or similar methods directly on DeploymentInfo, these byte fields in ReplicaConfig would cause JSON serialization to fail, or produce large amounts of meaningless binary data. Current approach: We only extract and log serializable configuration fields: deployment_config (Pydantic model, directly serializable) Serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.) route_prefix and version

right, i think you are capturing the important detail in your comment. I am wondering why the change is not as simple as

diff --git a/python/ray/serve/_private/config.py b/python/ray/serve/_private/config.py
index 22546f6188..efc8e6c468 100644
--- a/python/ray/serve/_private/config.py
+++ b/python/ray/serve/_private/config.py
@@ -806,7 +806,16 @@ class ReplicaConfig:

     def to_proto_bytes(self):
         return self.to_proto().SerializeToString()
-
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_def_name": self.deployment_def_name,
+            "ray_actor_options": json.loads(self.ray_actor_options),
+            "placement_group_bundles": json.loads(self.placement_group_bundles),
+            "placement_group_strategy": self.placement_group_strategy,
+            "max_replicas_per_node": self.max_replicas_per_node,
+        }

 def prepare_imperative_http_options(
     proxy_location: Union[None, str, ProxyLocation],
diff --git a/python/ray/serve/_private/deployment_info.py b/python/ray/serve/_private/deployment_info.py
index 5413c7878a..1e4e824368 100644
--- a/python/ray/serve/_private/deployment_info.py
+++ b/python/ray/serve/_private/deployment_info.py
@@ -167,3 +167,14 @@ class DeploymentInfo:
         else:
             data["target_capacity_direction"] = self.target_capacity_direction.name
         return DeploymentInfoProto(**data)
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_config": self.deployment_config.model_dump(),
+            "replica_config": self.replica_config.to_dict(),
+            "start_time_ms": self.start_time_ms,
+            "actor_name": self.actor_name,
+            "version": self.version,
+            "end_time_ms": self.end_time_ms,
+        }
\ No newline at end of file
diff --git a/python/ray/serve/_private/deployment_state.py b/python/ray/serve/_private/deployment_state.py
index d1e29e9f08..d507799987 100644
--- a/python/ray/serve/_private/deployment_state.py
+++ b/python/ray/serve/_private/deployment_state.py
@@ -2340,6 +2340,9 @@ class DeploymentState:
             bool: Whether the target state has changed.
         """

+        logger.info(f"Deploying deployment {deployment_info.to_dict()}")
+        logger.info(f"Current target state: {self._target_state.info.to_dict() if self._target_state.info is not None else None}")
+
         curr_deployment_info = self._target_state.info
         if curr_deployment_info is not None:
             # Redeploying should not reset the deployment's start time.

abrarsheikh avatar Dec 09 '25 20:12 abrarsheikh

These changes seem overly complex for a simple task like logging config. Is there a architectural reason why logging just the deployment config is difficult?

Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields: serialized_deployment_def: bytes - serialized deployment code serialized_init_args: bytes - serialized initialization arguments serialized_init_kwargs: bytes - serialized initialization keyword arguments These fields are used for transmitting code in Ray clusters and cannot be directly serialized to JSON for logging. Why DeploymentInfo cannot be serialized directly: If calling dict() or similar methods directly on DeploymentInfo, these byte fields in ReplicaConfig would cause JSON serialization to fail, or produce large amounts of meaningless binary data. Current approach: We only extract and log serializable configuration fields: deployment_config (Pydantic model, directly serializable) Serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.) route_prefix and version

right, i think you are capturing the important detail in your comment. I am wondering why the change is not as simple as

diff --git a/python/ray/serve/_private/config.py b/python/ray/serve/_private/config.py
index 22546f6188..efc8e6c468 100644
--- a/python/ray/serve/_private/config.py
+++ b/python/ray/serve/_private/config.py
@@ -806,7 +806,16 @@ class ReplicaConfig:

     def to_proto_bytes(self):
         return self.to_proto().SerializeToString()
-
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_def_name": self.deployment_def_name,
+            "ray_actor_options": json.loads(self.ray_actor_options),
+            "placement_group_bundles": json.loads(self.placement_group_bundles),
+            "placement_group_strategy": self.placement_group_strategy,
+            "max_replicas_per_node": self.max_replicas_per_node,
+        }

 def prepare_imperative_http_options(
     proxy_location: Union[None, str, ProxyLocation],
diff --git a/python/ray/serve/_private/deployment_info.py b/python/ray/serve/_private/deployment_info.py
index 5413c7878a..1e4e824368 100644
--- a/python/ray/serve/_private/deployment_info.py
+++ b/python/ray/serve/_private/deployment_info.py
@@ -167,3 +167,14 @@ class DeploymentInfo:
         else:
             data["target_capacity_direction"] = self.target_capacity_direction.name
         return DeploymentInfoProto(**data)
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_config": self.deployment_config.model_dump(),
+            "replica_config": self.replica_config.to_dict(),
+            "start_time_ms": self.start_time_ms,
+            "actor_name": self.actor_name,
+            "version": self.version,
+            "end_time_ms": self.end_time_ms,
+        }
\ No newline at end of file
diff --git a/python/ray/serve/_private/deployment_state.py b/python/ray/serve/_private/deployment_state.py
index d1e29e9f08..d507799987 100644
--- a/python/ray/serve/_private/deployment_state.py
+++ b/python/ray/serve/_private/deployment_state.py
@@ -2340,6 +2340,9 @@ class DeploymentState:
             bool: Whether the target state has changed.
         """

+        logger.info(f"Deploying deployment {deployment_info.to_dict()}")
+        logger.info(f"Current target state: {self._target_state.info.to_dict() if self._target_state.info is not None else None}")
+
         curr_deployment_info = self._target_state.info
         if curr_deployment_info is not None:
             # Redeploying should not reset the deployment's start time.

Hi @abrarsheikh Thank you for taking the time to review. PTAL.

KeeProMise avatar Dec 10 '25 03:12 KeeProMise

tests are failing.

abrarsheikh avatar Dec 11 '25 01:12 abrarsheikh

please add screenshot of manual test after this change.

abrarsheikh avatar Dec 15 '25 17:12 abrarsheikh

please add screenshot of manual test after this change.

image

KeeProMise avatar Dec 16 '25 07:12 KeeProMise

@abrarsheikh Thank you for reviewing and merging this PR.

KeeProMise avatar Dec 17 '25 02:12 KeeProMise

@KeeProMise message me on slack if you are interested in making other contributions :)

abrarsheikh avatar Dec 17 '25 05:12 abrarsheikh