[Serve] log deployment config in controller logs
Description
[Serve] log deployment config in controller logs
Related issues
https://github.com/ray-project/ray/issues/59167
Additional information
Optional: Add implementation details, API changes, usage examples, screenshots, etc.
These changes seem overly complex for a simple task like logging config. Is there a architectural reason why logging just the deployment config is difficult?
Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields: serialized_deployment_def: bytes - serialized deployment code serialized_init_args: bytes - serialized initialization arguments serialized_init_kwargs: bytes - serialized initialization keyword arguments These fields are used for transmitting code in Ray clusters and cannot be directly serialized to JSON for logging. Why DeploymentInfo cannot be serialized directly: If calling dict() or similar methods directly on DeploymentInfo, these byte fields in ReplicaConfig would cause JSON serialization to fail, or produce large amounts of meaningless binary data. Current approach: We only extract and log serializable configuration fields: deployment_config (Pydantic model, directly serializable) Serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.) route_prefix and version
These changes seem overly complex for a simple task like logging config. Is there a architectural reason why logging just the deployment config is difficult?
Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields: serialized_deployment_def: bytes - serialized deployment code serialized_init_args: bytes - serialized initialization arguments serialized_init_kwargs: bytes - serialized initialization keyword arguments These fields are used for transmitting code in Ray clusters and cannot be directly serialized to JSON for logging. Why DeploymentInfo cannot be serialized directly: If calling dict() or similar methods directly on DeploymentInfo, these byte fields in ReplicaConfig would cause JSON serialization to fail, or produce large amounts of meaningless binary data. Current approach: We only extract and log serializable configuration fields: deployment_config (Pydantic model, directly serializable) Serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.) route_prefix and version
right, i think you are capturing the important detail in your comment. I am wondering why the change is not as simple as
diff --git a/python/ray/serve/_private/config.py b/python/ray/serve/_private/config.py
index 22546f6188..efc8e6c468 100644
--- a/python/ray/serve/_private/config.py
+++ b/python/ray/serve/_private/config.py
@@ -806,7 +806,16 @@ class ReplicaConfig:
def to_proto_bytes(self):
return self.to_proto().SerializeToString()
-
+
+ def to_dict(self):
+ # only use for logging purposes
+ return {
+ "deployment_def_name": self.deployment_def_name,
+ "ray_actor_options": json.loads(self.ray_actor_options),
+ "placement_group_bundles": json.loads(self.placement_group_bundles),
+ "placement_group_strategy": self.placement_group_strategy,
+ "max_replicas_per_node": self.max_replicas_per_node,
+ }
def prepare_imperative_http_options(
proxy_location: Union[None, str, ProxyLocation],
diff --git a/python/ray/serve/_private/deployment_info.py b/python/ray/serve/_private/deployment_info.py
index 5413c7878a..1e4e824368 100644
--- a/python/ray/serve/_private/deployment_info.py
+++ b/python/ray/serve/_private/deployment_info.py
@@ -167,3 +167,14 @@ class DeploymentInfo:
else:
data["target_capacity_direction"] = self.target_capacity_direction.name
return DeploymentInfoProto(**data)
+
+ def to_dict(self):
+ # only use for logging purposes
+ return {
+ "deployment_config": self.deployment_config.model_dump(),
+ "replica_config": self.replica_config.to_dict(),
+ "start_time_ms": self.start_time_ms,
+ "actor_name": self.actor_name,
+ "version": self.version,
+ "end_time_ms": self.end_time_ms,
+ }
\ No newline at end of file
diff --git a/python/ray/serve/_private/deployment_state.py b/python/ray/serve/_private/deployment_state.py
index d1e29e9f08..d507799987 100644
--- a/python/ray/serve/_private/deployment_state.py
+++ b/python/ray/serve/_private/deployment_state.py
@@ -2340,6 +2340,9 @@ class DeploymentState:
bool: Whether the target state has changed.
"""
+ logger.info(f"Deploying deployment {deployment_info.to_dict()}")
+ logger.info(f"Current target state: {self._target_state.info.to_dict() if self._target_state.info is not None else None}")
+
curr_deployment_info = self._target_state.info
if curr_deployment_info is not None:
# Redeploying should not reset the deployment's start time.
These changes seem overly complex for a simple task like logging config. Is there a architectural reason why logging just the deployment config is difficult?
Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields: serialized_deployment_def: bytes - serialized deployment code serialized_init_args: bytes - serialized initialization arguments serialized_init_kwargs: bytes - serialized initialization keyword arguments These fields are used for transmitting code in Ray clusters and cannot be directly serialized to JSON for logging. Why DeploymentInfo cannot be serialized directly: If calling dict() or similar methods directly on DeploymentInfo, these byte fields in ReplicaConfig would cause JSON serialization to fail, or produce large amounts of meaningless binary data. Current approach: We only extract and log serializable configuration fields: deployment_config (Pydantic model, directly serializable) Serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.) route_prefix and version
right, i think you are capturing the important detail in your comment. I am wondering why the change is not as simple as
diff --git a/python/ray/serve/_private/config.py b/python/ray/serve/_private/config.py index 22546f6188..efc8e6c468 100644 --- a/python/ray/serve/_private/config.py +++ b/python/ray/serve/_private/config.py @@ -806,7 +806,16 @@ class ReplicaConfig: def to_proto_bytes(self): return self.to_proto().SerializeToString() - + + def to_dict(self): + # only use for logging purposes + return { + "deployment_def_name": self.deployment_def_name, + "ray_actor_options": json.loads(self.ray_actor_options), + "placement_group_bundles": json.loads(self.placement_group_bundles), + "placement_group_strategy": self.placement_group_strategy, + "max_replicas_per_node": self.max_replicas_per_node, + } def prepare_imperative_http_options( proxy_location: Union[None, str, ProxyLocation], diff --git a/python/ray/serve/_private/deployment_info.py b/python/ray/serve/_private/deployment_info.py index 5413c7878a..1e4e824368 100644 --- a/python/ray/serve/_private/deployment_info.py +++ b/python/ray/serve/_private/deployment_info.py @@ -167,3 +167,14 @@ class DeploymentInfo: else: data["target_capacity_direction"] = self.target_capacity_direction.name return DeploymentInfoProto(**data) + + def to_dict(self): + # only use for logging purposes + return { + "deployment_config": self.deployment_config.model_dump(), + "replica_config": self.replica_config.to_dict(), + "start_time_ms": self.start_time_ms, + "actor_name": self.actor_name, + "version": self.version, + "end_time_ms": self.end_time_ms, + } \ No newline at end of file diff --git a/python/ray/serve/_private/deployment_state.py b/python/ray/serve/_private/deployment_state.py index d1e29e9f08..d507799987 100644 --- a/python/ray/serve/_private/deployment_state.py +++ b/python/ray/serve/_private/deployment_state.py @@ -2340,6 +2340,9 @@ class DeploymentState: bool: Whether the target state has changed. """ + logger.info(f"Deploying deployment {deployment_info.to_dict()}") + logger.info(f"Current target state: {self._target_state.info.to_dict() if self._target_state.info is not None else None}") + curr_deployment_info = self._target_state.info if curr_deployment_info is not None: # Redeploying should not reset the deployment's start time.
Hi @abrarsheikh Thank you for taking the time to review. PTAL.
tests are failing.
please add screenshot of manual test after this change.
please add screenshot of manual test after this change.
@abrarsheikh Thank you for reviewing and merging this PR.
@KeeProMise message me on slack if you are interested in making other contributions :)