[py][bidi]: add response handlers and network data collectors
Open
navin772
opened this issue 2 months ago
•
8 comments
User description
🔗 Related Issues
Addresses remaining methods for response handlers for python - https://github.com/SeleniumHQ/selenium/issues/13993
💥 What does this PR do?
Adds network response handlers:
add_response_handler
remove_response_handler
clear_response_handlers
Adds continue_response method.
Adds network data collector support
add_data_collector
remove_data_collector
get_data
🔧 Implementation Notes
Initially I started out implementing the data collector feature only, but during testing response handlers were required, thus this PR grew bigger.
💡 Additional Considerations
🔄 Types of changes
New feature (non-breaking change which adds functionality and tests!)
PR Type
Enhancement, Tests
Description
Adds network response handlers (add_response_handler, remove_response_handler, clear_response_handlers) for BiDi protocol
Implements continue_response method for modifying intercepted responses
Adds network data collector support (add_data_collector, remove_data_collector, get_data)
Introduces BytesValue, StringValue, and Base64Value classes for handling network data encoding
Adds comprehensive test coverage for response handlers, data collectors, and their integration
Diagram Walkthrough
flowchart LR
A["Network class"] --> B["Response handlers"]
A --> C["Data collectors"]
B --> D["Response class"]
D --> E["continue_response()"]
C --> F["BytesValue encoding"]
F --> G["StringValue/Base64Value"]
File Walkthrough
Relevant files
Enhancement
network.py
Add response handlers, data collectors, and encoding classes
py/selenium/webdriver/common/bidi/network.py
Adds BytesValue, StringValue, and Base64Value classes for network data encoding
Below is a summary of compliance checks for this PR:
Security Compliance
⚪
Potential DoS via memory
Description: The get_data method returns raw network body bytes without enforcing size limits or content-type checks, risking excessive memory usage if a large response is fetched or unexpectedly large data is stored. network.py [574-589]
Referred Code
if disown and collector_id is None:
raise ValueError("Cannot disown data without specifying a collector")
params = {
"dataType": data_type,
"request": request_id,
"disown": disown,
}
if collector_id is not None:
params["collector"] = collector_id
cmd = command_builder("network.getData", params)
result = self.conn.execute(cmd)
return BytesValue.from_dict(result["bytes"])
Sensitive data exposure
Description: Tests assert on collected response data that may contain HTML content; if logs or reports persist raw data this could expose sensitive content, suggesting need to sanitize or limit stored data in testing outputs. bidi_network_tests.py [196-233]
Referred Code
time.sleep(2)
assert len(responses) > 0
response = responses[0]
assert response.request_id is not None
assert "formPage.html" in response.url
assert response.status_code is not None
driver.network.remove_response_handler("response_completed", callback_id)
# Integrated Tests: Response Handlers + Data Collection
def test_data_collection_with_response_handler(driver, pages):
captured_responses = []
collected_data = []
# Add a data collector
collector_id = driver.network.add_data_collector(data_types=["response"], max_encoded_data_size=50000)
def response_callback(response: Response):
... (clipped 17 lines)
Remove the redundant and incorrect code in _on_response that corrupts the self.callbacks dictionary by using event_name instead of callback_id as the key.
-if event_name in self.callbacks:
- self.callbacks[event_name].append(callback_id)
-else:
- self.callbacks[event_name] = [callback_id]
+# This block should be removed.
[ ] Apply / Chat
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a critical bug where self.callbacks is corrupted, which would break functionality in other methods like remove_response_handler.
High
Unsubscribe from events when clearing handlers
In clear_request_handlers, send a session.unsubscribe command for each event after removing its local handlers to prevent resource leaks from the browser.
-for event_name in self.subscriptions:
+events_to_unsubscribe = list(self.subscriptions.keys())
+for event_name in events_to_unsubscribe:
for callback_id in self.subscriptions[event_name].copy():
net_event = NetworkEvent(event_name)
self.conn.remove_callback(net_event, callback_id)
if callback_id in self.callbacks:
self._remove_intercept(self.callbacks[callback_id])
del self.callbacks[callback_id]
-self.subscriptions = {}
+ self.conn.execute(command_builder("session.unsubscribe", {"events": [event_name]}))
+self.subscriptions.clear()
[ ] Apply / Chat
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly points out that the refactored clear_request_handlers method fails to unsubscribe from browser events, which is a regression that causes resource leaks.
Medium
Unsubscribe from events when removing handlers
In remove_response_handler, send a session.unsubscribe command when the last callback for an event is removed to prevent the browser from sending unnecessary data.
if event_name in self.subscriptions:
self.subscriptions[event_name].remove(callback_id)
if len(self.subscriptions[event_name]) == 0:
del self.subscriptions[event_name]
+ self.conn.execute(command_builder("session.unsubscribe", {"events": [event_name]}))
[ ] Apply / Chat
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a resource leak where the client does not unsubscribe from a browser event after the last handler is removed, improving resource management.
Medium
High-level
Simplify the data retrieval API
The get_data method should be simplified to return raw bytes instead of the complex BytesValue object. This change would create a more intuitive API by hiding the protocol's internal encoding details from the user.
def get_data(self, data_type: str, request_id: str, collector_id: str = None, disown: bool = False):
"""Retrieve network data for a request.
Parameters:
----------
data_type (str): The type of data to retrieve (e.g., "response").
request_id (str): The request id to get data for.
collector_id (str, optional): The collector id to use.
Default is None.
disown (bool, optional): Whether to disown the data from the collector.
... (clipped 25 lines)
# In network.py
class Network:
def get_data(...) -> BytesValue:
# ...
result = self.conn.execute(...)
return BytesValue.from_dict(result["bytes"])
# In user code (e.g., tests)
def response_callback(response: Response):
# User receives a complex BytesValue object
data_object = driver.network.get_data(...)
# User must know how to extract the actual data
body_string = data_object.value
body_bytes = data_object.to_bytes()
assert "<title>We Leave From Here</title>" in body_string
After:
# In network.py
class Network:
def get_data(...) -> bytes:
# ...
result = self.conn.execute(...)
# The conversion happens inside the method
bytes_value = BytesValue.from_dict(result["bytes"])
return bytes_value.to_bytes()
# In user code (e.g., tests)
def response_callback(response: Response):
# User receives raw bytes directly
body_bytes = driver.network.get_data(...)
# API is simpler and more intuitive
assert b"<title>We Leave From Here</title>" in body_bytes
Suggestion importance[1-10]: 8
__
Why: This is a strong API design suggestion that improves usability by abstracting away underlying protocol details, which is a key goal for a high-level library.
Medium
Learned best practice
Add safe removal guards
Guard removals by checking membership and normalize error types. Use membership checks before list.remove and raise ValueError for unknown events.
def remove_response_handler(self, event, callback_id):
- """Remove a response handler from the network.
-
- Parameters:
- ----------
- event (str): The event to unsubscribe from.
- callback_id (int): The callback id to remove.
- """
try:
event_name = self.EVENTS[event]
except KeyError:
- raise Exception(f"Event {event} not found")
+ raise ValueError(f"Unknown event: {event}")
net_event = NetworkEvent(event_name)
-
self.conn.remove_callback(net_event, callback_id)
- # Remove intercept if it was created for this handler
- if callback_id in self.callbacks and self.callbacks[callback_id] is not None:
- self._remove_intercept(self.callbacks[callback_id])
- del self.callbacks[callback_id]
+ intercept_id = self.callbacks.pop(callback_id, None)
+ if intercept_id:
+ self._remove_intercept(intercept_id)
- if event_name in self.subscriptions:
+ if event_name in self.subscriptions and callback_id in self.subscriptions[event_name]:
self.subscriptions[event_name].remove(callback_id)
- if len(self.subscriptions[event_name]) == 0:
+ if not self.subscriptions[event_name]:
del self.subscriptions[event_name]
[ ] Apply / Chat
Suggestion importance[1-10]: 6
__
Why:
Relevant best practice - Validate inputs and states early with precise checks to prevent logic errors, including guarding removal operations and mapping lookups.
Low
Add try/finally for cleanup
Ensure handlers and collectors are cleaned up even if assertions fail by wrapping registration in try/finally. This prevents leaks affecting subsequent tests.
Why:
Relevant best practice - Enforce deterministic and safe resource handling in tests by ensuring cleanup of handlers and collectors even on failure using try/finally.
In general we'll want a high level network class accessible directly from the driver, and a network module implementation class, in a bidi namespace, likely(?) private API. I think my issue here is that this class includes implementation details that need to be one level deeper. Everything calling out to sockets or converting things to spec syntax should be one level lower.
First we must agree on the high level API, then we can figure out how to split things.
@navin772 this is great work. I'll review it more closely soon. I am planning on building functionality for capturing HAR files, and this will be a key piece to enable that.