OpenWPM
OpenWPM copied to clipboard
Make the status_queue typed
Currently we're just throwing a bunch of tuples through the status queue which isn't very useful for arguing about what different events might be sent.
Looking at this code:
https://github.com/mozilla/OpenWPM/blob/c2a700448f31882bf75530eca98f536321b622e8/automation/TaskManager.py#L488-L519
and this code
https://github.com/mozilla/OpenWPM/blob/c2a700448f31882bf75530eca98f536321b622e8/automation/BrowserManager.py#L530-L567
you see that they are clearly tightly coupled but have need to trace every status[i]
back into the BrowserManager to find out what it might be.
This is unmaintainable and hard to reason about.
Instead each status should be it's own class, so you can access the fields using status.field
instead of status[i]
@Metropass This issue is currently lacking a lot of context, but the gist is: We currently pass around a lot of our failure status messages as tuples, and just assume on both sides that we'll have the same structure. This has proven brittle in the past, since we don't have any tests on these paths and don't catch errors until it crashes in production. The task would probably scope up to be slightly smaller than this https://github.com/mozilla/OpenWPM/pull/580 but similiar in style.
@Metropass This issue is currently lacking a lot of context, but the gist is: We currently pass around a lot of our failure status messages as tuples, and just assume on both sides that we'll have the same structure. This has proven brittle in the past, since we don't have any tests on these paths and don't catch errors until it crashes in production. The task would probably scope up to be slightly smaller than this #580 but similiar in style.
Interesting, if that's the case, then could we create a status
object that can store the tuple data itself, and just have a query function call within the queue for better management? @vringar I'll take a look at the code, and see if I can make a mock status
class.
Yeah, a single status object that looks something like this:
class Status:
def __init__(name: str,
error_class: Optional[type] = None,
error: Optional[BaseException] = None,
tb: Optional[traceback] = None
):
self.name = name
.....
There wouldn't need to be any accessor methods and for the "OK" status we could just leave everything but name
as None
.
error_class
, error
and tb
are returned by sys.exc_info()
I see. I also noticed that on line 554, you pickle the error tuple itself. Would it be a good approach to pass the tuple into the object's constructor, then pickle the object itself through pickle.dumps()
for stability?
https://github.com/mozilla/OpenWPM/blob/c2a700448f31882bf75530eca98f536321b622e8/automation/BrowserManager.py#L554
Yes, that seems like a good design
Cool, I'll have some code done by today. I would assume we would make a status_object.py
to hold the definition of the object, since we're using it in multiple files?
@vringar I got the definition of the object, I added, tb,error,error_class
from your example, and I also added logger_info
and error_text
just so you can store it in the Object to cache, rather than calling self.logger
every time.
I also added the destructor and a function called clean()
so that the object can be reused, or to flush any unused data.
https://github.com/Metropass/status_object
Is there any additions or changes I should make?
Thank you for following up so quickly on this.
Two questions on __del__
and on clean
because I'm currently don't understand why we need them:
- Isn't freeing of resources done automatically by the interpreter?
This SO answer implies the only use case for
__del__
is if you have external memory or other resources and that it's use is heavily discouraged otherwise - Where do you intend to reuse a status object? The way they are currently used is creating them in the browser processes and sending them over to TaskManager/main process. And since we are using pickle and not deserializing ourselves I don't think we can reuse existing allocations in either of those cases.
Also I prefer keeping the logging in the BrowserManager, since especially on critical errors the TaskManager might have died as well, but I'd still want the information to get out to the user.
Thank you for following up so quickly on this. Two questions on
__del__
and onclean
because I'm currently don't understand why we need them:* Isn't freeing of resources done automatically by the interpreter? [This SO answer](https://stackoverflow.com/a/43669054/7439718) implies the only use case for `__del__` is if you have external memory or other resources and that it's use is heavily discouraged otherwise * Where do you intend to reuse a status object? The way they are currently used is creating them in the browser processes and sending them over to TaskManager/main process. And since we are using pickle and not deserializing ourselves I don't think we can reuse existing allocations in either of those cases.
Also I prefer keeping the logging in the BrowserManager, since especially on critical errors the TaskManager might have died as well, but I'd still want the information to get out to the user.
You're right, the interpreter does free resources. I completely forgot about that. I'll remove that.
As for the clean()
I added it just in case if TaskManager ever gets revamped, and there's case where we need to flush the data from the object, we can call clean()
.
But if it's not necessary, I'll remove that as well.
@vringar I changed the object to only have setName
for the Status (case; "OK"), and the attributes that we need. I'll take a look at TaskManager and see how we could pickle the object :)
As for the
clean()
I added it just in case if TaskManager ever gets revamped, and there's case where we need to flush the data from the object, we can callclean()
.But if it's not necessary, I'll remove that as well.
If you find it useful in your current implementation, I have no objection to it, but otherwise I'd like to see it removed.
@vringar I changed the object to only have
setName
for the Status (case; "OK"), and the attributes that we need. I'll take a look at TaskManager and see how we could pickle the object :)
I don't think you need to worry about the pickling as the queue pickles objects internally. You should be able to just put the status object in there and get it out from the other side.
Also please feel free to open a PR as soon as you have something written up, so I can avoid wasting your time by giving feedback early.
@vringar So, I'm almost finished with this issue, however I'm kind of confused with Line 131 in BrowserManager.py
https://github.com/mozilla/OpenWPM/blob/c2a700448f31882bf75530eca98f536321b622e8/automation/BrowserManager.py#L129-L131
From what I've understood, the tuple's we worked with in TaskManager.py
is of type Tuple[str,str]
, but I'm unsure what the 3rd element of the tuple is.
Unfortunately I don't know either. My usual approach is writing:
self.logger.error("Result[2] has type %r and value %r", type(result[2]), result[2])
and just seeing, what results I get.
Hey, I'm sorry to say this but I completely underestimated the complexity in this change and don't the approach I originally suggested can work. I apologize for wasting your time and am hopeful that the second approach will work much better. My initial observation looked only at the "NETERROR", "FAILED" and "CRITCAL" and "OKAY" assumed that they could be all modelled in one class. However looking into it some more there is also a whole bunch of different "STATUS" messages that also get passed through there.
The class hierarchy that looks something like this:
- Status
- InfoStatus
- BrowserReadyStatus
- ProfileCreatedStatus
- ProfileTarStatus
- DisplayStatus
- LaunchAttemptStatus
- LaunchDoneStatus
- ErrorStatus
- FailedStatus
- CriticalStatus
- NetErrorStatus
- OkStatus
- InfoStatus
This way there will be no more need to deal with strings and the current differences between the different statuses can be easily kept.
Checks for which status it is turn into isinstance(status, FailedStatus)
.
Would you want to do this or would you rather I give you a half done PR that you can finish?
Hey, I'm sorry to say this but I completely underestimated the complexity in this change and don't the approach I originally suggested can work. I apologize for wasting your time and am hopeful that the second approach will work much better. My initial observation looked only at the "NETERROR", "FAILED" and "CRITCAL" and "OKAY" assumed that they could be all modelled in one class. However looking into it some more there is also a whole bunch of different "STATUS" messages that also get passed through there.
The class hierarchy that looks something like this:
* Status * InfoStatus * BrowserReadyStatus * ProfileCreatedStatus * ProfileTarStatus * DisplayStatus * LaunchAttemptStatus * LaunchDoneStatus * ErrorStatus * FailedStatus * CriticalStatus * NetErrorStatus * OkStatus
This way there will be no more need to deal with strings and the current differences between the different statuses can be easily kept. Checks for which status it is turn into
isinstance(status, FailedStatus)
. Would you want to do this or would you rather I give you a half done PR that you can finish?
Yeah, I'd love to do that! Thank you so much 😄
@vringar Apologies for the wait, I've been ill the past week, I should have my code done by Sunday, and I just wanted to let you know :)
No worries! Hope you're feeling better now.
Please be aware that I just merged a big PR that moved everything from automation
to openwpm
. So be prepared to adjust your code for that.
Sorry for the inconvenience, but it had to be done.