cobalt icon indicating copy to clipboard operation
cobalt copied to clipboard

Clean up telemetry_test and fix.

Open aee-google opened this issue 10 months ago • 1 comments

b/330355045

The fix includes fixing the starboard implementation of MessagePump: MessagePumpUIStarboard and MessagePumpIOStarboard.

aee-google avatar Apr 22 '24 20:04 aee-google

I tried this on a Sabrina device and it loaded the app but did not respond to remote keypresses. remote devtools worked, but the Quit button on the Cobalt app also got no response beyond a Request to stop log message (conceal worked, but revealing afterward by selecting the app again on the device results in a black screen and an ANR.

jellefoks avatar Apr 23 '24 00:04 jellefoks

Updated base/message_loop/message_pump_io_starboard.cc and base/message_loop/message_pump_ui_starboard.cc using https://github.com/youtube/cobalt/blob/c0dfb97eaf2742bd21f6054dae61a5328029046d/base/message_loop/message_pump_libevent.cc#L273 as a reference.

aee-google avatar Apr 26 '24 21:04 aee-google

The failed checks are due to infra issues.

C:\BuildTools\VC\Tools\MSVC\14.34.31933\include\xtree(44): fatal error C1088:
    Cannot flush compiler intermediate file:
        'C:\Users\ContainerAdministrator\AppData\Local\Temp2\_CL_3c56ab4bex': No space left on device

aee-google avatar Apr 26 '24 21:04 aee-google

Can you explain a little bit what's the original issue with the MessagePump and how do you fix it? It will be very helpful if you can provide more context here.

sherryzy avatar Apr 27 '24 00:04 sherryzy

The MessagePump::Delegate interface changed.

New:

Delegate::NextWorkInfo next_work_info = delegate->DoWork();

Old:

bool did_work = delegate->DoWork();
TimeDelta next_delayed_work_time;
did_work |= delegate->DoDelayedWork(&next_delayed_work_time);

The updated //base implementation simplifies the running of the next task into one call. If there is a delayed task that is ready to be done and and immediate task available, either task might be run in one iteration of the run loop. The DoWork() call returns the info about the next task to be run (unless preempted). This info is mainly held in next_work_info.delayed_run_time. NextWorkInfo also has next_work_info.recent_now which is used to determine the remaining delay in TimeDelta to wait until the next task is ready to be executed.

next_work_info.is_immediate() checks is delayed_run_time.is_null() which actually checks if the interval value of the TimeDelta is 0. A next_work_info.delayed_run_time with an internal value of 0 is an immediate task which should be executed immediately (no sleep/wait).

next_work_info.delayed_run_time.is_max() is a little clunky, but it means that there is no available next task, and we should idle until woken up.

next_work_info.remaining_delay() is the amount of time we should sleep/wait until running the next task. Similar to idle, we can still wake up if an immediate task is posted or if a new delayed task is posted with a more recent desired run time.

The old //base is different.

bool did_work = delegate->DoWork();
TimeDelta next_delayed_work_time;
did_work |= delegate->DoDelayedWork(&next_delayed_work_time);

If we did work, we should try to pull more work to do now. Another iteration should always be attempted.

We did update the MessagePump code to use the new Delegate::NextWorkInfo returned from DoWork(), but we using it subtly incorrectly.

I tried to fix just the issue. But the code as written was difficult to fix without breaking something else. This is what led me to modifying MessagePumpIOStarboard::Run() and MessagePumpUIStarboard::RunUntildle() to look more like MessagePumpLibevent::Run().

In addition to this, I changed SbEventIdSet outstanding_events_ to be absl::optional<SbEventId> outstanding_event_ since only one element is stored at a time. Same for outstanding_delayed_event_.

aee-google avatar Apr 27 '24 00:04 aee-google

Do I need to keep retrying the on_device checks that failed? Are they more flaky than the normal checks (which are also quite flaky)?

aee-google avatar Apr 27 '24 00:04 aee-google

Thanks for the detailed explanation. For the failing the tests, maybe better do some retries once or twice to see if it's just flaky. And you can compare the test result with trunk as well. If it continue fails, maybe take a look if it fails for the same reason. Now, it seems like many Android tests fail.

sherryzy avatar Apr 29 '24 17:04 sherryzy