FluidFramework icon indicating copy to clipboard operation
FluidFramework copied to clipboard

Targeted Signal support for Alfred Lamda

Open WillieHabi opened this issue 1 year ago • 3 comments

Description

ADO Task 7146

Preliminary work for https://github.com/microsoft/FluidFramework/pull/19490

Update to alfred lambda to handle target client id's on signal messages. This is preliminary server work needed to allow local and routerlicious clients to send and receive targeted signals.

For context -- each client is subscribed to their own room/channel along with the document room. This is the roomId being used when targetClientId is specified.

Reviewer Guidance

Planning on publishing an internal prerelease package with these changes to eventually bump to. Is there any problems with this?

WillieHabi avatar Feb 07 '24 16:02 WillieHabi

Rework once base types are updated. Also expect to see feature support declared. And maybe an option to control it or is that done with versioning?

Did the rework and check for client feature support -- used ODSP PUSH Channel as a reference. Also bumped to newest protocol def.

WillieHabi avatar Feb 14 '24 19:02 WillieHabi

Also are there any tests that can happen here?

jason-ha avatar Feb 14 '24 22:02 jason-ha

Also are there any tests that can happen here?

Will look into this

Edit: I think the End-To-End client tests would be the most appropriate way to verify if the signals are being targeted properly.

WillieHabi avatar Feb 14 '24 23:02 WillieHabi

I had to solve a merge conflict here after merging #19851 @jason-ha

WillieHabi avatar Mar 27 '24 16:03 WillieHabi

@znewton @jason-ha I made some changes to accommodate conflicts with this PR being merged https://github.com/microsoft/FluidFramework/pull/19906

WillieHabi avatar Mar 29 '24 19:03 WillieHabi

@jason-ha I also have some client side type changes here https://github.com/microsoft/FluidFramework/pull/20299

WillieHabi avatar Mar 29 '24 19:03 WillieHabi

Also are there any tests that can happen here?

The above got a thumbs up, but I am not seeing any units tests. I know there are E2E tests with client changes pending, but we really want to have nearby unit tests which can help cover the logic better and hit corner cases.

jason-ha avatar Apr 02 '24 17:04 jason-ha

Also are there any tests that can happen here?

The above got a thumbs up, but I am not seeing any units tests. I know there are E2E tests with client changes pending, but we really want to have nearby unit tests which can help cover the logic better and hit corner cases.

It doesn't look like there are any signal unit tests on the server. I added a v2 signal usage test but this doesn't help cover any added logic or corner cases. Any multi-client signals logic testing currently lives in client e2e tests.

@znewton @alteut Do you guys have any insight on if/where/how signal functionality is tested on the server.

WillieHabi avatar Apr 02 '24 19:04 WillieHabi

Also are there any tests that can happen here?

The above got a thumbs up, but I am not seeing any units tests. I know there are E2E tests with client changes pending, but we really want to have nearby unit tests which can help cover the logic better and hit corner cases.

I doesn't look like unit tests are done for signals in the server release group. For testing changes across release groups (e.g., mutli-client signals) testing is done locally https://eng.ms/docs/experiences-devices/opg/office-shared/fluid-framework/fluid-framework-internal/fluid-framework/docs/dev/testing/cross-release-groups-yalc.

I can verify that the client e2e tests I wrote here all pass locally with these corresponding client side changes

WillieHabi avatar Apr 03 '24 18:04 WillieHabi

Also are there any tests that can happen here?

The above got a thumbs up, but I am not seeing any units tests. I know there are E2E tests with client changes pending, but we really want to have nearby unit tests which can help cover the logic better and hit corner cases.

I doesn't look like unit tests are done for signals in the server release group. For testing changes across release groups (e.g., mutli-client signals) testing is done locally https://eng.ms/docs/experiences-devices/opg/office-shared/fluid-framework/fluid-framework-internal/fluid-framework/docs/dev/testing/cross-release-groups-yalc.

I can verify that the client e2e tests I wrote here all pass locally with these corresponding client side changes

The tests you had and then removed were the right direction. Is that file not actually use? I could see signal unit testing being overlooked because it was so simple before. Not so simple that there shouldn't have been testing, but that is how things are.

We want unit tests for nearly everything. When you start work in an area that can mean backfilling missing coverage. We have that here. Do add tests for new logic and old. We want a combination of possible inputs and conditions.

  1. connection should advertise v2 or not
  2. message should be array, ISentSignalMessage, or other
  3. content within message should be string (how we use it for v1) or other things (because it isn't supposed to care)

Check combinations of the above and check that result is correct format and content in the result is as expected. All of this can and should happen without client changes and e2e tests.

cc: @sashasimic while I am away

jason-ha avatar Apr 07 '24 11:04 jason-ha

Also are there any tests that can happen here?

The above got a thumbs up, but I am not seeing any units tests. I know there are E2E tests with client changes pending, but we really want to have nearby unit tests which can help cover the logic better and hit corner cases.

I doesn't look like unit tests are done for signals in the server release group. For testing changes across release groups (e.g., mutli-client signals) testing is done locally https://eng.ms/docs/experiences-devices/opg/office-shared/fluid-framework/fluid-framework-internal/fluid-framework/docs/dev/testing/cross-release-groups-yalc. I can verify that the client e2e tests I wrote here all pass locally with these corresponding client side changes

The tests you had and then removed were the right direction. Is that file not actually use? I could see signal unit testing being overlooked because it was so simple before. Not so simple that there shouldn't have been testing, but that is how things are.

We want unit tests for nearly everything. When you start work in an area that can mean backfilling missing coverage. We have that here. Do add tests for new logic and old. We want a combination of possible inputs and conditions.

  1. connection should advertise v2 or not
  2. message should be array, ISentSignalMessage, or other
  3. content within message should be string (how we use it for v1) or other things (because it isn't supposed to care)

Check combinations of the above and check that result is correct format and content in the result is as expected. All of this can and should happen without client changes and e2e tests.

cc: @sashasimic while I am away

I added and merged preliminary signal tests on the server in this PR here https://github.com/microsoft/FluidFramework/pull/20516

WillieHabi avatar Apr 09 '24 22:04 WillieHabi

@WillieHabi couple of more things left for this PR before merging it:

  • I had one last minor comment.
  • Please get @znewton's approval.
  • You may need to reach out to Jason to remove change request so you can merge it.
  • Branch has conflicts that you need to resolve.

sashasimic avatar Apr 23 '24 22:04 sashasimic

Are we sure that when a client says they support v2 that the only signals we send through are ISentSignalMessage_s? The rest will just get dropped. This doesn't seem robust to me. I was thinking that client saying they support v2 means they can send ISentSignalMessage_s. cc: @sashasimic, @znewton

My understanding is connected clients can only submit one type of signal. A client saying they 'support v2" are them communicating with the server to say they will send ISentSignalMessage's.

WillieHabi avatar Apr 24 '24 17:04 WillieHabi

Are we sure that when a client says they support v2 that the only signals we send through are ISentSignalMessage_s? The rest will just get dropped. This doesn't seem robust to me. I was thinking that client saying they support v2 means they can send ISentSignalMessage_s. cc: @sashasimic, @znewton

My understanding is connected clients can only submit one type of signal. A client saying they 'support v2" are them communicating with the server to say they will send ISentSignalMessage's.

If we are going to be that strict, then we at least will require telemetry each time we reject a signal. And clear comments about the restrictions. (Up until this point things are very free; so, it is a big switch to impose requirement that all signals have a specific format.) Is ODSP written as strictly?

jason-ha avatar Apr 24 '24 22:04 jason-ha

Are we sure that when a client says they support v2 that the only signals we send through are ISentSignalMessage_s? The rest will just get dropped. This doesn't seem robust to me. I was thinking that client saying they support v2 means they can send ISentSignalMessage_s. cc: @sashasimic, @znewton

My understanding is connected clients can only submit one type of signal. A client saying they 'support v2" are them communicating with the server to say they will send ISentSignalMessage's.

If we are going to be that strict, then we at least will require telemetry each time we reject a signal. And clear comments about the restrictions. (Up until this point things are very free; so, it is a big switch to impose requirement that all signals have a specific format.) Is ODSP written as strictly?

ODSP are even more strict looking at it now. They check if the signal is of type ISentSignalMessage and if not they disconnect the client with a nack error

WillieHabi avatar Apr 24 '24 22:04 WillieHabi

I updated the logic so now it checks for the correct v2 signal type and if the type is invalid it responds with a nack message. Similar to what what ODSP do.

WillieHabi avatar Apr 24 '24 22:04 WillieHabi

I tried to scan just the latest test changes, but couldn't quite follow some of them. If you have not already, you should make sure that it is possible for each test case to fail. (This is an often-overlooked practice that really helps make sure we have legit tests. Just make a list of all of the tests and change product code to violate it. When test fails, check it off.)

I have now made these checks -- also added some additional asserts to verify signal/nack content.

WillieHabi avatar Apr 29 '24 16:04 WillieHabi

re-requesting review on the code changes so far, still a few open discussions to resolve around array type checking and supportedFeaturesMap

WillieHabi avatar May 02 '24 00:05 WillieHabi

One other important case that I realized is missing is that there is no case that sends more than one batch item at a time. Let's be sure to add that.

jason-ha avatar May 03 '24 20:05 jason-ha

And I think that we also don't have throttling coverage nor did the in place logic get changed. Client targeted signals should have different computation than broadcast signals. Unless the logic is just a placeholder it should be updated. I would be okay with comments in code and follow-up work items if there aren't simple changes. cc: @znewton

jason-ha avatar May 03 '24 21:05 jason-ha

And I think that we also don't have throttling coverage nor did the in place logic get changed. Client targeted signals should have different computation than broadcast signals. Unless the logic is just a placeholder it should be updated. I would be okay with comments in code and follow-up work items if there aren't simple changes. cc: @znewton

I just rewatched Zach's presentation last week and saw that signal usage throttling is disabled, but the code is in place. My understanding is that current throttling/signal counting is based on the incoming amount of signals, so I don't think the logic would need to be updated. There is a test unders 'signal count' called "Should store the signal count when throttler is invoked" at the bottom of the file.

WillieHabi avatar May 06 '24 15:05 WillieHabi

And I think that we also don't have throttling coverage nor did the in place logic get changed. Client targeted signals should have different computation than broadcast signals. Unless the logic is just a placeholder it should be updated. I would be okay with comments in code and follow-up work items if there aren't simple changes. cc: @znewton

I just rewatched Zach's presentation last week and saw that signal usage throttling is disabled, but the code is in place. My understanding is that current throttling/signal counting is based on the incoming amount of signals, so I don't think the logic would need to be updated. There is a test unders 'signal count' called "Should store the signal count when throttler is invoked" at the bottom of the file.

I expect that the throttling logic is based on the assumption that signals are broadcast. Is that the case? Even if the counting is only for incoming. The sending is the more expensive part of signals. So, there should be some reconciliation. Maybe this is a follow up task; maybe there are comments added. Ignoring because it isn't turned on sounds like asking for a problem.

jason-ha avatar May 06 '24 19:05 jason-ha

And I think that we also don't have throttling coverage nor did the in place logic get changed. Client targeted signals should have different computation than broadcast signals. Unless the logic is just a placeholder it should be updated. I would be okay with comments in code and follow-up work items if there aren't simple changes. cc: @znewton

I just rewatched Zach's presentation last week and saw that signal usage throttling is disabled, but the code is in place. My understanding is that current throttling/signal counting is based on the incoming amount of signals, so I don't think the logic would need to be updated. There is a test unders 'signal count' called "Should store the signal count when throttler is invoked" at the bottom of the file.

I expect that the throttling logic is based on the assumption that signals are broadcast. Is that the case? Even if the counting is only for incoming. The sending is the more expensive part of signals. So, there should be some reconciliation. Maybe this is a follow up task; maybe there are comments added. Ignoring because it isn't turned on sounds like asking for a problem.

From Zach's presentation -- this is future work that is currently already tracked for FRS signal support effort (see Signal Counting and Billing here: https://microsoft.sharepoint.com/:w:/t/hubserv/EffgOs-XOg5KiNKh8KQq62IBN1yGvyK1-lx2MxCFmFcGSQ?e=21mWD7)

Screenshot 2024-05-06 at 12 35 12 PM

WillieHabi avatar May 06 '24 19:05 WillieHabi

I expect that the throttling logic is based on the assumption that signals are broadcast. Is that the case? Even if the counting is only for incoming. The sending is the more expensive part of signals. So, there should be some reconciliation. Maybe this is a follow up task; maybe there are comments added. Ignoring because it isn't turned on sounds like asking for a problem.

From Zach's presentation -- this is future work that is currently already tracked for FRS signal support effort (see Signal Counting and Billing here: https://microsoft.sharepoint.com/:w:/t/hubserv/EffgOs-XOg5KiNKh8KQq62IBN1yGvyK1-lx2MxCFmFcGSQ?e=21mWD7)

Screenshot 2024-05-06 at 12 35 12 PM

which indicates that the wrong thing would be done in After because we don't care about total connected clients for targeted case. @znewton, is there a work item that can be amended or added to make sure this aspect is addressed?

Additionally, there is a test case in here called "Should store the signal count when throttler is invoked" that is using signals but has no signals v2 coverage. It should get v2 coverage.

jason-ha avatar May 06 '24 20:05 jason-ha