greenlight
greenlight copied to clipboard
GIT-2541: Fixed inconsistency between room last_session and meeting createTime on create (Closes #2541)
Fixed inconsistency between room last_session in GL and meeting createTime on BBB when a problem occurs during the start of an old room
Description
BigBlueButton (BBB) recommends the providing of the meeting creation timestamp in the createTime
property and checks its legitimacy invalidating the join requests in case of a mismatch.
Greenlight (GL) fallows the BBB recommendations and uses the room last_session
attribute to store the timestamp of meeting creation and provides it upon join requests as a value to the createTime
property.
Whenever, the creation process fails GL won't update the attributes of the room in question including its last_session
and sessions
properties.
Some cases just like the GL timeout causes the creation process to fail on GL but to continue on BBB creating this inconsistency. Resulting in:
- The creation of the session on the BBB server.
- The inconsistency between GL and the BBB server.
The room session creator will observer an external error BigBlueButton invalid secret or...
which will lead them to retry creating the session which will result upon successful create requests in:
- GL attempting to create a new session because it doesn't know that the room has one running already (the
is_meeting_running
BBB API call marks a session as "running" only if it had at least one attendee who joined it). - BBB indicates that a session exists for the corresponding
meetingID
by aDuplicateWarning
message. - GL receives the duplication alert and proceeds in joining the requester without updating the room attributes.
- The join fails because of the invalid
createTime
value given by GL that corresponds to the last successful session and not the current one. - The end user will be redirected to the join URL in which will observe an error from BBB (This error depends on the BBB and its reverse proxy configuration, some will observe the error
invalid The createTime parameter submitted mismatches...
as an alert banner in BBB's demo page while others will simply observe it in the URL as a query parameter while others will be redirected to a fallback site...). This will result in preventing anyone from "starting" the room according to GL definition and therefore preventing anyone from joining the session. This happens with rooms that hosted at least one session because at the moment for backward compatibility BBB still accepts the omitting of thecreateTime
which happens automatically with new rooms that has anil
last_session
which explains why a lot of users created new rooms to host the sessions after facing this issue. A user can only wait for the BBB to implicitly stop the session or an administrator has to manually stop it or empty/update the roomslast_session
value to resolve this.
Testing Steps
- Created several rooms.
- Started several sessions.
- Raised exceptions in several parts in the creation logic.
- Observed the results in GL and BBB server.
- Confirmed the root cause.
Solution
Since, after a successful create request GL will have the session data including its createTime
and also will have access to the room ActiveRecord
instance in question meaning that without making additional queries to the DB or requests to the BBB API a simple check for the consistency between the room's last_session
and the createTime
will result in updating the local value in case of any mismatch despite the cause (A duplicate request isn't an error).
In addition, in this PR the GL timeout can be configured externally through the environmental variable TIMEOUT
.
NOTE: Configuring the timeout on GL may not seem to solve the frequent timeout issue for some deployments that uses ScaleLite which has its own timeout that requires configuring.
This solution won't solve any inconsistency between GL and ScaleLite timeout but it ensures that even in such deployments and until ScaleLite gets updated any successful create request will resolve the issue.
Screenshots (if appropriate):
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells
No Coverage information
0.0% Duplication
Instead of this whole change, why not just patch the gem to increase the timeout from 10 to 20 or 30? This seems like a ton of work to an easily solvable problem
Instead of this whole change, why not just patch the gem to increase the timeout from 10 to 20 or 30? This seems like a ton of work to an easily solvable problem
I believe that you are referring to the ruby-bbb-api
gem timeout altering that was added in this PR.
If so, I avoided patching the gem (if that what you meant) because it may brake compatibility for any updates (of the gem) since it deviates its codebase from its actual upstream.
I tried to keep the dependencies implementation out of my scope and focus on what could be done through GL codebase.
Do you think it's better to hardcode an increase in the gems timeout and do you mean by patching the gem is to simply doing what I've done but with a fixed timeout rather then injecting it via an environmental variable which leads to filtering, checking and bounding its value (that may led you to feel it overly complex)?
The ruby-bbb-api
gem offers accessor methods on the timeout to simply alter its value externally.
Having the timeout set by the environmental variable felt the best way to answer some of the users requests who made it clear that a fixed timeout isn't suitable for some deployments.
The extra logic was added to bound the timeout.
The end solution has nothing to do with the timeout injection but after consulting the issue it felt right to enable GL users to define their own timeout without having to build a new image each time and to keep simple and flexible for all users.
Do you want me to limit this PR to only solve the duplication issue only and see a better way to enable timeout altering as a separate concern and maybe add it as a feature just like Room Limit
(It's starting to feel as the right way to do it)?
Instead of this whole change, why not just patch the gem to increase the timeout from 10 to 20 or 30? This seems like a ton of work to an easily solvable problem
I believe that you are referring to the
ruby-bbb-api
gem timeout altering that was added in this PR. If so, I avoided patching the gem (if that what you meant) because it may brake compatibility for any updates (of the gem) since it deviates its codebase from its actual upstream. I tried to keep the dependencies implementation out of my scope and focus on what could be done through GL codebase.Do you think it's better to hardcode an increase in the gems timeout and do you mean by patching the gem is to simply doing what I've done but with a fixed timeout rather then injecting it via an environmental variable which leads to filtering, checking and bounding its value (that may led you to feel it overly complex)?
The
ruby-bbb-api
gem offers accessor methods on the timeout to simply alter its value externally. Having the timeout set by the environmental variable felt the best way to answer some of the users requests who made it clear that a fixed timeout isn't suitable for some deployments. The extra logic was added to bound the timeout. The end solution has nothing to do with the timeout injection but after consulting the issue it felt right to enable GL users to define their own timeout without having to build a new image each time and to keep simple and flexible for all users. Do you want me to limit this PR to only solve the duplication issue only and see a better way to enable timeout altering as a separate concern and maybe add it as a feature just likeRoom Limit
(It's starting to feel as the right way to do it)?
I was referring to this: https://github.com/mconf/bigbluebutton-api-ruby/blob/master/lib/bigbluebutton_api.rb#L75
Our the time I've worked on Greenlight, I've learned that it's sometimes best that we just set the limit ourselves. I think we're getting close to the point where we have too many this configurable which just causes more confusion for the administrators.
2 options here:
- Patch the default in the gem (we have a good relationship with mconf and can get them to release a new version if needed)
- Just hard code a better timeout number without making it configurable by env
Instead of this whole change, why not just patch the gem to increase the timeout from 10 to 20 or 30? This seems like a ton of work to an easily solvable problem
I believe that you are referring to the
ruby-bbb-api
gem timeout altering that was added in this PR. If so, I avoided patching the gem (if that what you meant) because it may brake compatibility for any updates (of the gem) since it deviates its codebase from its actual upstream. I tried to keep the dependencies implementation out of my scope and focus on what could be done through GL codebase. Do you think it's better to hardcode an increase in the gems timeout and do you mean by patching the gem is to simply doing what I've done but with a fixed timeout rather then injecting it via an environmental variable which leads to filtering, checking and bounding its value (that may led you to feel it overly complex)? Theruby-bbb-api
gem offers accessor methods on the timeout to simply alter its value externally. Having the timeout set by the environmental variable felt the best way to answer some of the users requests who made it clear that a fixed timeout isn't suitable for some deployments. The extra logic was added to bound the timeout. The end solution has nothing to do with the timeout injection but after consulting the issue it felt right to enable GL users to define their own timeout without having to build a new image each time and to keep simple and flexible for all users. Do you want me to limit this PR to only solve the duplication issue only and see a better way to enable timeout altering as a separate concern and maybe add it as a feature just likeRoom Limit
(It's starting to feel as the right way to do it)?I was referring to this: https://github.com/mconf/bigbluebutton-api-ruby/blob/master/lib/bigbluebutton_api.rb#L75
Our the time I've worked on Greenlight, I've learned that it's sometimes best that we just set the limit ourselves. I think we're getting close to the point where we have too many this configurable which just causes more confusion for the administrators.
2 options here:
- Patch the default in the gem (we have a good relationship with mconf and can get them to release a new version if needed)
- Just hard code a better timeout number without making it configurable by env
Alright, do I increase the timeout to
Instead of this whole change, why not just patch the gem to increase the timeout from 10 to 20 or 30? This seems like a ton of work to an easily solvable problem
I believe that you are referring to the
ruby-bbb-api
gem timeout altering that was added in this PR. If so, I avoided patching the gem (if that what you meant) because it may brake compatibility for any updates (of the gem) since it deviates its codebase from its actual upstream. I tried to keep the dependencies implementation out of my scope and focus on what could be done through GL codebase. Do you think it's better to hardcode an increase in the gems timeout and do you mean by patching the gem is to simply doing what I've done but with a fixed timeout rather then injecting it via an environmental variable which leads to filtering, checking and bounding its value (that may led you to feel it overly complex)? Theruby-bbb-api
gem offers accessor methods on the timeout to simply alter its value externally. Having the timeout set by the environmental variable felt the best way to answer some of the users requests who made it clear that a fixed timeout isn't suitable for some deployments. The extra logic was added to bound the timeout. The end solution has nothing to do with the timeout injection but after consulting the issue it felt right to enable GL users to define their own timeout without having to build a new image each time and to keep simple and flexible for all users. Do you want me to limit this PR to only solve the duplication issue only and see a better way to enable timeout altering as a separate concern and maybe add it as a feature just likeRoom Limit
(It's starting to feel as the right way to do it)?I was referring to this: https://github.com/mconf/bigbluebutton-api-ruby/blob/master/lib/bigbluebutton_api.rb#L75
Our the time I've worked on Greenlight, I've learned that it's sometimes best that we just set the limit ourselves. I think we're getting close to the point where we have too many this configurable which just causes more confusion for the administrators.
2 options here:
- Patch the default in the gem (we have a good relationship with mconf and can get them to release a new version if needed)
- Just hard code a better timeout number without making it configurable by env
You got reason, I wasn't aware of the long term implications of keeping adding configurable settings to the app. I'll increase it to 30seconds and point this out in the PR description for those who want to further change the timeout themselves.
@farhatahmad requested changes that were made on Jan 15:
- The
bbb-ruby-api
timeout is now statically set to 30sec and NOT INJECTED EXTERNALLY CHECK HERE. - An improvement to UX was added on meeting creation by RETRYING the
start_session
logic onjoin_path
to a maximum of 3 retries which eliminates the need for the user manual subsequent meeting start requests to fix the issue after observing it as an error on client side (caused by the timeout and/or race condition on meeting start [The operation was tested and it is thread safe but some clients will have a 500 error i.e. because of a rolled back transaction] ) and to enhance their overall experience CHECK HERE. - An update was made to the
bbb-ruby-api
gem to its latest version1.8.0
because of noticing that theBigBlueButton::Exception
inherits directly from theException
class and that this was fixed in latest version CHECK HERE and HERE.
BigBlueButton version gem must be replaced otherwise the Docker image won't build.
Yup, I'll reupdate the dependency and notify you.
@GhaziTriki the remote revision for bigbluebutton-ruby-api
has been updated to 5b803bd .
https://github.com/KH-Amir-TN/greenlight/runs/4881314049?check_suite_focus=true#step:9:564
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells
No Coverage information
0.0% Duplication