zulip
zulip copied to clipboard
billing: Add UserProfile.role, and track counts in RealmAuditLog.
Tim, would love your thoughts on this. This will be hard to change later, since it's going to be a part of what gets synced in the on-prem billing process.
Also added a commit with some proposed numbering for RealmAuditLog.event_type. Not totally sure how to think about numbering these either.
thoughts on this?
When you look at this, the things that would be most helpful:
- How should the RealmAuditLog migration be done? Should I add a new column, copy stuff over, and then rename the columns? Or do something more like what is in the commit?
- Finalizing the two sets of constants (UserProfile.role and RealmAuditLog.event_type), since those will be difficult to change later (especially UserProfile.role)
- How to handle multiple formats coming in through
remotes/server/analytics
. One option might be to add a new endpoint, another might be to allow that endpoint to accept both the old format (2 tables) and new format (3 tables).
I also just realized that is_realm_admin
was probably being used for bots as well? So potentially the right structure is to have normal bots have is_bot=True, role=MEMBER
, rather than having a role=BOT
value. (That's what the current draft implements.)
How to handle multiple formats coming in through remotes/server/analytics. One option might be to add a new endpoint, another might be to allow that endpoint to accept both the old format (2 tables) and new format (3 tables).
I think definitely supporting both is the right model. It's easy to add a check for whether the 3rd table is present and act accordingly.
How should the RealmAuditLog migration be done? Should I add a new column, copy stuff over, and then rename the columns? Or do something more like what is in the commit?
I switched this to the copy approach, since that actually works with the reverse migration, and ends up being quite clean.
Finalizing the two sets of constants (UserProfile.role and RealmAuditLog.event_type), since those will be difficult to change later (especially UserProfile.role)
I'm happy with RealmAuditLog.event_type, so I'll go ahead and merge that piece.
Merged the first commit and pushed back here.
I fixed the small bug noted above and expect to merge the UserProfile.role
commit shortly as well. I added some more spacing and included (in comments) values for some additional potential future rules. Also renamed the constants to UserProfile.ROLE_MEMBER
to match the namespacing style of our other similar constants.
OK, did some work on the next commit as well to add tests before pausing and pushing back here. It needs some debugging of test-backend TestRealmAuditLog
; it appears somewhere the keys in your extra_data
dictionary are turning from integers into strings, which is potentially quite problematic for naturally written code here. My guess is the problem is that dictionary keys are always strings in JSON; this might argue for us changing all the keys to e.g. '10'
, not 10
, to avoid the potential for this sort of mismatch.
OK, merged that refactor and releasing lock on this PR. The last couple commits may need some updates for UserProfile.MEMBER
-> UserProfile.ROLE_MEMBER
type changes; and more importantly we need to resolve the stringification issue I described above.
Thanks! It does look like json keys are always strings (and ujson.dumps automatically does the conversion from int->str). I agree RealmAuditLog.ROLE_COUNT and similar can just be '10' if needed.
How about things like UserProfile.role, though? Should we do a migration of UserProfile.role from int to str? An alternative would be to keep all keys as int in the codebase and have a function like get_role_counts_from_extra_data
that does the str->int conversion.
I think we definitely want UserProfile.role
to be an integer, since we're primarily storing those in a database column and it's rationally an integer; JSON uses of it are secondary to the main API usage (and generally they'll appear as values, not keys).
Hmm. I think keeping things as integers and using a get_role_counts(realm_audit_log_entry)
function is the best solution; we'll likely want to access these through a method like that anyway for code cleanliness to avoid duplicating the the JSON parsing in N places (etc.).
First three commits are ready for review, and include the main RealmAuditLog sync.
About to merge the first 3 commits, after introducing the AbstractRealmAuditLog
base class to remove the duplication in the RemoteRealmAuditLog design.
OK, merged the first 3 commits. Let me know when the next few are ready for review (or if you want me to deploy anything for testing on staging/chat.zulip.org)
cool! Yeah, I think the current set of commits might be good to test/harden on czo before 2.1. The next set of commits will be zulipchat.com specific, so don't need to wait on that.
I think the last edit relevant for on-prem deployments is updates to register-server.py
:
- It should run
send_analytics_to_remote_server
(I guess there's a product question here of whether turning on push notifications should now implysettings.SUMBIT_USAGE_STATISTICS
) - It should generate a zulipchat.com billing link (and running register_server a second time should maybe display that billing link?)
Merged the first commit and pushed back here to deal with the merge conflict.
This is ready for review.
Two comments about the first commit:
- The migration technically only needs to run on zulipchat.com realms, so possibly doesn't need to be checked in to the codebase.
- I don't yet understand why I needed to edit the test in test_import_export, and want to check that it is not hiding a bug.
I think the work remaining before on-prem billing is usable is the following (can be copied into a new issue -- can do that once this is merged):
-
Verify that on-prem organizations sync their tables once an hour, not once a day (I think this is already the case: puppet/zulip/files/cron.d/update-analytics-counts)
-
Add an hourly cron job 10 min past the hour that calls update_license_ledger_if_needed on every RemoteRealmAuditLog entry that hasn't been processed.
(One method could be to add a table like FillState that keeps track of how far we've gotten. In this case the table would have only 1 row). (Also should make sure the right thing happens if an entry in RemoteRealmAuditLog is processed twice, or make sure that that won't happen. Potentially can be accomplished by putting the LicenseLedger row creation in a transaction.atomic with the FillState update. Another option might be to not use a FillState table, but instead add a LicenseLedger.log_entry field that points back to the relevant AbstractRealmAuditLog row.) (Also after this should add an end-to-end test that: calls send_analytics_to_remote_server, upgrades, adds some users, calls send_analytics_to_remote_server again, runs the cron job, checks that the appropriate LicenseLedger entry was created, runs invoice_plans_as_needed, checks that the invoice was created in stripe.)
- Correctly handle mutiple realms per server. Currently we do something broken -- if you have 2 realms, we'll interpret updates from both realms as if they were updates pertaining to a single realm.
The short term fix is to add a function like
get_total_seat_count_for_server(server, event_time)
that does something like the following:
- RemoteRealmAuditLog.objects.filter(server=server).values('realm_id').distinct() to get the realms
- for each realm look at entries with REALM_DEACTIVATED and/or REALM_REACTIVATED to determine if it was active at event_time
- add up the get_seat_count values of each active realm (RemoteRealmAuditLog.objects.filter(server=server, realm_id=realm_id, event_time__lte=event_time, event_type__in=SYNCED_BILLING_EVENTS).last() for the relevant log_entry for each realm)
get_total_seat_count_for_server will need to be called from update_license_ledger_for_automanaged_plan and get_latest_seat_count (and possibly others; haven't done an audit).
Longer term we'll want to do something that looks more like an enterprise grid, but that is a decently sized project of its own.
-
./manage.py register_server should call
do_record_role_counts
on each realm and then/manage.py update_analytics_counts
, since the billing link doesn't work if there is no RemoteRealmAuditLog data. -
If you import a realm, we should potentially call do_record_role_counts at the end? Not sure if the import code always fully constructs the RealmAuditLog data?
-
Better error message if you go to the billing link to upgrade and there is no RemoteRealmAuditLog data (i.e. change the assertion error in BillingOrg.get_latest_seat_count to a BillingError, wrap the
seat_count = billing_user.get_latest_seat_count()
in corporate.views.initial_upgrade in a try/except block, and add a test for this case.) -
Add a signup flow for the initial upgrade, both for new users and existing users. One idea is that
./manage.py register_server
just handles this. E.g. it prints out the billing link and says "if you ever need to get it back, just run register_server again". Then we email all old users saying they have to run register_server again. Another idea is that./manage.py register_server
spits out the billing link, but then we also add another command./manage.py billing
that does so as well. -
I think we decided the actual pricing would be something like "<= 3 users free, > 3 users is $X/user/mo". That needs to be implemented.
-
Currently the on-prem upgrade just upgrades you to CustomerPlan.STANDARD. We should do something like the following in corporate/models.py (I think ENTERPRISE is currently not in use, but we should do a quick check on prod)
- ENTERPRISE = 10
+ PUSH_ONLY = 11
+ ENTERPRISE = 12 # not available through self-serve signup
and then pipe everything through so that on-prem upgrades use PUSH_ONLY instead of STANDARD.
-
Manual testing of the on-prem billing flow.
-
Maybe make it harder for on-prem users to accidentally change their uuid?
By default I'm thinking it would be good for someone to take this over -- I think any of the pieces remaining require deep changes to the billing system. I can do the first pass review on stuff, and help answer any design questions that come up.
Follow-ups:
- Implement the "free plan" state for on-prem, where push notifications don't work.
- Email existing on-prem users to give them a heads up of billing changes to come.
- Add ability for server admins to rotate their billing key
- Should bot_dict_fields in zerver/lib/cache.py have
role
as one of the fields? - Update the frontend to send and accept a role field, instead of is_guest and is_admin
- In actions.py: separate out the api_super_user case from do_change_is_admin, and then merge do_change_is_admin and do_change_is_guest into do_change_role.
- Realm deactivation and reactivation should do the right thing with respect to billing (for both on-prem and zulipchat.com users)
- Nice to have would be to add a LicenseLedger.log_entry field that points back to whichever RealmAuditLog / RemoteRealmAuditLog entry was responsible for the LicenseLedger row (since now I think we have the invariant that every LicenseLedger row has a corresponding AbstractRealmAuditLog row).
Note that I decided to keep constants as RealmAuditLog.XXX instead of AbstractRealmAuditLog.XXX, since the latter is getting to be long? But one could argue that should be changed.
The errors are just migration numbering conflicts. Let me know if it would be helpful to fix them.
Heads up @rishig, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master
branch and resolve your pull request's merge conflicts accordingly.