zulip icon indicating copy to clipboard operation
zulip copied to clipboard

billing: Add UserProfile.role, and track counts in RealmAuditLog.

Open rishig opened this issue 5 years ago • 18 comments

Tim, would love your thoughts on this. This will be hard to change later, since it's going to be a part of what gets synced in the on-prem billing process.

rishig avatar Sep 25 '19 21:09 rishig

Also added a commit with some proposed numbering for RealmAuditLog.event_type. Not totally sure how to think about numbering these either.

rishig avatar Sep 26 '19 01:09 rishig

thoughts on this?

rishig avatar Oct 02 '19 05:10 rishig

When you look at this, the things that would be most helpful:

  • How should the RealmAuditLog migration be done? Should I add a new column, copy stuff over, and then rename the columns? Or do something more like what is in the commit?
  • Finalizing the two sets of constants (UserProfile.role and RealmAuditLog.event_type), since those will be difficult to change later (especially UserProfile.role)
  • How to handle multiple formats coming in through remotes/server/analytics. One option might be to add a new endpoint, another might be to allow that endpoint to accept both the old format (2 tables) and new format (3 tables).

I also just realized that is_realm_admin was probably being used for bots as well? So potentially the right structure is to have normal bots have is_bot=True, role=MEMBER, rather than having a role=BOT value. (That's what the current draft implements.)

rishig avatar Oct 03 '19 05:10 rishig

How to handle multiple formats coming in through remotes/server/analytics. One option might be to add a new endpoint, another might be to allow that endpoint to accept both the old format (2 tables) and new format (3 tables).

I think definitely supporting both is the right model. It's easy to add a check for whether the 3rd table is present and act accordingly.

How should the RealmAuditLog migration be done? Should I add a new column, copy stuff over, and then rename the columns? Or do something more like what is in the commit?

I switched this to the copy approach, since that actually works with the reverse migration, and ends up being quite clean.

Finalizing the two sets of constants (UserProfile.role and RealmAuditLog.event_type), since those will be difficult to change later (especially UserProfile.role)

I'm happy with RealmAuditLog.event_type, so I'll go ahead and merge that piece.

timabbott avatar Oct 06 '19 22:10 timabbott

Merged the first commit and pushed back here.

timabbott avatar Oct 06 '19 23:10 timabbott

I fixed the small bug noted above and expect to merge the UserProfile.role commit shortly as well. I added some more spacing and included (in comments) values for some additional potential future rules. Also renamed the constants to UserProfile.ROLE_MEMBER to match the namespacing style of our other similar constants.

timabbott avatar Oct 06 '19 23:10 timabbott

OK, did some work on the next commit as well to add tests before pausing and pushing back here. It needs some debugging of test-backend TestRealmAuditLog; it appears somewhere the keys in your extra_data dictionary are turning from integers into strings, which is potentially quite problematic for naturally written code here. My guess is the problem is that dictionary keys are always strings in JSON; this might argue for us changing all the keys to e.g. '10', not 10, to avoid the potential for this sort of mismatch.

timabbott avatar Oct 06 '19 23:10 timabbott

OK, merged that refactor and releasing lock on this PR. The last couple commits may need some updates for UserProfile.MEMBER -> UserProfile.ROLE_MEMBER type changes; and more importantly we need to resolve the stringification issue I described above.

timabbott avatar Oct 06 '19 23:10 timabbott

Thanks! It does look like json keys are always strings (and ujson.dumps automatically does the conversion from int->str). I agree RealmAuditLog.ROLE_COUNT and similar can just be '10' if needed.

How about things like UserProfile.role, though? Should we do a migration of UserProfile.role from int to str? An alternative would be to keep all keys as int in the codebase and have a function like get_role_counts_from_extra_data that does the str->int conversion.

rishig avatar Oct 07 '19 05:10 rishig

I think we definitely want UserProfile.role to be an integer, since we're primarily storing those in a database column and it's rationally an integer; JSON uses of it are secondary to the main API usage (and generally they'll appear as values, not keys).

Hmm. I think keeping things as integers and using a get_role_counts(realm_audit_log_entry) function is the best solution; we'll likely want to access these through a method like that anyway for code cleanliness to avoid duplicating the the JSON parsing in N places (etc.).

timabbott avatar Oct 07 '19 19:10 timabbott

First three commits are ready for review, and include the main RealmAuditLog sync.

rishig avatar Oct 08 '19 23:10 rishig

About to merge the first 3 commits, after introducing the AbstractRealmAuditLog base class to remove the duplication in the RemoteRealmAuditLog design.

timabbott avatar Oct 09 '19 00:10 timabbott

OK, merged the first 3 commits. Let me know when the next few are ready for review (or if you want me to deploy anything for testing on staging/chat.zulip.org)

timabbott avatar Oct 09 '19 00:10 timabbott

cool! Yeah, I think the current set of commits might be good to test/harden on czo before 2.1. The next set of commits will be zulipchat.com specific, so don't need to wait on that.

I think the last edit relevant for on-prem deployments is updates to register-server.py:

  • It should run send_analytics_to_remote_server (I guess there's a product question here of whether turning on push notifications should now imply settings.SUMBIT_USAGE_STATISTICS)
  • It should generate a zulipchat.com billing link (and running register_server a second time should maybe display that billing link?)

rishig avatar Oct 09 '19 07:10 rishig

Merged the first commit and pushed back here to deal with the merge conflict.

timabbott avatar Oct 21 '19 23:10 timabbott

This is ready for review.

Two comments about the first commit:

  • The migration technically only needs to run on zulipchat.com realms, so possibly doesn't need to be checked in to the codebase.
  • I don't yet understand why I needed to edit the test in test_import_export, and want to check that it is not hiding a bug.

I think the work remaining before on-prem billing is usable is the following (can be copied into a new issue -- can do that once this is merged):

  • Verify that on-prem organizations sync their tables once an hour, not once a day (I think this is already the case: puppet/zulip/files/cron.d/update-analytics-counts)

  • Add an hourly cron job 10 min past the hour that calls update_license_ledger_if_needed on every RemoteRealmAuditLog entry that hasn't been processed.

(One method could be to add a table like FillState that keeps track of how far we've gotten. In this case the table would have only 1 row). (Also should make sure the right thing happens if an entry in RemoteRealmAuditLog is processed twice, or make sure that that won't happen. Potentially can be accomplished by putting the LicenseLedger row creation in a transaction.atomic with the FillState update. Another option might be to not use a FillState table, but instead add a LicenseLedger.log_entry field that points back to the relevant AbstractRealmAuditLog row.) (Also after this should add an end-to-end test that: calls send_analytics_to_remote_server, upgrades, adds some users, calls send_analytics_to_remote_server again, runs the cron job, checks that the appropriate LicenseLedger entry was created, runs invoice_plans_as_needed, checks that the invoice was created in stripe.)

  • Correctly handle mutiple realms per server. Currently we do something broken -- if you have 2 realms, we'll interpret updates from both realms as if they were updates pertaining to a single realm. The short term fix is to add a function like get_total_seat_count_for_server(server, event_time) that does something like the following:
  • RemoteRealmAuditLog.objects.filter(server=server).values('realm_id').distinct() to get the realms
  • for each realm look at entries with REALM_DEACTIVATED and/or REALM_REACTIVATED to determine if it was active at event_time
  • add up the get_seat_count values of each active realm (RemoteRealmAuditLog.objects.filter(server=server, realm_id=realm_id, event_time__lte=event_time, event_type__in=SYNCED_BILLING_EVENTS).last() for the relevant log_entry for each realm)

get_total_seat_count_for_server will need to be called from update_license_ledger_for_automanaged_plan and get_latest_seat_count (and possibly others; haven't done an audit).

Longer term we'll want to do something that looks more like an enterprise grid, but that is a decently sized project of its own.

  • ./manage.py register_server should call do_record_role_counts on each realm and then /manage.py update_analytics_counts, since the billing link doesn't work if there is no RemoteRealmAuditLog data.

  • If you import a realm, we should potentially call do_record_role_counts at the end? Not sure if the import code always fully constructs the RealmAuditLog data?

  • Better error message if you go to the billing link to upgrade and there is no RemoteRealmAuditLog data (i.e. change the assertion error in BillingOrg.get_latest_seat_count to a BillingError, wrap the seat_count = billing_user.get_latest_seat_count() in corporate.views.initial_upgrade in a try/except block, and add a test for this case.)

  • Add a signup flow for the initial upgrade, both for new users and existing users. One idea is that ./manage.py register_server just handles this. E.g. it prints out the billing link and says "if you ever need to get it back, just run register_server again". Then we email all old users saying they have to run register_server again. Another idea is that ./manage.py register_server spits out the billing link, but then we also add another command ./manage.py billing that does so as well.

  • I think we decided the actual pricing would be something like "<= 3 users free, > 3 users is $X/user/mo". That needs to be implemented.

  • Currently the on-prem upgrade just upgrades you to CustomerPlan.STANDARD. We should do something like the following in corporate/models.py (I think ENTERPRISE is currently not in use, but we should do a quick check on prod)

-    ENTERPRISE = 10
+    PUSH_ONLY = 11
+    ENTERPRISE = 12  # not available through self-serve signup

and then pipe everything through so that on-prem upgrades use PUSH_ONLY instead of STANDARD.

  • Manual testing of the on-prem billing flow.

  • Maybe make it harder for on-prem users to accidentally change their uuid?

By default I'm thinking it would be good for someone to take this over -- I think any of the pieces remaining require deep changes to the billing system. I can do the first pass review on stuff, and help answer any design questions that come up.

Follow-ups:

  • Implement the "free plan" state for on-prem, where push notifications don't work.
  • Email existing on-prem users to give them a heads up of billing changes to come.
  • Add ability for server admins to rotate their billing key
  • Should bot_dict_fields in zerver/lib/cache.py have role as one of the fields?
  • Update the frontend to send and accept a role field, instead of is_guest and is_admin
  • In actions.py: separate out the api_super_user case from do_change_is_admin, and then merge do_change_is_admin and do_change_is_guest into do_change_role.
  • Realm deactivation and reactivation should do the right thing with respect to billing (for both on-prem and zulipchat.com users)
  • Nice to have would be to add a LicenseLedger.log_entry field that points back to whichever RealmAuditLog / RemoteRealmAuditLog entry was responsible for the LicenseLedger row (since now I think we have the invariant that every LicenseLedger row has a corresponding AbstractRealmAuditLog row).

Note that I decided to keep constants as RealmAuditLog.XXX instead of AbstractRealmAuditLog.XXX, since the latter is getting to be long? But one could argue that should be changed.

rishig avatar Oct 29 '19 18:10 rishig

The errors are just migration numbering conflicts. Let me know if it would be helpful to fix them.

rishig avatar Nov 09 '19 04:11 rishig

Heads up @rishig, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

zulipbot avatar Nov 18 '19 20:11 zulipbot