Upgrade replication customers to postgres 16 (Deadline Feb. 29)
Seven of our replication customers are on postgres 11, which AWS plans to deprecate. I have communicated with all of them and they are all on board with a switch to postgres 16 at the proxy level.
Here's the email I sent:
Hello, I'm writing because your organization subscribes to Free Law Project's replicated database service, and the database version that your organization is using will soon no longer be supported at our host, AWS.
We need to make a decision about what to do:
- Recommended: Swap in a new replica.
Unfortunately, it is not possible to upgrade a Postgresql replication subscriber in place. Therefore, the best thing Free Law Project can do is create a fresh replica for you using the latest version of Postgresql and to then swap this new database with your current one.
This will be seamless for you and you will immediately be upgraded.
- Possible to buy time: Do nothing, keep your old DB, and pay more forever.
If you have made customizations to your replica, option 2 is to use and pay for AWS's long term support service at an additional monthly cost of approximately $300 for years 1 and 2, and $600 for year three. This will allow you to avoid upgrading for up to three years and means there is no need to take any action until then.
Please let me know what would work best for your organization.
I highly recommend most organizations choose option 1. The latest version of Postgresql includes numerous performance and feature upgrades. Furthermore, frankly, if we don't have anybody choosing option 2, our maintenance will be simpler and more reliable for all. If you are considering option 2, let's discuss — there is likely time to achieve option 1 successfully.
Timeline: Your version of Postgresql will be deprecated by AWS on February 29th. If we do not have a decision by that time, you will begin paying the long term support fees.
I hope to have a plan of action for this from all customers within one week from today, and aim to begin upgrading Option 1 customers shortly after that.
Thank you, as always, for being our supporters, customers, and advocates. We're fortunate to be able to work with you all. I look forward to any questions or concerns you may have.
The simple way to do this will be to:
- Create seven new replicas in postgresql 16
- Swap to the new replicas at the proxy level
- Terminate the old replicas.
But, there's a clever way too that fixes https://github.com/freelawproject/courtlistener/pull/3546 and then https://github.com/freelawproject/courtlistener/pull/3558 along the way:
- Create the new RDS instances.
- Set up their schemas
- Do the two PRs across all replicas
- Subscribe the new replicas
- Once synced, swap to the new replicas at the proxy level (a sync check is sketched below this list)
- Wait a week, say
- Terminate the old replicas
The advantage of this approach is that step 3 won't have to delete a huge index nor create a new huge index. Instead, it will simply do that work on an empty database. Bonus.
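For the "once synced" step above, a subscriber-side check might look like this; a rough sketch, assuming the built-in logical replication catalogs, where every published table should reach the 'r' (ready) state:
-- Run on the new replica. Each table in the subscription gets a row here; 'r' means its initial copy is done.
SELECT srsubstate, count(*)
FROM pg_subscription_rel
GROUP BY srsubstate;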
Current plan:
- Upgrade our two v16 customers by migrating their systems.
- Migrate prod and cl-replica
- Make new instances in v16 for our seven v11 customers.
- Swap those in.
Prod, cl-replica, and our two pg16 servers have the index fixes.
Next step is to make the new instances for our clients.
Hey @mlissner
Roughly these are the steps I was able to come up with. Let me know your thoughts.
- Launch the new instance for ferret. Will try the Pulumi workflow to create it; it probably won't work on the first run. The new instance will have the identifier ferret-16 while syncing. The password should be automatically set and stored in Secrets Manager.
- Create the schema: I am assuming we are taking the schema from ferret directly. If not, we could take the tables from pg_publication_tables, maybe like this:
TABLES=$( psql --host $DB_REPLICA_HOST -p 5432 -d courtlistener -U django --tuples-only -c \
"SELECT CONCAT(schemaname, '.', tablename) FROM pg_publication_tables WHERE pubname='opendata_ferret';" \
| xargs -I{} echo -n " -t {}"
)
Then pull the schema using:
pg_dump --host $X_HOST \
--username django \
--create \
--schema-only \
--no-privileges \
--no-publications \
--no-subscriptions courtlistener \
$TABLES \
--password | grep -v -i 'trigger' > ferret_schema.2024-04-05.sql
Finally import it into the new ferret replica:
psql --host ferret-16.xxxxxx.us-west-2.rds.amazonaws.com \
--port=5432 --user django --dbname postgres < ferret_schema.2024-04-05.sql
- Create the publication on the replica. In this case, based on the publications I found on the replica, the new publication could be called ferret_publication since the current one is opendata_ferret. Will create it using the following, listing the same tables as the current publication:
CREATE PUBLICATION ferret_publication FOR TABLE ONLY ...
- Role: would it be possible to re-use the current role that ferret uses? If not, I am not sure what a proper new name would be. Also not sure where those passwords are stored.
- Create the subscription on ferret-16:
CREATE SUBSCRIPTION ferret_subscription CONNECTION 'host=cl-replica.xxx.amazonaws.com port=5432 user=ferret password=XXX dbname=courtlistener' PUBLICATION ferret_publication;
This will get the replication started.
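While it catches up, a publisher-side check could show the new slot and how far behind it is; a rough sketch, assuming a standard logical replication setup where the slot name defaults to the subscription name:
-- Run on the publisher (cl-replica). Shows each slot and how much WAL it still has to consume.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots
WHERE slot_name LIKE 'ferret%';  -- hypothetical filter; by default the slot is named after the subscription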
When it's synced up and we are ready to do the switch, we can configure haproxy to use both backends (ferret.xxx and ferret-16.xxx) to minimize new connection failures during the DNS update, update the database identifiers, and then remove the ferret-16 backend.
Or we can just go ahead and do the switch by simply modifying the identifiers, open connections should remain connected.
I will be documenting this process here: Upgrading/Migrating a replica
Yeah, that sounds basically right. I assume this is all based on our usual replication setup?
A few bits of feedback:
> I am assuming we are taking the schema from ferret directly.
I'd actually use an export of the schema from prod, as described on the wiki. It's the ground truth.
> Would it be possible to re-use the current role that ferret uses?
Yes. I have these passwords in my password manager. I imagine there's a better way.
> we can just go ahead and do the switch by simply modifying the identifiers, open connections should remain connected.
This seems easier and fine to me. I'd just suggest we do it during a maintenance window.
Sounds good!
I'll go ahead and start the new instance.
Will basically use the steps on the Setting up Replication for a new Customer page.
I will need the ferret role's password; we can use Secrets Manager. We can create a secret called cl_replicas and store the passwords there.
Each secret costs $0.40/month.
Alternatively, we could look for a password/secrets sharing tool like Dashlane, LastPass, or similar. Not a must, and we will be using Secrets Manager either way.
I checked the permissions of all users and none have any secretsmanager: permissions set.
Great. Let me know when you need that password and I'll send it along.
Just created the instance and imported the schema from the prod db.
I am about to create the publication ferret_publication, and just want to confirm that these are the correct tables to add to the publication:
CREATE PUBLICATION ferret_publication FOR TABLE ONLY
--Audio
audio_audio, audio_audio_panel,
--Disclosures
disclosures_agreement, disclosures_debt, disclosures_financialdisclosure, disclosures_gift, disclosures_investment, disclosures_noninvestmentincome, disclosures_position, disclosures_reimbursement, disclosures_spouseincome,
--People
people_db_abarating, people_db_attorney, people_db_attorneyorganization, people_db_attorneyorganizationassociation, people_db_criminalcomplaint, people_db_criminalcount, people_db_education, people_db_party, people_db_partytype, people_db_person, people_db_person_race, people_db_politicalaffiliation, people_db_position, people_db_race, people_db_role, people_db_school, people_db_source,
--FJC
recap_fjcintegrateddatabase,
--Search
search_bankruptcyinformation, search_citation, search_claim, search_claim_tags, search_claimhistory, search_court, search_court_appeals_to, search_courthouse, search_docket, search_docket_panel, search_docket_tags, search_docketentry, search_docketentry_tags, search_opinion, search_opinion_joined_by, search_opinioncluster, search_opinioncluster_non_participating_judges, search_opinioncluster_panel, search_opinionscited, search_opinionscitedbyrecapdocument, search_originatingcourtinformation, search_parenthetical, search_parentheticalgroup, search_recapdocument, search_recapdocument_tags, search_tag
;
And right after that I'll create the subscription on the new instance and I will need the ferret role's password.
ty!
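One sanity check that could be run once the publication is created, to confirm it mirrors the old one; a rough sketch, run on the publisher (cl-replica) where both publications live:
-- Tables in the old publication that are missing from the new one; ideally this returns zero rows.
SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'opendata_ferret'
EXCEPT
SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'ferret_publication';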
That looks about right. Here's what I have from my notes (the wiki can be a bit outdated):
CREATE PUBLICATION cinnamon_publication FOR TABLE ONLY
--Audio
audio_audio, audio_audio_panel,
--Disclosures
disclosures_agreement, disclosures_debt, disclosures_financialdisclosure, disclosures_gift, disclosures_investment, disclosures_noninvestmentincome, disclosures_position, disclosures_reimbursement, disclosures_spouseincome,
--People
people_db_abarating, people_db_attorney, people_db_attorneyorganization, people_db_attorneyorganizationassociation, people_db_criminalcomplaint, people_db_criminalcount, people_db_education, people_db_party, people_db_partytype, people_db_person, people_db_person_race, people_db_politicalaffiliation, people_db_position, people_db_race, people_db_role, people_db_school, people_db_source,
--FJC
recap_fjcintegrateddatabase,
--Search
search_bankruptcyinformation, search_citation, search_claim, search_claim_tags, search_claimhistory, search_court, search_court_appeals_to, search_courthouse, search_docket, search_docket_panel, search_docket_tags, search_docketentry, search_docketentry_tags, search_opinion, search_opinion_joined_by, search_opinioncluster, search_opinioncluster_non_participating_judges, search_opinioncluster_panel, search_opinionscited, search_opinionscitedbyrecapdocument, search_originatingcourtinformation, search_parenthetical, search_parentheticalgroup, search_recapdocument, search_recapdocument_tags, search_tag
;
Did you see that we also need to make alarms in CloudWatch for new replicas?
Just created the alarm and added it to the composite alarm.
What's pending for now is setting up the publication and the subscription using the ferret role I think.
Password sent via Whatsapp.
Just kicked the replication off; had to add the public IP of the new instance to the rds-launch-wizard security group.
I haven't seen any errors in ferret-16's logs so far, and the new slots are already showing up in the replication lag metrics.
Do you use any tool to check the integrity of the replicas? With databases this big, probably not, but I thought I'd ask.
I usually check the logs on both sides, monitor the replication lag, and check the network throughput. If you see networking going way up, you know the data is flowing. If there are no errors in the logs, that's good. And if the replication lag metric shows a few new slots, that makes sense!
Got it! Will keep an eye on the logs and metrics. So far so good.
Saw some errors in the log from the new replica:
2024-04-07 18:37:53 UTC::@:[8830]:ERROR: could not start initial contents copy for table "public.search_opinionscitedbyrecapdocument": ERROR: permission denied for table search_opinionscitedbyrecapdocument
Granted SELECT access on those tables to the ferret role:
public.search_court_appeals_to
public.search_opinionscitedbyrecapdocument
public.search_parenthetical
public.search_parentheticalgroup
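The grants were along these lines; a sketch, since the exact statement isn't captured in the thread:
-- Run on the publisher (cl-replica); gives the replication role read access for the initial table copy.
GRANT SELECT ON
    public.search_court_appeals_to,
    public.search_opinionscitedbyrecapdocument,
    public.search_parenthetical,
    public.search_parentheticalgroup
TO ferret;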
Perfect.
Hey @mlissner. I am starting the new replica for wsj2, and will call it wsj2-16.
Not sure if you are OK with using this suffix for the new replicas. Again, I will use the same template as tumeric.
Will use these values:
- Publication name: wsj_publication
- Role: wsj (will need to ask you for the password, and will make sure the role has access to all the tables in the publication)
Let me know if this makes sense to you. Thank you!
That all sounds right to me. I'd go with wsj3 for the DB name though. We know the postgresql version. What we don't know is which iteration of the DB we're working with when we glance at it.
Thank you @mlissner . Just started the replication for wsj3. I will keep an eye on the replication and let you know. I created the alarm and added it to the composite.
Hello @mlissner, just wanted to let you know that wsj3 seems to be all synced up.
I also wanted to ask what you would like to do with the old ferret replica? We made the swap on Monday.
Thank you!
Great about wsj3. You can swap that at the proxy and once you've done that I'll let them know it's done and ask if we can tear down wsj2.
For ferret, I just emailed the client for confirmation.
Just made the swap. I made sure I was able to log in to the new replica using the same django credentials, applied the changes, and reloaded HAProxy.
Now wasabi.courtlistener.com connects to wsj3.
Will keep an eye on the sessions and monitoring.
Just an update:
- Created keystone2 RDS instance.
- Created storage alarm and added it to the composite one.
- Replication is in place, no errors so far.
- Deleted ferret replica.
Hello @mlissner.
keylime.courtlistener.com is now pointing to the keystone2 replica. I was able to connect from the outside using the same django credentials. Will keep an eye on logs and monitoring.
Will now start working on wsj2 to shut it down, and will set up the replication for onelegal.
Cool. Customer contacted for confirmation of deletion.
Update from today:
- wsj2 was deleted, along with its publication and slots, and its entries in the composite alarm and the storage alarm (cleanup sketched below this list). Just to be sure, I double-checked access to wasabi.courtlistener.com using the same credentials and also made sure that replication was working properly.
- We are still waiting for confirmation to delete keystone.
- onelegal2 is still catching up.
- The pending instance is compass, which I expect to start working on today (Wednesday).
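For reference, the publisher-side cleanup for a deleted replica is roughly this shape; a sketch with hypothetical names, since the actual publication and slot names for wsj2 aren't recorded in this thread:
-- Run on the publisher (cl-replica). The subscriber instance is gone, so its slot must be dropped manually.
SELECT slot_name, active FROM pg_replication_slots;       -- find the orphaned, now-inactive slot
SELECT pg_drop_replication_slot('old_wsj_subscription');  -- hypothetical slot name
DROP PUBLICATION old_wsj_publication;                      -- hypothetical publication name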
I'll give keystone another day or two and then bug the client again.
Hey @mlissner, onelegal is all synced up now. We can do the swap whenever you think it is appropriate.
Just created the PR for the new compass instance. I can merge whenever we feel it's OK to fire up that instance.
Thank you!!
You can go ahead with the swap for onelegal.