clean-and-green-philly icon indicating copy to clipboard operation
clean-and-green-philly copied to clipboard

Issue 520

Open zigouras opened this issue 1 year ago • 12 comments

Implementation of issue #520. Backup of database, archiving and diff reporting.

zigouras avatar May 15 '24 00:05 zigouras

Someone is attempting to deploy a commit to the Clean and Green Philly Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar May 15 '24 00:05 vercel[bot]

Moving this conversation about primary keys here.

@nlebovits- Is it correct to say that OPA_ID is unique per parcel, but for multifamily buildings there can be one or many vacant units inside one parcel?

@zigouras- can you document how the tests work and how to run them?

brandonfcohen1 avatar May 15 '24 13:05 brandonfcohen1

@brandonfcohen1 the tests are commented, indicating what each one does. Some of them are POCs, just running code to see the output. Some have assertions. I would not run them as part of CI/CD but they are a good place for people to look to understand the underlying code.

zigouras avatar May 15 '24 13:05 zigouras

What does the vacant_properties_end table hold, is it essentially a union of the data in all of the other tables? If so seems like it could be the most important table to include in the diff so it would be good to work out the pks on that one.

zigouras avatar May 15 '24 13:05 zigouras

What does the vacant_properties_end table hold, is it essentially a union of the data in all of the other tables? If so seems like it could be the most important table to include in the diff so it would be good to work out the pks on that one.

vacant_properties_end is the end result of the script, it's really the only table we need to report a diff on

brandonfcohen1 avatar May 16 '24 17:05 brandonfcohen1

@brandonfcohen1 you are probably looking at an earlier commit in this PR. Look at the latest commit. The backend setup doc describes the new backup and diff process.

zigouras avatar May 16 '24 18:05 zigouras

What does the vacant_properties_end table hold, is it essentially a union of the data in all of the other tables? If so seems like it could be the most important table to include in the diff so it would be good to work out the pks on that one.

vacant_properties_end is the end result of the script, it's really the only table we need to report a diff on

OK then we have to figure out how to enforce a primary key on it.

zigouras avatar May 16 '24 18:05 zigouras

got this running the new script for the first time:

This happens the first time you run it because there are null opa_ids in the vacant_properties table. To rerun, drop the backup_ schema, delete from vacant_properties where opa_id is null, and run again. You should also run delete from property_tax_delinquencies where opa_number = 0 and delete from li_violations where opa_account_num is null for the same reason.

zigouras avatar May 18 '24 19:05 zigouras

can you include instructions on how to run

got this running the new script for the first time:

This happens the first time you run it because there are null opa_ids in the vacant_properties table. To rerun, drop the backup_ schema, delete from vacant_properties where opa_id is null, and run again. You should also run delete from property_tax_delinquencies where opa_number = 0 for the same reason.

got it, can you include in documentation?

brandonfcohen1 avatar May 18 '24 20:05 brandonfcohen1

can you include more details on how to set up and test email/slack locally

Per the backend doc:

The CAGP_SLACK_API_TOKEN environmental variable must be set with the API key for the Slack app that can write messages to the channel as configured in the config.py report_to_slack_channel variable.

The report will also be emailed to any emails configured in the config.py report_to_email variable.

The unit tests test_send_report_to_slack and test_email_report are there. If you need help running python tests with pytest in VSCode there should be a lot of docs on the web.

I am still waiting on @nlebovits to get the API key to write to our clean-and-green-philly-back-end Slack channel. In the meantime you could create your own private Slack channel and create an app API key to test writing to it per these instructions: https://www.datacamp.com/tutorial/how-to-send-slack-messages-with-python That is what I did. You will need a working smtp server for email as configured in config.py. I put localhost and installed postfix on my machine to send mail.

zigouras avatar May 18 '24 20:05 zigouras

can you include instructions on how to run

got this running the new script for the first time:

This happens the first time you run it because there are null opa_ids in the vacant_properties table. To rerun, drop the backup_ schema, delete from vacant_properties where opa_id is null, and run again. You should also run delete from property_tax_delinquencies where opa_number = 0 for the same reason.

got it, can you include in documentation?

I added a section in the backend doc for this.

zigouras avatar May 18 '24 20:05 zigouras

@brandonfcohen1 I changed the way the data-diff is done so it does not create the primary key constraints in the pg db or need any preliminary clean up of data with sql. This keeps your existing script logic and feature layer classes unchanged and is a more extensible, loosely coupled solution. I leveraged the -w flag in the data-diff program to apply a where clause to limit the records being compared.

Please look at my latest commit to this PR. You should be able to run the script.py now without any preparatory execution of sql to clean up the data. Thanks.

zigouras avatar May 21 '24 10:05 zigouras

@nlebovits is this something you can look at if Brandon can't?

Moylena avatar May 28 '24 20:05 Moylena