Issue 520
Implementation of issue #520. Backup of database, archiving and diff reporting.
Someone is attempting to deploy a commit to the Clean and Green Philly Team on Vercel.
A member of the Team first needs to authorize it.
Moving this conversation about primary keys here.
@nlebovits- Is it correct to say that OPA_ID is unique per parcel, but for multifamily buildings there can be one or many vacant units inside one parcel?
@zigouras- can you document how the tests work and how to run them?
@brandonfcohen1 the tests are commented, indicating what each one does. Some of them are POCs, just running code to see the output. Some have assertions. I would not run them as part of CI/CD but they are a good place for people to look to understand the underlying code.
What does the vacant_properties_end table hold, is it essentially a union of the data in all of the other tables? If so seems like it could be the most important table to include in the diff so it would be good to work out the pks on that one.
What does the
vacant_properties_endtable hold, is it essentially a union of the data in all of the other tables? If so seems like it could be the most important table to include in the diff so it would be good to work out the pks on that one.
vacant_properties_end is the end result of the script, it's really the only table we need to report a diff on
@brandonfcohen1 you are probably looking at an earlier commit in this PR. Look at the latest commit. The backend setup doc describes the new backup and diff process.
What does the
vacant_properties_endtable hold, is it essentially a union of the data in all of the other tables? If so seems like it could be the most important table to include in the diff so it would be good to work out the pks on that one.
vacant_properties_endis the end result of the script, it's really the only table we need to report a diff on
OK then we have to figure out how to enforce a primary key on it.
got this running the new script for the first time:
This happens the first time you run it because there are null opa_ids in the vacant_properties table. To rerun, drop the backup_ schema, delete from vacant_properties where opa_id is null, and run again.
You should also run delete from property_tax_delinquencies where opa_number = 0 and delete from li_violations where opa_account_num is null for the same reason.
can you include instructions on how to run
got this running the new script for the first time:
This happens the first time you run it because there are null opa_ids in the vacant_properties table. To rerun, drop the backup_ schema,
delete from vacant_properties where opa_id is null, and run again. You should also rundelete from property_tax_delinquencies where opa_number = 0for the same reason.
got it, can you include in documentation?
can you include more details on how to set up and test email/slack locally
Per the backend doc:
The CAGP_SLACK_API_TOKEN environmental variable must be set with the API key for the Slack app that can write messages to the channel as configured in the config.py report_to_slack_channel variable.
The report will also be emailed to any emails configured in the config.py report_to_email variable.
The unit tests test_send_report_to_slack and test_email_report are there. If you need help running python tests with pytest in VSCode there should be a lot of docs on the web.
I am still waiting on @nlebovits to get the API key to write to our clean-and-green-philly-back-end Slack channel. In the meantime you could create your own private Slack channel and create an app API key to test writing to it per these instructions: https://www.datacamp.com/tutorial/how-to-send-slack-messages-with-python That is what I did. You will need a working smtp server for email as configured in config.py. I put localhost and installed postfix on my machine to send mail.
can you include instructions on how to run
got this running the new script for the first time:
This happens the first time you run it because there are null opa_ids in the vacant_properties table. To rerun, drop the backup_ schema,
delete from vacant_properties where opa_id is null, and run again. You should also rundelete from property_tax_delinquencies where opa_number = 0for the same reason.got it, can you include in documentation?
I added a section in the backend doc for this.
@brandonfcohen1 I changed the way the data-diff is done so it does not create the primary key constraints in the pg db or need any preliminary clean up of data with sql. This keeps your existing script logic and feature layer classes unchanged and is a more extensible, loosely coupled solution. I leveraged the -w flag in the data-diff program to apply a where clause to limit the records being compared.
Please look at my latest commit to this PR. You should be able to run the script.py now without any preparatory execution of sql to clean up the data. Thanks.
@nlebovits is this something you can look at if Brandon can't?