O+M 2024-04-22
As part of day-to-day operations, there are many Operation and Maintenance (O&M) responsibilities. Instead of having the entire team watch notifications and risk some slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. This is not meant to be a 24/7 responsibility; it covers only East Coast business hours. If you are unavailable, please note in Slack when you will be out and ask someone to take on the role for that time.
Check the O&M Rotation Schedule for future planning.
Acceptance criteria
You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.
Daily Checklist
Note: For Catalog Auto Tasks, you will need to update the chart values manually. Click the Action link in each issue and grab the values from "monitor task output" and "check runtime".
- [ ] Check auto-generated O&M tickets in the No Status column
- [ ] Check Harvesting Emails
- [ ] New Relic Alerts Triaged
- [ ] Triage DMARC Report from Google
Weekly Checklist
- [ ] DB-Solr Sync
- [ ] Audit Log (more info on AU-3 and AU-6 Log auditing)
- [ ] Tracking Update
- NOTE: This job will consistently time out, but it is still processing results (more details)
- [ ] Check Catalog Solr
- [ ] Catalog Dupe Check
- [ ] Check user management requests
Monthly Checklist
- [ ] Invicti Scan
Ad-hoc Checklist
- [ ] Audit/review applications on Cloud Foundry and determine what can be stopped and/or deleted.
- Watch for user email requests
- Watch #datagov-alerts and the vulnerable dependency notifications (daily email reports) for critical alerts.
- Monitor and improve O&M Dashboard
- Update and revise O&M Tasks
DOE's /harvest/arm-data-json has been down for two days.
Tuesday 04/23
Reran the failed commit job; it succeeded.
Check Catalog Auto Tasks
Check Harvesting Emails
Update: Could not access the DOE site for this harvest source, but the job finished without error today. It looks like their server goes up and down.
[x] Catalog:
[x] DB-Solr Sync:
0 packages need to be removed from Solr
0 packages need to be updated/added to Solr
429 packages without harvest_object need to be mannually deleted
Finished 520s
Checked catalog, inventory production, works fine.
Also checked Solr leader and followers, all work as normal.
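To make the DB-Solr Sync numbers easier to copy into the daily report or chart, the summary output above can be parsed into counts. This is a minimal sketch, not part of the sync job itself: the function name and dictionary keys are hypothetical, and the patterns assume the output keeps the wording shown in this log.

```python
import re

# Patterns mirror the summary lines quoted in this log; the key names are illustrative.
PATTERNS = {
    "to_remove": r"(\d+) packages need to be removed from Solr",
    "to_update": r"(\d+) packages need to be updated/added to Solr",
    "to_delete_manually": r"(\d+) packages without harvest_object",
    "seconds": r"Finished (\d+)s",
}


def parse_sync_summary(text: str) -> dict:
    """Return the counts found in a DB-Solr sync summary; missing fields are omitted."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            result[key] = int(match.group(1))
    return result


# Copied verbatim from the log, including its spelling.
summary = (
    "0 packages need to be removed from Solr "
    "0 packages need to be updated/added to Solr "
    "429 packages without harvest_object need to be mannually deleted "
    "Finished 520s"
)
counts = parse_sync_summary(summary)
print(counts)
```

A non-zero `to_remove` or `to_update` value would be the signal to look closer; the 429 manual deletions appear unchanged between runs in this log.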
Wednesday 04/24
Check Catalog Auto Tasks
Check Harvesting Emails
Harvest Source: NASA Data.json
Organization: nasa-gov
Created: 2024-04-24 16:52:45.861624
Finished: 2024-04-24 16:54:02.582310
- Error loading json content: not enough values to unpack (expected 2, got 0).
- ProxyError getting json source: HTTPSConnectionPool(host='[](', port=443): Max retries exceeded with url: /data.json (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response'))).
The error occurred on April 23, 2024, but the job succeeded on April 22, 2024. This issue has occurred intermittently in the past, which suggests a possible resource availability problem.
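The "not enough values to unpack (expected 2, got 0)" message is Python's standard `ValueError` text when a two-name assignment receives an empty sequence. Where this happens inside the harvester is not shown in the report, so the function below is a hypothetical stand-in. It only illustrates how a failed fetch that returns nothing could surface as this secondary error.

```python
def load_source():
    """Hypothetical stand-in for a fetch step that returns nothing after a connection failure."""
    return ()


try:
    # Unpacking an empty tuple into two names raises ValueError.
    content, content_type = load_source()
except ValueError as err:
    message = str(err)
    print(message)
```

If this reading is right, the unpack error is a symptom of the connection failure reported next to it rather than a separate problem with the JSON itself.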
Harvest Source: DOI EDI
Organization: doi-gov
Created: 2024-04-24 16:52:45.119525
Finished: 2024-04-24 16:53:46.331441
- Error loading json content: not enough values to unpack (expected 2, got 0).
- HTTPError getting json source: 504 Server Error: Gateway Time-out for url:
The job run on April 24, 2024, at 5:23 PM was successful. This connection error had not occurred previously.
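When triaging harvest emails, it can help to separate connection-level failures, which often clear on a rerun, from errors that need follow-up with the source owner. The sketch below is one possible way to flag them. The marker strings come from the error lines quoted in this report, but treating them as transient is an assumption, and the function name is hypothetical.

```python
# Substrings taken from the error lines quoted in this report.
# Treating them as transient (rerun before escalating) is an assumption.
TRANSIENT_MARKERS = ("ProxyError", "504 Server Error", "Max retries exceeded")


def is_transient(error_line: str) -> bool:
    """Return True when the line contains a marker of a connection-level failure."""
    return any(marker in error_line for marker in TRANSIENT_MARKERS)


lines = [
    "HTTPError getting json source: 504 Server Error: Gateway Time-out for url:",
    "Error loading json content: not enough values to unpack (expected 2, got 0).",
]
flags = [is_transient(line) for line in lines]
print(flags)
```

The "Error loading json content" line is not flagged on its own. In these reports it appears alongside a connection error, so it is best read together with the line that follows it.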
[x] Catalog:
[x] DB-Solr Sync:
0 packages need to be removed from Solr
0 packages need to be updated/added to Solr
429 packages without harvest_object need to be mannually deleted
Finished 555s
Checked catalog, inventory production, works fine.
Also checked Solr leader and followers, all work as normal.
As one user pointed out, the harvester /harvest/energy-json appears to be using the wrong URL. The current URL is frozen at the 2023-04 time frame. A dynamic URL seems to be the correct one; it redirects to the current year and month. @hkdctol
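A quick way to spot harvest sources pinned to an old month is to look for a year-month stamp in the URL and compare it with today's date. This is a sketch under stated assumptions: the real source URLs are not given here, so both example URLs and the `YYYY-MM` stamp format are hypothetical.

```python
import re
from datetime import date

# Matches a YYYY-MM stamp such as "2023-04"; the format is an assumption.
STAMP = re.compile(r"(20\d{2})-(0[1-9]|1[0-2])")


def find_stamp(url: str):
    """Return (year, month) for the first YYYY-MM stamp in the URL, or None."""
    match = STAMP.search(url)
    return (int(match.group(1)), int(match.group(2))) if match else None


def is_frozen(url: str, today: date) -> bool:
    """True when the URL carries a YYYY-MM stamp earlier than today's year and month."""
    stamp = find_stamp(url)
    return stamp is not None and stamp < (today.year, today.month)


# Hypothetical URLs for illustration only.
frozen_url = "https://example.gov/data/2023-04/data.json"
dynamic_url = "https://example.gov/data/latest/data.json"

frozen_result = is_frozen(frozen_url, date(2024, 4, 24))
dynamic_result = is_frozen(dynamic_url, date(2024, 4, 24))
print(frozen_result, dynamic_result)
```

A URL without a stamp is reported as not frozen, which fits a dynamic URL that redirects to the current month on the server side.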