datahub
Create a ds-infrastructure-announce list
Currently, we have no way to announce maintenance / downtime / new features to our users. We should make an announce-only list that folks can subscribe to for such notifications.
- [x] Create the list
- [x] Make sure infra admins can post to it
- [ ] Require that new hub admin users are subscribed to it
- [ ] Advertise it prominently for subscription by anyone who wants it
@yuvipanda Has much thought been put toward the "official" naming for this service? For example, if we call the service "datahub", maybe we want the list to be named "datahub-announce"
Also I want to see if it makes sense to look into automating the addition of admins. It may not. @ryanlovett suggested that since the hub knows (or will know) who the admins are, it could in theory subscribe folks.
Lastly it has been noted that datahub has a dependency on bcourses for auth, thus bcourses outages also impact datahub. I'll speak with some folks here at RTL about that dependency to see if we should alter messaging a bit for change management.
@felder ah, let's just call it 'cdss-infrastructure-announce' for now. We can rename it if needed later.
Once we create it, we can just subscribe everyone who is admin now, so we have it working. And then as you say, we can probably write a script that does that to automate it.
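The automation idea above could start from something as small as a set difference between the hub's admins and the list's current subscribers. This is only a sketch: the function name `admins_to_subscribe` and the sample addresses are hypothetical, and how the admin list and subscriber list are actually fetched (from the hub, calgroups, or the list service) is left open because the thread does not specify it.

```python
def admins_to_subscribe(admin_emails, subscriber_emails):
    """Return admin addresses that are not yet on the announce list.

    Both arguments are plain iterables of email strings; fetching them
    from the hub or the mailing list service is out of scope here.
    """
    # Compare addresses case-insensitively and ignore stray whitespace.
    subscribed = {addr.strip().lower() for addr in subscriber_emails}
    missing = {addr.strip().lower() for addr in admin_emails} - subscribed
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    admins = ["admin1@example.edu", "Admin2@example.edu"]
    subscribers = ["admin2@example.edu", "student@example.edu"]
    print(admins_to_subscribe(admins, subscribers))
```

Running this periodically and subscribing whatever it returns would keep new admins on the list without anyone having to remember to add them.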
@yuvipanda I feel pretty strongly that the name of the list should reflect the name of the service. When people see a message to cdss-infrastructure-announce show up in their mailbox, they are not going to know what cdss-infrastructure is.
@felder alright. let's go with datahub-infrastructure-announce?
@yuvipanda I have not forgotten about this, just very busy.
I think we should call it datahub-announce, assuming datahub is going to be the official name of the service.
@felder yeah, let's go with that for now!
@yuvipanda list created
I've added some of the usual suspects as list managers.
Target audience is anyone who cares about new features for the hub, or outages. Examples: Course staff, students, anyone else who cares.
Ryan: As part of getting accounts, people get added to a list. There is no application process here, so who gets added? Ryan & Kevin Heard populate calgroups and sync it to google groups. Calgroups has API access; google groups doesn't.
Jonathan: Faculty teaching on the hub should subscribe, and should suggest students / staff to subscribe.
Aaron: Opt-in vs opt-out. Suggests integrating with systemstatus.
Felder says we should look at changeboard / systemstatus. Investigate 'changeboard'? Work on 'no fly zones', like finals week, etc. Investigate integrating with other processes on campus. Should be worked out with service lead. Need to impose order & bureaucracy.
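One way to picture the 'no fly zones' idea from the notes above is a simple date check run before scheduling maintenance. This is a hedged sketch only: the `NO_FLY_ZONES` table, its example dates, and the `blocking_zone` helper are all hypothetical, and real windows would have to come from the academic calendar and be agreed with the service lead.

```python
from datetime import date

# Hypothetical blackout windows (name, first day, last day), inclusive.
# The dates are placeholders, not real campus dates.
NO_FLY_ZONES = [
    ("finals week (example dates)", date(2030, 5, 13), date(2030, 5, 17)),
]


def blocking_zone(proposed, zones=NO_FLY_ZONES):
    """Return the name of the no-fly zone containing `proposed`, or None."""
    for name, start, end in zones:
        if start <= proposed <= end:
            return name
    return None


if __name__ == "__main__":
    print(blocking_zone(date(2030, 5, 14)))
    print(blocking_zone(date(2030, 5, 20)))
```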
@balajialg you might find this interesting
@yuvipanda Super helpful! I didn't realize that we have an announcement list set up already. Great news! It will be beneficial from a support process outreach standpoint.
I will start speaking to a few service leads at RTL to figure out their existing communication processes and understand how we could integrate ours into system-wide outreach. On another note, what does "change board" mean?
@balajialg every Tuesday representatives from each IT dept. participate in a weekly change board meeting where proposed scheduled maintenance events and their impacts are discussed. At that time, everyone present has an opportunity to ask questions if desired about how a change might impact their own service as well as raise any objections if a change would cause unacceptable downstream effects.
If you want to know more, we can discuss.
That's helpful, @felder! I have a few questions related to the change board. Please feel free to answer either here or over Slack when you have time!
- Which RTL services are part of the change board? Any documentation in Confluence that I could potentially look at to understand how they engaged with this process?
- Do we have any existing dependencies with central IT? If yes, how have we engaged with them until now?
- How often do we have scheduled maintenance across the services? Is the timeline for maintenance centralized, or does each service choose to define its own outage time?
@balajialg
- It's somewhat discretionary based on what the service lead thinks, but bcourses is a big one. Confluence documentation can be found here: https://confluence.ets.berkeley.edu/confluence/display/CE/RTL+Change+Management+and+Service+Status
- Yes. For example we rely on bcourses for authentication and bcourses relies on calnet. Similarly we require access to SIS API data in order to determine who the instructors/students in a course are. We use this for provisioning additional resources on a per course basis.
- Yes, there's scheduled maintenance in at least one service on campus weekly. Often more. Regarding maintenance timelines, no, they aren't centralized, but they are discussed, and if there are major objections they are often rescheduled in response to those objections.
Perhaps the thing to do is for you to attend a change board meeting so that you have an idea of what is discussed there. Generally it's probably not a great use of your time to do so regularly, but it may make sense for you to attend at least one.
@felder
Super helpful! Appreciate it. I will plan to attend a change board meeting in the next 2-3 weeks to get a sense of the process.
@yuvipanda @felder @ryanlovett One of the challenges I have observed whenever there is an outage is the number of messages we get via Slack/email. This can be stressful and time-consuming for both us and our users. In order to streamline our communications during an outage, I was thinking of the idea below. Please let me know your thoughts!
From a contingency planning perspective, we can do the following whenever an outage happens:
- Use the [email protected] email list to email all our users information about the outage, answering the basic questions below:
  - What is the issue causing this outage?
  - What is the expected time by which the hub will be back?
  - Which hubs/services are affected?
  - Any other additional information
- Send an update email once the issue gets resolved, with the steps we will take in the future to avoid similar issues.
Have we added all the teaching teams as part of this email list? How feasible is this idea?
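To make the proposed outage email concrete, here is a minimal sketch of a template built from the questions listed above. The `OutageNotice` class, the `render_outage_email` function, and the `[datahub-announce]` subject prefix are assumptions for illustration; the sketch only builds the text and does not send anything.

```python
from dataclasses import dataclass, field


@dataclass
class OutageNotice:
    """Answers to the basic outage questions (hypothetical structure)."""
    cause: str
    expected_recovery: str
    affected_hubs: list = field(default_factory=list)
    additional_info: str = ""


def render_outage_email(notice):
    """Return a (subject, body) pair for an outage announcement."""
    hubs = ", ".join(notice.affected_hubs) or "to be determined"
    subject = f"[datahub-announce] Outage affecting: {hubs}"
    lines = [
        f"What is the issue? {notice.cause}",
        f"Expected recovery: {notice.expected_recovery}",
        f"Affected hubs/services: {hubs}",
    ]
    # Only include the optional section when there is something to say.
    if notice.additional_info:
        lines.append(f"Additional information: {notice.additional_info}")
    return subject, "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example values.
    notice = OutageNotice(
        cause="Storage is full on one hub",
        expected_recovery="within two hours",
        affected_hubs=["Data 100 hub"],
    )
    subject, body = render_outage_email(notice)
    print(subject)
    print(body)
```

Keeping the fields fixed means every notice answers the same questions in the same order, which is the main point of the proposal; the same structure could also feed a status page entry instead of an email.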
@balajialg for outages, I'd rather have us use something like statuspage, so we have a page like this rather than email everyone.
Berkeley IT already has https://systemstatus.berkeley.edu/, so we could also pick up a line item there. Not sure if it can be detailed enough though.
I like the idea of having a status page. I hadn't thought of this before. I will figure out the next steps with respect to having a status page for our service on the Berkeley IT status page.
Also, from a user perspective (mainly instructors), we need to figure out a scalable way to signal that we are working hard to get the systems back to normal. I am not sure the template provided by Berkeley IT alone would solve that. Maybe having a custom template with more detailed information tailored towards them could be helpful.
For example (based on my experience in the last few months), these are some of the questions instructors seem to ask when an outage happens:
- If there is an assignment deadline, is the issue severe enough that I should provide an extension for my students? (Data 100 provided an extension because of one of the outages with the Data 100 hub.)
- Should I postpone my plan to use Datahub during the class today?
IMHO, some kind of thoughtful communication might be helpful.
@balajialg I agree that having a space where we can provide more communication than just a line or two is very helpful. We could try to get our own statuspage, as there is space for narrative information there.