Improve how WG activities get documented
@kimdhamilton Is it possible to activate an AI meeting notetaker at the Zoom account level so that it automatically takes notes? Would people be comfortable with meetings being recorded and/or having an AI notetaker?
I'm interested in this; let's discuss at the next TSC meeting.
- It looks like we used to use otter.ai, not sure why it was deactivated
- We use agenda.md files to keep a rolling agenda and very high-level notes. We should ensure we're still onboard with using this; i.e. AI notes for archives, and agenda.md for high-level outcomes and action items
Per discussion, we have the ability to use zoom AI transcripts for most of our meetings, but some of the older ones need to be converted.
I expect we need ~1 day during downtime to convert the old meetings. Note that there are other benefits to this conversion (I've been wanting to do it for other reasons, like zoom archive cleanup), but we should pick a time later in the year after hackathon.
Note that all working group meetings must be (and are already) recorded for IP audit trails. But adding AI transcripts would be new. Question for @ankurdotb: is this issue referring to WG meetings or other?
Assuming WG meetings or WG + OG meetings, TSC to discuss and decide the following:
- Approval to activate some AI assistant in DIF group meetings
- If so, which one. Per previous discussion, Zoom AI seemed the safest in terms of not using our data for training (also it's included in our plan)
- Which format/formats do we want? For context:
- Groups currently have their recordings stored in DIF Meeting Recordings. Do we want to revisit this and do something more navigable, or simply make the transcripts more accessible from here?
- How is agenda.md affected? Do we want to explore opportunities for auto-generation of notes? We have an outstanding issue on agenda.md structure.
I'm open to all ideas about making this easier for chairs. Over the past few years, there have been attempts to standardize the agenda.md file, but that's not universally adopted. So we have the opportunity for an overhaul.
I think it's a good idea to have automatic transcripts, or automatic summaries, or even both. Links to those could be added directly to the agenda.md files, just like we add links to recordings now.
If it's possible to even do this for past meetings, that would also be really nice.
Step 1 I think there was consensus that:
- we want automatic transcripts, summaries, or both
- we want to use the built-in Zoom AI because of its good data handling policies
Agreed?
Step 2
If so, as for implementation: I investigated and realized we would have to do this when there's some downtime (post hackathon) because some of the zoom meetings would need to be reconfigured. (for some of our zoom accounts, it's an easy switch, but not all). Also, we'd need to update Zapier tooling to make these accessible.
Backfilling is a nice-to-have, and I will investigate.
I meant SC, TSC meetings and all WG meetings...but given the sensitivity, I would perhaps give the option to each WG on whether they want to accept Zoom's AI meeting notes or not. I appreciate that it might be more work to collate, unless theoretically we can get the WG signoff from WG chairs at TSC
- Kim to share example transcripts from previous calls
- Transcription tool: https://github.com/zackees/transcribe-anything
- The AI summaries need to be vetted/reviewed afterwards to make sure they capture the specific actual messages/nuances expressed in the meetings.
- PLEASE make sure you continue to record and curate the recordings. Some of the other DID WGs don't do this and a lot is lost - particularly if the minutes don't have a lot of fidelity. Cheers
Update:
- Enabled by default for group meetings going forward
- Backfilled settings for existing meetings
- At least all I could find (note that these become hard to find in Zoom after the "end date", even if WGs are continuing to use the meeting link). We can keep an eye on this.
- Enabled transcript creation for zoom 2 (for some reason this was disabled)
Still need to determine how to integrate this into the Recordings spreadsheet
This is ready for review and discussion of next steps. This was much trickier to integrate than I thought, so some details as well (for posterity).
Summary
Here's a summary of the changes and resulting functionality
- Enabled AI summaries by default at the account level: this means all new meetings will get this setting
- Backfilled this setting for old meetings
- Added a new "Summaries" sheet within the DIF Meeting Recordings spreadsheet (here)
- Created a Google Apps Script that queries the Zoom meeting summaries API for new meeting summaries. It does the following:
- Gets new meeting summaries
- For each new meeting summary, populates "Summaries" sheet
- Creates human-readable google doc for each. This is filed by meeting series, under DIF Meetings > Meeting Summaries, and linked from the Summaries sheet
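As a rough illustration of the last two steps (populating a sheet row and producing readable document text), here is a minimal Python sketch. The actual script is a Google Apps Script and is not shown in this thread, so the `MeetingSummary` class, its field names, and the column order are assumptions made for illustration; only "doc link in column C" comes from the description in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingSummary:
    """Hypothetical shape of one summary; field names are assumptions."""
    topic: str
    start_time: str  # ISO 8601, e.g. "2024-12-16T17:00:00Z"
    overview: str
    details: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def to_sheet_row(summary: MeetingSummary, doc_url: str) -> list[str]:
    """Flatten a summary into one spreadsheet row; the third cell (column C) is the doc link."""
    return [
        summary.topic,
        summary.start_time,
        doc_url,
        summary.overview,
        "\n".join(summary.details),
        "\n".join(summary.next_steps),
    ]


def to_doc_text(summary: MeetingSummary) -> str:
    """Render the summary as plain text suitable for a human-readable document."""
    lines = [f"{summary.topic} ({summary.start_time})", "", "Overview", summary.overview, ""]
    lines.append("Details")
    lines += [f"- {d}" for d in summary.details]
    lines += ["", "Next steps"]
    lines += [f"- {s}" for s in summary.next_steps]
    return "\n".join(lines)
```

Keeping the row builder and the document renderer separate means the spreadsheet columns could later be dropped without touching the document output.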
To Discuss
- [ ] Cleaning up transcription errors
- There are clearly errors (as expected) in the AI transcription.
- We could give permission for TSC chairs to edit the docs, but is this now an obligation?
- Proposed: insert text that this is AI-generated and hasn't been reviewed. Chair can remove it when they're editing
- Side note: if we are editing the doc, this would argue for dropping the spreadsheet columns with this data, or automate back-propagating (but that might be overly tedious to keep the same structure, so I believe the former is better)
- [ ] "Integrating" the Summaries data with the rest of the spreadsheet
- Proposed: change new meeting entries in the Recordings spreadsheet to link to a google folder containing all meeting materials (recording, summary, transcript, ...)
- See details below about why & tradeoffs
- [ ] FYI, I intend to postpone the following due to the time required for this so far.
- Backfilling meeting summaries (i.e. piping old recordings to an AI summary tool)
- Moving zoom recordings to google drive (zoom cloud storage costs us $100/mo, but effort to do that makes it lower priority)
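The disclaimer proposal above (insert by default, chair removes it after review) could be sketched roughly as follows. This is a Python illustration only, not the Apps Script in use; the `DISCLAIMER` wording and both function names are hypothetical, since the thread does not settle on exact text.

```python
# Hypothetical wording; the thread does not fix the exact text.
DISCLAIMER = "Note: this summary is AI-generated and has not been reviewed."


def with_disclaimer(doc_text: str) -> str:
    """Prepend the disclaimer unless the document already starts with it."""
    if doc_text.startswith(DISCLAIMER):
        return doc_text
    return f"{DISCLAIMER}\n\n{doc_text}"


def mark_reviewed(doc_text: str) -> str:
    """Remove the leading disclaimer once a chair has reviewed the document."""
    if doc_text.startswith(DISCLAIMER):
        return doc_text[len(DISCLAIMER):].lstrip("\n")
    return doc_text
```

Making the insertion idempotent means a re-run of the daily job would not stack multiple disclaimers on the same document.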
Details
Fetching Meeting Summaries
Zapier's zoom integration didn't work correctly for meeting summaries. Note: I'm not exactly sure why; they have a meeting summary event, which should function similarly to our "new meeting recording" event, but Zapier logs (plus google searches) didn't give me the visibility I needed.
Therefore, I used the Zoom API and created a Google Apps Script. It fetches all newly available (non-empty) meeting summaries since the last query date (per the last entry in the spreadsheet).
- Query uses a `from` param set to the last "meeting start time" in the Summaries sheet (plus 1 minute so it doesn't refetch)
- Post-fetch filters out meetings with a summary start time of the Unix epoch beginning (the signal that it will be empty)
- Adds newly available summaries to the Summaries sheet.
- Each summary object has metadata, overview, details, and next steps. Those are stored as columns in a row
- The summary object is also converted to a human-readable google doc, which is created and linked from column C
- This script is visible to those with administrative access
- I didn't want to expose the google appscript via an open REST endpoint, so it's not called by Zapier. Instead, it triggers on a daily timer
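The incremental-fetch logic described in this list (advance the `from` value by one minute, then drop summaries stamped with the Unix epoch) can be sketched in Python as below. This is not the Apps Script itself; the helper names, the `summary_start_time` key, and the assumption that timestamps are UTC strings ending in `Z` are all illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' as UTC."""
    return datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)


def next_from_param(last_meeting_start: str) -> str:
    """Return the last recorded start time plus one minute, so it is not refetched."""
    nxt = parse_utc(last_meeting_start) + timedelta(minutes=1)
    return nxt.strftime(TS_FORMAT)


def non_empty(summaries: list[dict]) -> list[dict]:
    """Drop summaries whose summary start time is the Unix epoch (the empty-summary signal)."""
    return [s for s in summaries if parse_utc(s["summary_start_time"]) != EPOCH]
```

For example, `next_from_param("2024-12-16T17:59:30Z")` yields `"2024-12-16T18:00:30Z"`, which would then be passed as the `from` value of the next query.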
Integrating Summaries into Recordings spreadsheet
The link to zoom-stored recordings shows pieces of the summary, but it's tedious to navigate, and I assume people would like the summary in complete text form. So the question is how to reconcile meeting recording links and summary links.
Originally, I'd assumed we would link the meeting summaries next to the recording link in the Recordings spreadsheet. But the current structure of the Recordings spreadsheet makes it extremely tedious to add a new column.
A further complication is that the meeting recordings objects in zapier do not integrate summaries. We could manually match (post-fetching) on meeting + datetime substring, but we'd then need to propagate that to the Recording sheet and through all Working group + work item sheets.
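For reference, the manual matching mentioned here might look something like the following Python sketch. It assumes each recording and summary carries a `topic` and an ISO 8601 `start_time`, and that truncating the timestamp to the minute is a good enough "datetime substring"; those keys, the `doc_url` key, and the truncation length are assumptions, and real recording and summary start times may differ by more than this tolerates.

```python
def match_key(topic: str, start_time: str) -> tuple[str, str]:
    """Build a join key from the normalized topic and the start time truncated to the minute."""
    return (topic.strip().lower(), start_time[:16])  # keeps "YYYY-MM-DDTHH:MM"


def attach_summaries(recordings: list[dict], summaries: list[dict]) -> list[dict]:
    """Return copies of the recordings with 'summary_url' set where a summary matches, else None."""
    by_key = {match_key(s["topic"], s["start_time"]): s["doc_url"] for s in summaries}
    return [
        {**r, "summary_url": by_key.get(match_key(r["topic"], r["start_time"]))}
        for r in recordings
    ]
```

Even with a matcher like this, the result would still need to be propagated through the Recordings sheet and every working group and work item sheet, which is the tedious part.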
IMO the best solution is what I proposed above, i.e. create meeting instance folders in google drive, put the recording + summary (and whatever else) in there, and link to that folder from the spreadsheet. Here's how it would work:
- Since we have to deviate from Zapier anyway for summaries, we could switch off current zaps that link the recordings spreadsheet to the zoom cloud-stored recording
- Update my google appscript to download recordings in addition to summary
- In the recording spreadsheet, link to a folder containing the recording, summary, and other data (detailed transcript)
- Then the working group + work item sheets automatically get the updated values
This means we get all of the good behavior going forward (not using zoom cloud storage for new meetings, spreadsheet links to all meeting materials).
The downsides are:
- there's a cutoff point of Dec 16, 2024 so we'd have before-and-after behavior. I.e., afterward, these entries (in pink, Figure 1) would link to a DIF google folder containing recording, summary, transcript, etc, instead of linking to the zoom cloud recording.
- The zoom cloud recording link built-in player is more user-friendly than the google drive media player.
Other alternatives considered:
- just link to the meeting series share near the header. This avoids needing to adjust columns, but does not have a direct meeting instance to summary link. See Figure 2. This is easy, but lame
- Overhaul this spreadsheet approach. Too much work, but can consider later
Figure 1:
Figure 2:
@kimdhamilton:
- give metrics on views and use
- put disclaimer in by default
Thank you so much for implementing this trial @kimdhamilton. My notes below on some of the topics above.
- Who is the DIF meeting recordings spreadsheet a) publicised to and b) accessible by? I wasn't aware of it, until this discussion so I wonder how widely known it is, because it's a fantastic resource. If it's indeed accessible by all DIF members, I would promote it more widely, e.g., include a link to it in calendar invites, set the link in Slack channel descriptions etc. (Longer-term idea: use tools/software that can use Google Sheets as a DB to turn this into an easier-to-browse site/page.)
- IMO it's fine to have a cutoff point where there are "old links" (either Drive or Zoom) and after a certain date when this feature was turned on, having two columns: one for the link to recording, and the other to the document. Since there will be no transcript before the trial implementation date, the link to transcript doc can just be N/A or "Meeting not transcribed" - something like that.
Proposed: insert text that this is AI-generated and hasn't been reviewed. Chair can remove it when they're editing
- Insert by default, as this is extremely common on AI-generated transcripts. On that note, I wonder whether along with this note, we should also link to Zoom's FAQ that clarifies that Zoom does not use any of this data to train its own or third-party AI models. Given our audience in DIF, people might care. (This context was discussed in TSC meeting, but I don't know how widely it's known and there's a chance that a meeting attendee/contributor might object to this feature if they thought it would be used to train AI models.)
- Encourage WG chairs to post transcript/summaries on Slack or by email with link to doc and/or video. I hope that this encourages better engagement, but conversely, might also lead to fewer people attending "because they can always catch up async". However, since a lot of the discussions are also via Slack/Github, probably a net-good.
Backfilling meeting summaries (i.e. piping old recordings to an AI summary tool)
- Not worth the time, effort, and cost.
Moving Zoom recordings to Google Drive (zoom cloud storage costs us $100/mo, but effort to do that makes it lower priority)
- We should always aim for efficiency, so this does make sense. One question I had, though, is whether Zoom's service provides any metrics on which videos get watched, how often, etc. At the very least, even if we move to Google Drive, this is a good benchmark to have for understanding how useful any of this is (or an indication that it needs to be promoted more).
I know how much of a pain Zapier can be, so thank you for doing all of this! Personally, I found the meeting summaries super useful to review even after meetings I did attend, like SC/TSC.
Thanks for the feedback Ankur! Some responses:
- DIF meeting records spreadsheet:
- Its visibility is read-only to the public
- Where it's linked from currently
- Some work items / working groups link to it from their group page, and the AGENDA.md template has a spot for it (overridden to point to the group's specific sheet). See DID Methods for example, screenshot below
- We also pin it in the `#recordings` Slack channel
- Interested to hear where else we should be promoting it
- +1 on a longer-term improved representation. Will create an issue for the backlog
- I'll figure out where to squeeze it in
- Good addition
- We'll let ppl know in TSC meeting and remind
- Ack
- Good point on metrics; creating backlog issue
Near term action items:
- [x] Kim insert disclaimer into AI summaries, including Zoom's usage info
- [x] Kim figure out how to fit summary links into the recordings sheet
Long-term action items (created issues in Backlog):
- Use tools/software that can use Google Sheets as a DB to turn DIF recordings spreadsheet into an easier-to-browse site/page
- Long-term solution for recordings
- Determine metrics for Zoom recordings; do we have a sense of "viewing" metrics over time? Can we also get these if we self-host?
- Consider migrating recordings to google drive, self-hosting, and updating recordings spreadsheet to point to new links
No action / deprioritized:
- Backfilling summaries
I'm curious btw, how does W3C make their bot that inserts references in issues work? Maybe we can find out from them?
There is a fair amount of automation. W3C uses this: https://www.w3.org/2001/12/zakim-irc-bot.html, and that's the Zakim you see mentioned in W3C DID WG for example. So this is probably achieved by commands entered into irc (or relevant client) during the meeting.
Community groups couldn't use it (don't remember why), so in CCG we simply used a post-processing tool https://github.com/w3c-ccg/meetings/tree/main/scribe-tool, but our main goal was formatting and cleanup. I don't think we did anything like that.
It's probably possible through post-processing or some automation, but we'd probably need to first standardize on a command. This could be a follow-up issue
Excellent thank you! And a second thing I was confused by...I found a DIF Labs WG meeting from 17th January, but not the one from yesterday (Show & Tell across the team). Is it because that was a different, non-recorded meeting / series or because it's not yet processed on Zapier automation to move it from Zoom to Google Docs?
Ankur: Interesting; let me check on that. It should be pulling fresh ones every hour
1/22 discussion: audit to make sure recording links on website go to the correct tabs in the recording spreadsheet; not just the general "index"
Closing as complete