Youtube Transcript Context Provider
Validations
- [X] I believe this is a way to improve. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that requests the same enhancement
Problem
Developers may want to add context from a specific Youtube video, like a tutorial or summary of a technology or approach.
Solution
This ContextProvider would accept a video id OR url as query (if url, extract id from query param "v"), and use the Youtube API to obtain a transcript. Other information like title, description, and publishedAt could also be obtained from the API. The URLContextProvider could be used as a starting point for this.
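As a rough sketch of the "video id OR url" handling described above: the helper name `extractVideoId` is hypothetical, and it only covers the two cases named in the proposal (a bare ID, or a URL carrying the ID in the `v` query parameter). Other link shapes are not handled.

```typescript
// Hypothetical helper (not part of Continue): returns the video ID from
// either a bare ID or a URL that carries it in the "v" query parameter.
function extractVideoId(query: string): string {
  const trimmed = query.trim();
  try {
    // new URL() throws a TypeError when the input is not an absolute URL,
    // which is how a bare ID is told apart from a link.
    return new URL(trimmed).searchParams.get("v") ?? trimmed;
  } catch {
    // Not a URL: assume the query already is the video ID.
    return trimmed;
  }
}
```

Parsing with `new URL(...)` rather than `new URLSearchParams(...)` matters here, because `URLSearchParams` treats a full URL as one long parameter name and never finds `v`.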
Hi @RomneyDa , neat idea! This would be really cool to use while following along with a tutorial.
Does the API provide a transcript field? Didn't see anything skimming that link quickly.
If you're interested in contributing this we have some info on creating context providers here: https://github.com/continuedev/continue/blob/main/CONTRIBUTING.md#writing-context-providers
Hi @Patrick-Erichsen,
Looks like there are a couple of ways to get the transcript: https://stackoverflow.com/questions/14061195/how-to-get-transcript-in-youtube-api-v3
It seems manually entered video captions are accessible from a public API, while automatic transcriptions (the more reliable option, since they would cover all videos) are only accessible via the API with a key. There might be a simple workaround for the automatic ones as well; see the Stack Overflow discussion.
I'll take a stab at writing the ContextProvider
Actually this is a better doc link to check out: https://docs.continue.dev/customize/tutorials/build-your-own-context-provider
Implementing this as a custom context provider in config.ts is likely the right approach. Building out a community marketplace for context providers is something we're interested in, but for now, we're attempting to keep the scope of built-ins relatively limited.
Maybe the short term solution here is something similar to https://github.com/continuedev/prompt-file-examples/tree/main
@sestinj @TyDunn any thoughts?
OK, I'll look into the custom approach.
A community marketplace for Context providers would be awesome. Here is a solution that works for future reference. I will include the full code here because it's brief.
// ~/.continue/config.ts
import { getSubtitles } from 'youtube-captions-scraper';

const YoutubeTranscriptContextProvider: CustomContextProvider = {
  title: "youtube-transcript",
  displayTitle: "Youtube",
  description: "Reference the transcript of a youtube video",
  type: "query",
  getContextItems: async (
    query: string,
    extras: ContextProviderExtras,
  ): Promise<ContextItem[]> => {
    // Accept a bare video ID or a URL with the ID in the "v" query param.
    // new URL() throws on a bare ID, in which case the query is used as-is.
    let videoID = query.trim()
    try {
      videoID = new URL(videoID).searchParams.get('v') ?? videoID
    } catch {
      // Not a URL: keep the query as the video ID
    }
    try {
      const subtitles = await getSubtitles({
        videoID,
        // lang: 'fr' // default: `en`
      }) as {
        start: string
        dur: string
        text: string
      }[]
      // Could remove timestamps
      const content = subtitles.map(sub => `${sub.start}: ${sub.text}`).join('\n')
      return [{
        name: 'Youtube video captions',
        description: 'The following timestamps and captions were taken from a youtube video',
        content
      }]
    } catch (e) {
      // Return the failure as a context item so it is visible in the chat
      return [{
        name: 'Youtube transcription error',
        description: `There was an error getting the captions for video ${videoID}`,
        content: `Failed to get captions for youtube video ${videoID}`
      }]
    }
  }
};

// Register the provider alongside any existing context providers
export function modifyConfig(config: Config): Config {
  if (!config.contextProviders) {
    config.contextProviders = [];
  }
  config.contextProviders.push(YoutubeTranscriptContextProvider);
  return config;
}
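For the `// Could remove timestamps` comment in the provider above, here is a small sketch of how the caption formatting could be made switchable. The names `Caption`, `toTimestamp`, and `formatCaptions` are hypothetical. It also assumes `start` is an offset in seconds given as a decimal string (e.g. "75.3"), which is worth checking against the scraper's actual output.

```typescript
// Shape of each caption entry, matching the cast used in the provider above
interface Caption {
  start: string;
  dur: string;
  text: string;
}

// Hypothetical helper: turns a start offset in seconds ("75.3") into "1:15".
// Assumes the offset is a decimal number of seconds.
function toTimestamp(start: string): string {
  const total = Math.floor(Number(start));
  const minutes = Math.floor(total / 60);
  const seconds = String(total % 60).padStart(2, "0");
  return `${minutes}:${seconds}`;
}

// Hypothetical helper: one caption per line with timestamps, or a single
// space-joined block of plain text when timestamps are turned off.
function formatCaptions(captions: Caption[], withTimestamps = true): string {
  return captions
    .map((c) => (withTimestamps ? `${toTimestamp(c.start)}: ${c.text}` : c.text))
    .join(withTimestamps ? "\n" : " ");
}
```

Dropping timestamps shortens the context noticeably for long videos, at the cost of the model no longer being able to point at a specific moment.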
Some notes:
- must run `npm i youtube-captions-scraper` from `~/.continue`
- could extend language support or use another scraper that gets more data from the video, like title and description
- I toyed with Google's official caption APIs, and they don't seem to be a feasible solution because you can only get actual caption data for your own videos using OAuth. I tried a few methods from the Stack Overflow thread above with no success; it seems those APIs have been discontinued, blocked, etc. But there are lots of scrapers out there, some of which, like the youtube-dl CLI, have won legal approval to stay on GitHub (if I understood correctly).