
Youtube Transcript Context Provider

Open RomneyDa opened this issue 1 year ago • 5 comments

Validations

  • [X] I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • [X] I'm not able to find an open issue that requests the same enhancement

Problem

Developers may want to add context from a specific Youtube video, like a tutorial or summary of a technology or approach.

Solution

This ContextProvider would accept a video ID or URL as the query (if a URL, extract the ID from the query param "v") and use the YouTube API to obtain a transcript. Other information like title, description, and publishedAt could also be obtained from the API. The URLContextProvider could be used as a starting point for this.
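A minimal sketch of the "ID or URL" step described above, assuming the query is either a bare video ID, a `v=<id>` query string, or a full watch URL. The helper name `extractVideoId` is hypothetical and not part of Continue or any YouTube library.

```typescript
// Hypothetical helper: the name and the accepted input shapes are assumptions.
function extractVideoId(query: string): string {
  const trimmed = query.trim();
  try {
    // Full URL case, e.g. https://www.youtube.com/watch?v=<id>
    const url = new URL(trimmed);
    return url.searchParams.get("v") ?? trimmed;
  } catch {
    // Not a parseable URL: accept "v=<id>", otherwise treat the input as a bare ID.
    return new URLSearchParams(trimmed).get("v") ?? trimmed;
  }
}
```

Parsing with `URL` first matters because `URLSearchParams` on its own treats a whole URL string as one long key, so `get("v")` would come back `null` for a full watch URL.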

RomneyDa avatar Sep 19 '24 17:09 RomneyDa

Hi @RomneyDa , neat idea! This would be really cool to use while following along with a tutorial.

Does the API provide a transcript field? Didn't see anything skimming that link quickly.

If you're interested in contributing this we have some info on creating context providers here: https://github.com/continuedev/continue/blob/main/CONTRIBUTING.md#writing-context-providers

Patrick-Erichsen avatar Sep 19 '24 17:09 Patrick-Erichsen

Hi @Patrick-Erichsen,

Looks like there are a couple of ways to get the transcript: https://stackoverflow.com/questions/14061195/how-to-get-transcript-in-youtube-api-v3

It seems manually entered video captions are accessible from a public API, while automatic transcriptions (the more reliable source, since they would cover all videos) are only accessible via the API with a key. There might be a simple workaround for the automatic ones as well; see the Stack Overflow discussion.

I'll take a stab at writing the ContextProvider

RomneyDa avatar Sep 19 '24 17:09 RomneyDa

Actually this is a better doc link to check out: https://docs.continue.dev/customize/tutorials/build-your-own-context-provider

Implementing this as a custom context provider in config.ts is likely the right approach. Building out a community marketplace for context providers is something we're interested in, but for now, we're attempting to keep the scope of built-ins relatively limited.

Maybe the short term solution here is something similar to https://github.com/continuedev/prompt-file-examples/tree/main

@sestinj @TyDunn any thoughts?

Patrick-Erichsen avatar Sep 19 '24 17:09 Patrick-Erichsen

OK, I'll look into the custom approach

RomneyDa avatar Sep 19 '24 20:09 RomneyDa

A community marketplace for Context providers would be awesome. Here is a solution that works for future reference. I will include the full code here because it's brief.

// ~/.continue/config.ts

import { getSubtitles } from 'youtube-captions-scraper';

const YoutubeTranscriptContextProvider: CustomContextProvider = {
    title: "youtube-transcript",
    displayTitle: "Youtube",
    description: "Reference the transcript of a youtube video",
    type: "query",
    getContextItems: async (
        query: string,
        extras: ContextProviderExtras,
    ): Promise<ContextItem[]> => {
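        // Accept a bare video ID, a "v=<id>" query string, or a full watch URL.
        // URLSearchParams alone cannot parse a full URL, so try URL first.
        let videoID = query.trim()
        try {
            videoID = new URL(videoID).searchParams.get('v') ?? videoID
        } catch {
            // Not a URL: fall back to query-string parsing, then to the raw input
            videoID = new URLSearchParams(videoID).get('v') ?? videoID
        }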

        try {
            const subtitles = await getSubtitles({
                videoID,
                // lang: 'fr' // default: `en`
            }) as {
                start: string
                dur: string
                text: string
            }[]
            // Could remove timestamps
            const content = subtitles.map(sub => `${sub.start}: ${sub.text}`).join('\n')
            return [{
                name: 'Youtube video captions',
                description: 'The following timestamps and captions were taken from a youtube video',
                content
            }]
        } catch (e) {
            return [{
                name: 'Youtube transcription error',
                description: `There was an error getting the captions for video ${videoID}`,
                content: `Failed to get captions for youtube video ${videoID}`
            }]
        }
    }
};

export function modifyConfig(config: Config): Config {
    if (!config.contextProviders) {
        config.contextProviders = [];
    }
    config.contextProviders.push(YoutubeTranscriptContextProvider);
    return config;
}

Some notes:

  • must run npm i 'youtube-captions-scraper' from ~/.continue
  • could extend language support or use another scraper that gets more data from video, like title and description
  • I toyed with Google's official caption APIs, and they don't seem to be a feasible solution: you can only get actual caption data for your own videos, using OAuth. I tried a few methods from the Stack Overflow thread linked above with no success; the APIs seem to have been discontinued, blocked, etc. There are lots of scrapers out there, though, some of which, like the youtube-dl CLI, have won legal approval to stay on GitHub (if I understood correctly)
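
As a follow-up to the `// Could remove timestamps` comment in the config above, here is a rough sketch of how the formatting step might be made switchable. The `Caption` shape mirrors the type assertion used in the config; the `formatCaptions` name and the `includeTimestamps` flag are hypothetical, not part of youtube-captions-scraper or Continue.

```typescript
// Shape of one caption entry, as asserted in the config above.
interface Caption {
  start: string;
  dur: string;
  text: string;
}

// Hypothetical helper: joins captions into a single string for the context item.
// With timestamps: one "start: text" line per caption.
// Without timestamps: caption text joined by spaces, which uses fewer tokens.
function formatCaptions(captions: Caption[], includeTimestamps = true): string {
  return captions
    .map((c) => (includeTimestamps ? `${c.start}: ${c.text}` : c.text))
    .join(includeTimestamps ? "\n" : " ");
}
```

Dropping timestamps should shrink the context passed to the model, at the cost of losing the ability to point back to a moment in the video.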

RomneyDa avatar Sep 19 '24 21:09 RomneyDa