Take advantage of increased LLM context window
RAG is dying as context windows keep growing. With 1M-token context windows, who needs RAG? Caching course material in the model's context instead of running vector search seems like the way to go.
We will need to implement concepts from this guide:
https://ai.google.dev/gemini-api/docs/long-context
https://github.com/google-gemini/cookbook/blob/main/examples/Apollo_11.ipynb
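As a starting point, here is a minimal sketch of the caching approach described in the docs above, assuming the google-genai Python SDK with explicit context caching; the file name, system instruction, prompt, and TTL are placeholders, not our actual setup:

```python
# Sketch: explicit context caching with the google-genai SDK.
# Assumptions: course_material.pdf is a placeholder file, and the cached
# content meets the minimum token count required for caching.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Upload the course material once so it can be cached server-side.
course_pdf = client.files.upload(file="course_material.pdf")  # hypothetical file

# Create a cache holding the course material; later requests reuse it
# instead of resending the full document tokens every time.
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        system_instruction="You are a tutor answering questions about this course.",
        contents=[course_pdf],
        ttl="3600s",  # keep the cache alive for an hour
    ),
)

# Query against the cached material instead of doing vector search.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Generate a study guide for week 3.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```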
This needs attention for long-form content generation.
We have shifted to Gemini 2.5 Flash, where long-context support is inherently provided and needs no extra instrumentation on our part. I still need to test, but quality suffers a lot with longer content. Needs more experimentation.
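To make "quality suffers with longer content" measurable, a rough sketch of an experiment we could run, again assuming the google-genai SDK; the prompt and target lengths are made up for illustration:

```python
# Rough experiment sketch: generate content at increasing target lengths and
# record where quality starts to degrade. Lengths and prompt are assumptions,
# not our real evaluation setup.
from google import genai

client = genai.Client()

TARGET_WORDS = [500, 2000, 8000]  # hypothetical lengths to probe

for words in TARGET_WORDS:
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=f"Write a ~{words}-word lesson on photosynthesis for high schoolers.",
    )
    text = response.text or ""
    print(f"target={words} words, got={len(text.split())} words")
    # TODO: score coherence/structure (manually or with a rubric prompt)
```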