terraform-genai-doc-summarization
Summarizes documents using OCR and a Vertex AI generative LLM
Generative AI Document Summarization
Description
Tagline
Create summaries of a large corpus of documents using Generative AI.
Detailed
This solution showcases how to summarize a large corpus of documents using Generative AI. It provides an end-to-end demonstration of document summarization: starting from raw documents, it detects the text in each document with Cloud Vision Optical Character Recognition (OCR), summarizes the documents on demand with the Vertex AI LLM APIs, and stores the summaries in BigQuery.
PreDeploy
To deploy this blueprint you must have an active billing account and billing permissions.
Architecture
- The developer follows a tutorial on a Jupyter Notebook, where they upload a PDF — either through Vertex AI Workbench or Colaboratory.
- The uploaded PDF file is sent to a function running on Cloud Functions. This function handles PDF file processing.
- The Cloud Functions function uses Cloud Vision to extract all text from the PDF file.
- The Cloud Functions function stores the extracted text inside a Cloud Storage bucket.
- The Cloud Functions function uses Vertex AI’s LLM API to summarize the extracted text.
- The Cloud Functions function stores the text summaries of PDFs in BigQuery tables.
- As an alternative to uploading PDF files through a Jupyter Notebook, the developer can upload a PDF file directly to a Cloud Storage bucket, for instance through the Console UI or gcloud.
- The direct upload to Cloud Storage triggers Eventarc, which starts the Document Processing phase handled by Cloud Functions.
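The storage side of this flow can be sketched in Terraform. The following is an illustrative fragment only, not taken from the module's source: the resource names, dataset and table IDs, and the table schema are all hypothetical, and the module itself may provision these resources differently.

```hcl
# Hypothetical sketch of the storage resources described above.
# All names, IDs, and the schema are illustrative assumptions.

# Bucket that receives uploaded PDFs and extracted text.
resource "google_storage_bucket" "uploads" {
  project  = "my-project-id" # placeholder project ID
  name     = "my-project-id-genai-webhook"
  location = "us-central1"
}

# Dataset and table that hold the generated summaries.
resource "google_bigquery_dataset" "summaries" {
  project    = "my-project-id"
  dataset_id = "summary_dataset"
  location   = "us-central1"
}

resource "google_bigquery_table" "summaries" {
  project    = "my-project-id"
  dataset_id = google_bigquery_dataset.summaries.dataset_id
  table_id   = "summary_table"

  # Assumed columns: the source file and its summary text.
  schema = jsonencode([
    { name = "filename", type = "STRING", mode = "REQUIRED" },
    { name = "summary", type = "STRING", mode = "NULLABLE" },
  ])
}
```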
Documentation
Deployment Duration
Configuration: 1 min
Deployment: 10 mins
Cost
Inputs
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| bucket_name | The name of the bucket to create | string | "genai-webhook" | no |
| gcf_timeout_seconds | GCF execution timeout | number | 900 | no |
| project_id | The Google Cloud project ID to deploy to | string | n/a | yes |
| region | Google Cloud region | string | "us-central1" | no |
| time_to_enable_apis | Wait time to enable APIs in new projects | string | "180s" | no |
| webhook_name | Name of the webhook | string | "webhook" | no |
| webhook_path | Path to the webhook directory | string | "webhook" | no |
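As a minimal sketch, the inputs above could be passed to the module as follows. The module label `doc_summarization`, the local `source` path, and the project ID are assumptions for illustration; only `project_id` is required, and the optional inputs are shown with their documented defaults.

```hcl
# Hypothetical root-module configuration.
# The source path assumes a local checkout of this module.
module "doc_summarization" {
  source = "./terraform-genai-doc-summarization"

  # Required input.
  project_id = "my-project-id" # placeholder project ID

  # Optional inputs, set here to their documented defaults.
  bucket_name         = "genai-webhook"
  gcf_timeout_seconds = 900
  region              = "us-central1"
  time_to_enable_apis = "180s"
  webhook_name        = "webhook"
  webhook_path        = "webhook"
}
```

Omitting the optional arguments yields the same configuration, so a minimal call needs only `source` and `project_id`.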
Outputs
| Name | Description |
|---|---|
| genai_doc_summary_colab_url | The URL to launch the notebook tutorial for the Generative AI Document Summarization solution |
| neos_walkthrough_url | The URL to launch the in-console tutorial for the Generative AI Document Summarization solution |
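These outputs can be re-exported from a root module so that they are printed after `terraform apply`. The sketch below assumes the module was instantiated under the hypothetical label `doc_summarization`; the output names on the left are illustrative.

```hcl
# Assumes a module block labeled "doc_summarization" is declared
# elsewhere in the same configuration (hypothetical label).
output "notebook_tutorial_url" {
  description = "URL of the notebook tutorial"
  value       = module.doc_summarization.genai_doc_summary_colab_url
}

output "console_walkthrough_url" {
  description = "URL of the in-console tutorial"
  value       = module.doc_summarization.neos_walkthrough_url
}
```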
Requirements
These sections describe requirements for using this module.
Software
The following dependencies must be available:
- Terraform v0.13
- Terraform Provider for GCP plugin v3.0
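One way to express these dependencies is a version constraint block in the root module. This sketch assumes the versions listed above are minimums rather than exact pins, which the list does not state explicitly.

```hcl
terraform {
  # Assumption: v0.13 is treated as a minimum version.
  required_version = ">= 0.13"

  required_providers {
    google = {
      source = "hashicorp/google"
      # Assumption: v3.0 is treated as a minimum version.
      version = ">= 3.0"
    }
  }
}
```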
Service Account
A service account with the following roles must be used to provision the resources of this module:
- Storage Admin: roles/storage.admin
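If the service account does not yet hold this role, it could be granted with Terraform from a separate, suitably privileged configuration. The project ID and service account email below are placeholders, and the resource label is hypothetical.

```hcl
# Grants Storage Admin to the provisioning service account.
# Project ID and email are placeholders, not values from this module.
resource "google_project_iam_member" "deployer_storage_admin" {
  project = "my-project-id"
  role    = "roles/storage.admin"
  member  = "serviceAccount:deployer@my-project-id.iam.gserviceaccount.com"
}
```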
APIs
A project with the following APIs enabled must be used to host the resources of this module:
- Google Cloud Storage JSON API: storage-api.googleapis.com
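The API can also be enabled through Terraform rather than the console. A minimal sketch, assuming a placeholder project ID and a hypothetical resource label:

```hcl
# Enables the Cloud Storage JSON API on the host project.
resource "google_project_service" "storage_api" {
  project = "my-project-id" # placeholder project ID
  service = "storage-api.googleapis.com"

  # Leave the API enabled if this resource is later destroyed,
  # so other workloads in the project are not disrupted.
  disable_on_destroy = false
}
```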
Contributing
Refer to the contribution guidelines for information on contributing to this module.
Security Disclosures
Please see our security disclosure process.