Flowise
feat: Add Image Loader with OCR support for multiple providers (#5195)
- Add Image Loader node with support for Unstructured, AWS Textract, and Google Vision OCR providers
- Implement OCR provider abstraction interface for extensibility
- Add Google Vision API credential configuration
- Update AWS credential to include region field
- Add image file type mappings (.jpg, .jpeg, .png, .gif, .bmp, .webp, .tiff, .tif)
- Update dropdown components to render HTML links in descriptions
- Support theme-aware link styling in dropdown options
This implementation fixes issue #5195
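As a rough illustration of what an OCR provider abstraction can look like, here is a minimal sketch. It is not the code from this pull request: `OcrProvider`, `OcrResult`, `OcrProviderRegistry`, and `EchoOcrProvider` are hypothetical names, and the stand-in provider only echoes input metadata so the example runs without credentials or network access.

```typescript
// Hypothetical names throughout; none of these are taken from the PR's code.
interface OcrResult {
    text: string
    provider: string
}

// Contract each OCR backend (e.g. Unstructured, AWS Textract, Google Vision) would satisfy.
interface OcrProvider {
    readonly name: string
    extractText(image: Uint8Array, mimeType: string): Promise<OcrResult>
}

// Stand-in provider: performs no OCR, just reports what it was given.
class EchoOcrProvider implements OcrProvider {
    readonly name = 'echo'

    async extractText(image: Uint8Array, mimeType: string): Promise<OcrResult> {
        return { text: `${image.length} bytes of ${mimeType}`, provider: this.name }
    }
}

// Registry that lets a loader select a provider by name at runtime.
class OcrProviderRegistry {
    private providers = new Map<string, OcrProvider>()

    register(provider: OcrProvider): void {
        this.providers.set(provider.name, provider)
    }

    get(name: string): OcrProvider {
        const provider = this.providers.get(name)
        if (!provider) throw new Error(`Unknown OCR provider: ${name}`)
        return provider
    }
}
```

Under this assumed design, adding a new OCR service would mean writing one more class that implements `OcrProvider` and registering it; the loader itself would not need to change.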
Summary of Changes
Hello @aibysid, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the system's document processing capabilities by introducing a new Image Loader node. This node leverages Optical Character Recognition (OCR) to extract text from image files, supporting multiple leading providers like Unstructured, AWS Textract, and Google Vision. The changes also include necessary credential configurations for these services, expanded image file type recognition, and UI improvements to dropdown descriptions, allowing for richer, interactive help text.
Highlights
- New Image Loader Node: Introduced a new 'Image Loader' node with Optical Character Recognition (OCR) capabilities to extract text from image files.
- Multiple OCR Provider Support: Integrated support for three distinct OCR providers: Unstructured, AWS Textract, and Google Vision, allowing users to choose their preferred service.
- Extensible OCR Abstraction: Implemented an extensible abstraction layer for OCR providers, making it easier to add new OCR services in the future.
- Google Vision API Credential: Added a new credential type specifically for configuring Google Cloud Vision API keys.
- AWS Credential Update: Updated the existing AWS credential configuration to include a 'region' field. The field is optional in the credential itself but must be set when using AWS Textract.
- Expanded Image File Type Mappings: Expanded the system's file type recognition to include various image formats such as JPG, JPEG, PNG, GIF, BMP, WEBP, TIFF, and TIF.
- HTML Link Rendering in Dropdowns: Enhanced dropdown components to correctly render HTML links within their descriptions, improving user guidance and accessibility.
- Theme-Aware Link Styling: Applied theme-aware styling to the newly rendered HTML links in dropdowns, ensuring visual consistency across different themes.
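The image file type mappings listed above could be represented roughly as follows. This is a sketch under assumptions, not the mapping added by the pull request: the `IMAGE_MIME_TYPES` constant and the `imageMimeType` helper are hypothetical, and the MIME types shown are the standard ones for each extension.

```typescript
// Hypothetical constant and helper; the PR's actual mapping may be shaped differently.
const IMAGE_MIME_TYPES: Record<string, string> = {
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.gif': 'image/gif',
    '.bmp': 'image/bmp',
    '.webp': 'image/webp',
    '.tiff': 'image/tiff',
    '.tif': 'image/tiff'
}

// Returns the MIME type for a supported image file name, or undefined otherwise.
function imageMimeType(fileName: string): string | undefined {
    const dot = fileName.lastIndexOf('.')
    if (dot < 0) return undefined
    // Lowercase the extension so "SCAN.JPG" and "scan.jpg" resolve the same way.
    return IMAGE_MIME_TYPES[fileName.slice(dot).toLowerCase()]
}
```

A loader could use a lookup like this to reject unsupported files before sending anything to an OCR provider.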
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with :thumbsup: and :thumbsdown: on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
do you have a screenshot to show how it works?