computer-vision-challenge
Computer Vision Challenge 🏆
Overview
This is a collection of foundational projects for anyone diving into computer vision.
Explore some of computer vision's core concepts through hands-on projects in this fun challenge.
The project has 3 levels:
- Level 0 - Zero (beginner): Getting Started with Basics
- Level 1 - Apprentice (intermediate): Hands-on Computer Vision with Deep Learning
- Level 2 - Hero (advanced): Large Vision Models (LVMs) from Image Generation, Inpainting, & More
[!IMPORTANT]
In L1 and L2, we primarily leverage pre-trained models to ensure accessibility for everyone. This also allows us to explore a wider range of vision recognition tasks using different types of models while focusing on the model's performance and outcome.
Basic Computer Vision Pipeline
graph LR
A[Image Acquisition] ==> B[Image Processing]
B ==> C[Feature Extraction]
C ==> D[Output, Interpretation & Analysis]
style A fill:#EEE,stroke:#333,stroke-width:4px
style B fill:#F88,stroke:#333,stroke-width:4px
style C fill:#4F4,stroke:#333,stroke-width:4px
style D fill:#33F,stroke:#333,stroke-width:4px
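To make the four stages above concrete, here is a minimal sketch in pure Python. It is a toy illustration under stated assumptions, not code from this repository: the image is synthetic, and the function names (`acquire_image`, `preprocess`, `extract_features`, `interpret`) are hypothetical. Real projects would typically read images from disk or a camera and use a library such as OpenCV instead of nested lists.

```python
"""Toy four-stage vision pipeline on a synthetic grayscale image (pure Python)."""


def acquire_image(width=8, height=8):
    """Image acquisition: build a synthetic image, dark left half, bright right half."""
    return [[0 if x < width // 2 else 255 for x in range(width)] for _ in range(height)]


def preprocess(image):
    """Image processing: scale 8-bit pixel values to the range [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def extract_features(image):
    """Feature extraction: absolute horizontal gradient between neighbouring pixels."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in image]


def interpret(gradients, threshold=0.5):
    """Output and interpretation: count strong edges and list the columns they occur in."""
    strong = [(x, g) for row in gradients for x, g in enumerate(row) if g > threshold]
    return {
        "edge_count": len(strong),
        "edge_columns": sorted({x for x, _ in strong}),
    }


def run_pipeline():
    """Chain the four stages in the order shown in the diagram."""
    image = acquire_image()
    normalized = preprocess(image)
    gradients = extract_features(normalized)
    return interpret(gradients)


if __name__ == "__main__":
    print(run_pipeline())
```

Each stage takes the previous stage's output as its only input, which mirrors the left-to-right flow of the diagram and makes it easy to swap one stage (for example, a learned feature extractor) without touching the others.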
Requirements
Install the dependency packages using either conda or pip:
Using conda:
- Create a new conda environment:
conda create --name cv-challenge
- Activate the newly created environment:
source activate cv-challenge # For bash/zsh
conda activate cv-challenge # For conda prompt/powershell
- Install dependencies from the requirements.txt file:
conda install --channel conda-forge --file requirements.txt
Using pip:
- Install dependencies from the requirements.txt file:
pip install -r requirements.txt
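After installing, it can be useful to confirm that every listed package is actually present in the active environment. The sketch below is one possible check, not part of the repository: the helper names (`parse_requirement_names`, `find_missing`, `check_requirements_file`) are hypothetical, and it assumes `requirements.txt` uses plain `name==version` style lines. It relies only on the standard library's `importlib.metadata`, so packages whose conda name differs from their Python distribution name may be reported as missing.

```python
import re
from importlib import metadata


def parse_requirement_names(lines):
    """Extract distribution names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r or -e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def find_missing(names):
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_requirements_file(path="requirements.txt"):
    """Read a requirements file and return the packages that are not installed."""
    with open(path, encoding="utf-8") as handle:
        return find_missing(parse_requirement_names(handle))
```

Calling `check_requirements_file()` from the repository root returns an empty list when everything in the file is installed.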
Hands-on Computer Vision Challenges!
Level 0 - Zero: Getting Started with Basics 💪
Level 1 - Apprentice: Hands-on Computer Vision with Deep Learning 🔥
Level 2 - Hero: Large Vision Models (LVMs) from Image Generation, Inpainting, & More ⚡
Usage
Most projects are written as Jupyter notebooks; you can run them directly using Jupyter Notebook/Lab or Colab.
For projects with a main.py file, run the command below:
python main.py
Roadmap & Upcoming Features
Roadmap:
flowchart BT
A(Level 0: Zero) --> B(Level 1: Apprentice)
A --> C(Level 2: Hero)
A --> D(Level 3: Advanced)
A --> E(Level 4: Expert)
A --> F(Level 5: Master)
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#88f,stroke:#333,stroke-width:2px
style C fill:#8f8,stroke:#333,stroke-width:2px
style D fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
style E fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
style F fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
New levels:
- L3 - Advanced: Video Models Benchmarking
- L4 - Expert: Finetuning of VLMs (Vision Language Models) & LVMs
- L5 - Master: Multimodality
Upcoming Features:
| Feature | Description | Status |
|---|---|---|
| Code Refactoring | Enhance code readability by cleaning, documenting, and integrating Gradio demos. | To-Do |
| New Learning Levels | Introduce advanced levels: L3 - Video Models Benchmarking, L4 - Finetuning of VLMs (Vision Language Models) & LVMs, and L5 - Multimodality. | To-Do |
| Wiki Update | Document the new learning levels in the project Wiki. | To-Do |
| Multilingual Support | Translate the README.md file into multiple languages (French, Spanish, etc.). | To-Do |
| Edge Device Deployment | Explore code translation for deployment on edge devices using C++ or Rust. | To-Do |
| Performance Enhancements | Investigate options to improve performance, including adding new datasets and supporting additional computer vision tasks. | To-Do |
| Machine Learning Framework Integration | Integrate the project with popular machine learning frameworks. | To-Do |
Contributing
We warmly welcome your contributions! Whether you're a seasoned developer or just starting out in Computer Vision, you can help us improve the project and make it more valuable to everyone.
How to contribute:
- Fork this repository and clone it to your local machine.
- Create a new branch with a descriptive name for your contribution.
- Add your code and files to the branch and commit your changes.
- Push your branch to your forked repository and create a pull request to the main repository.
- Wait for your pull request to be reviewed and merged.
Sponsor this Project
Another way to get involved is by sponsoring the project.
Your support will help:
- Provide computational resources (this is a GPU-poor project!) to explore new frontiers in computer vision by training larger and more complex models
- Keep the project up to date with the latest computer vision advancements
- Create more detailed tutorials for users at all skill levels
LICENSE
This project is licensed under the MIT License.
References
The following resources have been influential in shaping this project:
- Computer Vision OpenCV Python Free Course - Udemy
- Computer Vision Free Course - Kaggle
- Visual Perception for Self-Driving Cars - University of Toronto
- The Complete Self-Driving Car Course - Udemy
- Top Computer Vision Projects (2023) - GeeksforGeeks
- 15 Computer Visions Projects You Can Do Right Now - neptune.ai
- 30+ Unique Computer Vision Projects with Source Code – 2023
- 7+ Computer Vision Projects on GitHub with Source Code 2024 - Omdena
- 20+ Computer Vision Projects Ideas for Beginners in 2023
- A Dive into Vision-Language Models
- Advances in Visual Pretraining for LLMS | Neil Houlsby
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
- LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions
- Transforming Computer Vision with LLMs - Data Science Dojo
"Vision is a picture of the future that produces passion." - Bill Hybels