
Computer Vision Challenge 🏆


This is a collection of foundational projects for anyone diving into computer vision.

Explore some of computer vision's core concepts and hands-on projects through this fun challenge.

The project has 3 levels:

  • Level 0 - Zero (beginner): Getting Started with Basics
  • Level 1 - Apprentice (intermediate): Hands-on Computer Vision with Deep Learning
  • Level 2 - Hero (advanced): Large Vision Models (LVMs) from Image Generation, Inpainting, & More

In L1 and L2, we primarily leverage pre-trained models to keep the projects accessible to everyone. This also lets us explore a wider range of visual recognition tasks with different types of models while focusing on each model's performance and output.

Basic Computer Vision Pipeline

graph LR
    A[Image Acquisition] ==> B[Image Processing]
    B ==> C[Feature Extraction]
    C ==> D[Output, Interpretation & Analysis]

    style A fill:#EEE,stroke:#333,stroke-width:4px
    style B fill:#F88,stroke:#333,stroke-width:4px
    style C fill:#4F4,stroke:#333,stroke-width:4px
    style D fill:#33F,stroke:#333,stroke-width:4px
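
The four stages in the diagram can be illustrated with a small, dependency-free sketch. This is not code from the notebooks: the synthetic 6x6 image, the function names, and the thresholds are all assumptions chosen to keep the example self-contained. A real project would load an image from disk and would typically use a library such as OpenCV for each stage.

```python
"""Dependency-free sketch of the four pipeline stages on a tiny grayscale image."""


def acquire_image():
    """Stage 1, image acquisition: build a synthetic 6x6 grayscale image (0-255)."""
    image = [[10] * 6 for _ in range(6)]
    # Paint a bright rectangle (rows 2-4, columns 1-2) to act as the "object".
    for row in range(2, 5):
        for col in range(1, 3):
            image[row][col] = 200
    return image


def process_image(image, threshold=128):
    """Stage 2, image processing: binarize the image with a fixed threshold."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def extract_features(mask):
    """Stage 3, feature extraction: foreground area and bounding box."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return {"area": 0, "bbox": None}
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    # bbox is (top row, left column, bottom row, right column), inclusive.
    return {"area": len(coords), "bbox": (min(rows), min(cols), max(rows), max(cols))}


def interpret(features, min_area=4):
    """Stage 4, interpretation: turn the features into a decision."""
    return "object detected" if features["area"] >= min_area else "no object"


def run_pipeline():
    """Chain the four stages in the order shown in the diagram."""
    image = acquire_image()
    mask = process_image(image)
    features = extract_features(mask)
    return features, interpret(features)
```

Calling `run_pipeline()` returns the extracted features together with the final decision, so each stage can be swapped out independently (for example, replacing the fixed threshold with a filter or a learned model).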


Install the dependency packages using either conda or pip:

Using conda:

  1. Create a new conda environment:
     conda create --name cv-challenge
  2. Activate the newly created environment:
     source activate cv-challenge  # For bash/zsh
     conda activate cv-challenge  # For conda prompt/powershell
  3. Install dependencies from the requirements.txt file:
     conda install --channel conda-forge --file requirements.txt

Using pip:

  1. Install dependencies from the requirements.txt file:
pip install -r requirements.txt
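
After installing, it can be useful to confirm that everything listed in requirements.txt is actually present in the active environment. The sketch below is one possible check using only the Python standard library. The helper names `requirement_names` and `missing_packages` are hypothetical, and the parsing is deliberately simple: it assumes one plain `name` or `name==version` entry per line and does not handle URLs or editable installs. Note also that a package installed through conda may be registered under a different name than its pip distribution, in which case it would be reported as missing.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract distribution names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options such as "-r"
            continue
        # The name ends at the first version specifier, extra, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

To use it, pass the lines of the requirements file, for example `missing_packages(open("requirements.txt").read().splitlines())`. An empty list means every listed package was found.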

Hands-on Computer Vision Challenges!

Level 0 - Zero: Getting Started with Basics 💪

| Project | Description | Notebooks |
|---|---|---|
| [1] Getting Started with Images | Load an image, display it, and apply basic transformations. | Open notebook in Colab |
| [2] Basic Image Manipulation | Modify pixels; resize, flip, crop, and annotate images. | Open notebook in Colab |
| [3] Image Filtering & Restoration | Enhance or manipulate image features using filtering techniques. | Open notebook in Colab |
| [4] Image Enhancement | Enhance images using arithmetic and bitwise operations. | Open notebook in Colab |
| [5] Image Segmentation (Traditional) | Segment images into regions or pixels that belong to different classes or categories. | Open notebook in Colab |
| [6] Feature Extraction & Alignment | Learn how to extract features from images using descriptors based on the nature of the features. | Open notebook in Colab |
| [7] Optical Character Recognition (OCR) | Learn how to recognize text in images or documents using libraries such as Tesseract, Pytesseract, or EasyOCR. | Open notebook in Colab |
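
To give a feel for what the Level 0 operations do at the pixel level, here is a minimal sketch of flipping, cropping, nearest-neighbour resizing, and a 3x3 mean filter on an image stored as a list of rows. It is not taken from the notebooks, which are likely to use an image library instead, and all function names here are illustrative assumptions.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def crop(image, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image, new_height, new_width):
    """Resize with nearest-neighbour sampling (no interpolation)."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def box_blur(image):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    height, width = len(image), len(image[0])
    blurred = []
    for r in range(height):
        out_row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            out_row.append(sum(window) // len(window))  # integer mean
        blurred.append(out_row)
    return blurred


# A tiny 3x3 grayscale image to experiment with.
image = [
    [0, 50, 100],
    [150, 200, 250],
    [30, 60, 90],
]
```

Each function returns a new image and leaves its input untouched, which makes it easy to chain operations, for example `box_blur(flip_horizontal(image))`.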

Level 1 - Apprentice: Hands-on Computer Vision with Deep Learning 🔥

| Project | Description | Notebooks |
|---|---|---|
| [1] MNIST Handwritten Digit Recognition | Train a simple neural network to classify handwritten digits from the MNIST dataset. | Open notebook in Colab |
| [2] CIFAR-10 Image Classification | Utilize convolutional neural networks (CNNs) to classify images of different types of objects from the CIFAR-10 dataset. | Open notebook in Colab |
| [3] Object Detection with YOLOv5 | Implement YOLOv5, a real-time object detection algorithm, to detect objects in images and videos. | Open notebook in Colab |
| [4] Semantic Segmentation with DeepLabv3+ | Utilize DeepLabv3+, a semantic segmentation model, to segment images into different semantic categories. | Open notebook in Colab |
| [5] Facial Recognition with OpenFace | Explore facial recognition using OpenFace, a facial recognition library, to identify individuals in images. | Open notebook in Colab |
| [6] Object Tracking | Follow the movement of objects in a video sequence. | Open notebook in Colab |
| [7] Human Pose Estimation | Estimate the pose of a person in an image or a video using OpenCV and a pre-trained model. | Open notebook in Colab |
Level 2 - Hero: Large Vision Models (LVMs) from Image Generation, Inpainting, & More ⚡

| Project | Description | Notebooks |
|---|---|---|
| [1] Creative Image Generation with GANs | Generate novel images of different styles using GANs. | Open notebook in Colab |
| [2] Text-to-Image Synthesis with LLMs and Diffusion Models | Create realistic and creative images from text descriptions using LLMs and diffusion models. | Open notebook in Colab |
| [3] AI-Powered Image Restoration and Enhancement | Restore and enhance images using AI methods. | Open notebook in Colab |
| [4] Style Transfer with GANs and Image Processing | Transfer the artistic style of one image to another. | Open notebook in Colab |
| [5] AI-Driven Image Captioning and Storytelling | Generate comprehensive and creative captions and stories from images using LLMs. | Open notebook in Colab |
| [6] AI-Assisted Image Editing and Manipulation | Automate image editing and manipulation tasks using AI. | Open notebook in Colab |
| [7] AI Image Recognition Benchmarks with SOTA Vision Models | Benchmark SOTA Vision Models on a variety of image recognition tasks, including image classification, object detection, ... | Open notebook in Colab |

Most projects are written as Jupyter notebooks, which you can run directly using Jupyter Notebook, JupyterLab, or Colab.

For projects with a file, run the command below:


Roadmap & Upcoming Features


    flowchart BT
        A(Level 0: Zero) --> B(Level 1: Apprentice)
        A --> C(Level 2: Hero)
        A --> D(Level 3: Advanced)
        A --> E(Level 4: Expert)
        A --> F(Level 5: Master)
        style A fill:#fff,stroke:#333,stroke-width:2px
        style B fill:#88f,stroke:#333,stroke-width:2px
        style C fill:#8f8,stroke:#333,stroke-width:2px
        style D fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
        style E fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
        style F fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5

New levels:

  • L3 - Advanced: Video Models Benchmarking
  • L4 - Expert: Finetuning of VLMs (Vision Language Models) & LVMs
  • L5 - Master: Multimodality

Upcoming Features:

| Feature | Description | Status |
|---|---|---|
| Code Refactoring | Enhance code readability by cleaning, documenting, and integrating Gradio demos. | To-Do |
| New Learning Levels | Introduce advanced levels: L3 - Video Models Benchmarking, L4 - Finetuning of VLMs (Vision Language Models) & LVMs, and L5 - Multimodality | To-Do |
| Wiki Update | Document the new learning levels in the project Wiki. | To-Do |
| Multilingual Support | Translate the file into multiple languages (French, Spanish, etc.). | To-Do |
| Edge Device Deployment | Explore code translation for deployment on edge devices using C++ or Rust. | To-Do |
| Performance Enhancements | Investigate options to improve performance, including adding new datasets and supporting additional computer vision tasks. | To-Do |
| Machine Learning Framework Integration | Integrate the project with popular machine learning frameworks. | To-Do |


We warmly welcome your contributions! Whether you're a seasoned developer or just starting out in Computer Vision, you can help us improve the project and make it more valuable to everyone.

How to contribute:

  • Fork this repository and clone it to your local machine.
  • Create a new branch with a descriptive name for your contribution.
  • Add your code and files to the branch and commit your changes.
  • Push your branch to your forked repository and create a pull request to the main repository.
  • Wait for your pull request to be reviewed and merged.

Sponsor this Project

Another way to get involved is by sponsoring the project.

Your support will help:

  • Provide computational resources (This is a GPU Poor Project!!!) to explore new frontiers in computer vision by training larger and more complex models
  • Keep the project up to date with the latest computer vision advancements
  • Create more detailed tutorials for users at all skill levels


This project is licensed under the MIT License.


The following resources have been influential in shaping this project:

"Vision is a picture of the future that produces passion." - Bill Hybels