Note:

This repo aims to provide a ready-to-go TensorFlow setup for image captioning inference with a pre-trained model. For training from scratch or fine-tuning, please refer to the TensorFlow Models repo.

Model Overview
- Introduction
- Architecture
Requirements
- Install Required Packages
- Get Pre-trained Model
Generating Captions
Encountering Issues

Model Overview

Introduction

The Show and Tell model is a deep neural network that learns how to describe the content of images. For example:

Example captions

Show and Tell: A Neural Image Caption Generator

A TensorFlow implementation of the image-to-text model described in the paper:

"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge."

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).

Full text available at: http://arxiv.org/abs/1609.06647
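For reference, the paper frames captioning as maximizing the probability of the correct description given the image. Restated from the paper, with I the image, S = (S_0, ..., S_N) the caption words and θ the model parameters:

```latex
\begin{aligned}
\theta^{\star} &= \arg\max_{\theta} \sum_{(I,S)} \log p(S \mid I;\, \theta) \\
\log p(S \mid I) &= \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1})
\end{aligned}
```

Because a sentence's log-probability is a sum of per-word terms, the probability of a whole caption is a product of many factors below 1, so very small caption probabilities (around 10^-4) are normal. As far as I can tell from the upstream im2txt code, the p values printed by run_inference.py are exp of this sum for each beam-search candidate; treat that as an assumption rather than a guarantee for this repo.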

Architecture

Please refer to the original TensorFlow Models repo.

Requirements

Install Required Packages

I strongly suggest running pip install -r requirement.txt from the command line to install all the required packages.

Alternatively, you can install the required packages below manually:

TensorFlow 1.0 or greater (instructions)
NumPy (instructions)
Natural Language Toolkit (NLTK):
- First install NLTK (instructions)
- Then install the NLTK data package "punkt" (instructions)
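If you install the packages manually, a quick check like the one below can confirm that everything is importable before you try to generate captions. This is a minimal sketch, not a script shipped with this repo: the function names missing_modules and has_punkt are hypothetical, and it does not check that TensorFlow is version 1.0 or greater.

```python
"""Sanity check for the packages listed above (a hypothetical helper, not part of the repo)."""
import importlib.util

# Top-level module names for TensorFlow, NumPy and NLTK.
REQUIRED_MODULES = ("tensorflow", "numpy", "nltk")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_punkt():
    """Return True only if NLTK is installed and its 'punkt' data can be located."""
    if importlib.util.find_spec("nltk") is None:
        return False
    import nltk  # imported lazily so the check still runs when NLTK is absent

    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        return False
    return True


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    if not has_punkt():
        print("NLTK 'punkt' data not found; try: python -m nltk.downloader punkt")
    if not missing and has_punkt():
        print("All requirements found.")
```

find_spec only locates the modules without importing them, so the check stays fast even though TensorFlow itself is slow to import.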

Get Pre-trained Model

Download the pre-trained model (Inception v3 parameters fine-tuned over 1M steps). You will get the 4 files below; put all of them in the directory im2txt/model/Hugh/train/

newmodel.ckpt-2000000.data-00000-of-00001
newmodel.ckpt-2000000.index
newmodel.ckpt-2000000.meta
checkpoint
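Together, these files form a standard TensorFlow checkpoint: the .meta file holds the graph, the .index and .data files hold the variable values, and checkpoint is a small text file TensorFlow uses to locate the latest checkpoint. A minimal sketch for confirming that all four ended up in the right place might look like this; missing_checkpoint_files is a hypothetical helper, not part of the repo.

```python
import os

# Directory and file names taken from the list above.
CHECKPOINT_DIR = os.path.join("im2txt", "model", "Hugh", "train")
EXPECTED_FILES = (
    "newmodel.ckpt-2000000.data-00000-of-00001",
    "newmodel.ckpt-2000000.index",
    "newmodel.ckpt-2000000.meta",
    "checkpoint",
)


def missing_checkpoint_files(directory=CHECKPOINT_DIR, expected=EXPECTED_FILES):
    """Return the expected checkpoint files that are not present in `directory`."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves correctly.
    missing = missing_checkpoint_files()
    if missing:
        print("Missing from %s: %s" % (CHECKPOINT_DIR, ", ".join(missing)))
    else:
        print("All 4 checkpoint files found in", CHECKPOINT_DIR)
```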

Generating Captions

Your downloaded Show and Tell model can generate captions for any JPEG image. The following command generates captions for the sample image im2txt/data/images/test.jpg:

python im2txt/run_inference.py --checkpoint_path="im2txt/model/Hugh/train/newmodel.ckpt-2000000" --vocab_file="im2txt/data/Hugh/word_counts.txt" --input_files="im2txt/data/images/test.jpg"
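To caption several images at once, you can wrap the same command in a small Python 3 script. This is a sketch under one assumption: that --input_files accepts a comma-separated list of files, as it does in the upstream im2txt run_inference.py. The script name caption_many.py and the helper build_inference_command are hypothetical.

```python
import subprocess
import sys

# Default paths copied from the command above.
CHECKPOINT_PATH = "im2txt/model/Hugh/train/newmodel.ckpt-2000000"
VOCAB_FILE = "im2txt/data/Hugh/word_counts.txt"


def build_inference_command(image_paths, checkpoint_path=CHECKPOINT_PATH,
                            vocab_file=VOCAB_FILE):
    """Build the argv list for run_inference.py over one or more JPEG images.

    Assumes --input_files accepts a comma-separated list, as in upstream im2txt.
    """
    return [
        sys.executable, "im2txt/run_inference.py",
        "--checkpoint_path=" + checkpoint_path,
        "--vocab_file=" + vocab_file,
        "--input_files=" + ",".join(image_paths),
    ]


if __name__ == "__main__":
    # Usage (hypothetical script name): python caption_many.py a.jpg b.jpg
    images = sys.argv[1:] or ["im2txt/data/images/test.jpg"]
    subprocess.run(build_inference_command(images), check=True)
```

Passing an argv list rather than a single shell string avoids quoting problems with paths that contain spaces.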

Example output:

Captions for image test.jpg:
  0) a young boy wearing a hat and tie . (p=0.000195)
  1) a young boy wearing a blue shirt and tie . (p=0.000100)
  2) a young boy wearing a blue shirt and a tie . (p=0.000045)
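If you want to use the captions programmatically, you can parse this output. The sketch below assumes the exact line format shown above (rank, caption, then (p=...)); parse_captions and best_caption are hypothetical helpers, and when several images are captioned in one run this sketch simply merges all their candidates.

```python
import re

# Matches lines such as "  0) a young boy wearing a hat and tie . (p=0.000195)".
CAPTION_LINE = re.compile(r"^\s*(\d+)\)\s+(.*?)\s+\(p=([0-9.eE+-]+)\)\s*$")


def parse_captions(output):
    """Return (rank, caption, probability) tuples parsed from run_inference.py output."""
    results = []
    for line in output.splitlines():
        match = CAPTION_LINE.match(line)
        if match:
            rank, caption, prob = match.groups()
            results.append((int(rank), caption, float(prob)))
    return results


def best_caption(output):
    """Return the caption with the highest probability, or None if none were parsed."""
    captions = parse_captions(output)
    if not captions:
        return None
    return max(captions, key=lambda item: item[2])[1]


sample = """Captions for image test.jpg:
  0) a young boy wearing a hat and tie . (p=0.000195)
  1) a young boy wearing a blue shirt and tie . (p=0.000100)
  2) a young boy wearing a blue shirt and a tie . (p=0.000045)
"""
print(best_caption(sample))  # a young boy wearing a hat and tie .
```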

Note: you may get different results. Some variation between different models is expected.

Here is the image:

Encountering Issues

First, check this thread; you will likely find an answer there. Otherwise, open an issue and I will try to help.
