Intro

This repo is to implement a multi-modal natural language model with tensorflow.

Dependencies	DataSets
python 2.7 tensorflow lasagne Theano	IAPR TC-12

Project Overview

Firstly, a word embedding with word2vec net is trained against iaprtc12 datasets.
Secondly, the filtered (meaning, if the description is too long, we only keep the first sentence) word vectors for each description of image are used as target output of a CNN network

Setup

For various systems, you need to use different tools to install tensorflow, lasagne, theano, nolearn, ... dependencies, first.

Then, simply run below scripts to download the datasets

Run:

bash setup.sh

or:

make setup

Network Design

Word2Vec	StoryNet

Training

Run:

python train.py

or:

make train

Optimizer	Loss
MomentumOptimizer	MSE Loss

learning_curve

Pre-trained Model

Download here

Testing and Results

Run:

make demo

Train on your own

Run setup bash script to download datasets
Run train.py or with makefile
Freeze tensorflow model with the command provided in makefile
Run app.py or with makefile

Data Sets

The image collection of the IAPR TC-12 Benchmark consists of 20,000 still natural images taken from locations around the world and comprising an assorted cross-section of still natural images. This includes pictures of different sports and actions, photographs of people, animals, cities, landscapes and many other aspects of contemporary life.

Each image is associated with a text caption in up to three different languages (English, German and Spanish) . These annotations are stored in a database which is managed by a benchmark administration system that allows the specification of parameters according to which different subsets of the image collection can be generated.

The IAPR TC-12 Benchmark is now available free of charge and without copyright restrictions.

More details.

Sample annotations:


    <DOC>
    <DOCNO>annotations/01/1000.eng</DOCNO>
    <TITLE>Godchild Cristian Patricio Umaginga Tuaquiza</TITLE>
    <DESCRIPTION>a dark-skinned boy wearing a knitted woolly hat and a light and dark grey striped jumper with a grey zip, leaning on a grey wall;</DESCRIPTION>
    <NOTES></NOTES>
    <LOCATION>Quilotoa, Ecuador</LOCATION>
    <DATE>April 2002</DATE>
    <IMAGE>images/01/1000.jpg</IMAGE>
    <THUMBNAIL>thumbnails/01/1000.jpg</THUMBNAIL>
    </DOC>

image2text
image2text copied to clipboard

Metadata

Intro

Project Overview

Setup

Network Design

Training

Pre-trained Model

Testing and Results

Train on your own

Data Sets

References:

← Metadata

Owner

Metadata

image2text image2text copied to clipboard

Metadata

Intro

Project Overview

Setup

Network Design

Training

Pre-trained Model

Testing and Results

Train on your own

Data Sets

References:

← Metadata

Owner

Metadata

image2text
image2text copied to clipboard