AI AI

Open · barnoid opened this issue 8 years ago · 8 comments

I have an idea: feed 5000 frames of the movie AI to an image-captioning neural net and see what comes out. I think 5000 frames should give at least 50,000 words. I may put in some randomish paragraph breaks and simulate chapters somehow. This might take longer than a month to run, though.

barnoid · Oct 31 '16 15:10
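
barnoid's target implies roughly 10 words per caption (50,000 words / 5000 frames). The thread doesn't say how the "randomish" paragraph breaks and chapters would be chosen, so what follows is only a sketch under assumptions: `assemble_novel` and `render` are hypothetical helpers, paragraph and chapter lengths are drawn uniformly from illustrative ranges, and a seeded `random.Random` keeps a run reproducible.

```python
import random


def assemble_novel(captions, seed=0, para_range=(3, 8), paras_per_chapter=(10, 20)):
    """Group captions into paragraphs, and paragraphs into chapters.

    Hypothetical helper: paragraph and chapter lengths are drawn at random
    from the given inclusive ranges. Returns a list of chapters, each a
    list of paragraph strings.
    """
    rng = random.Random(seed)  # seeded so the same breaks can be reproduced

    paragraphs = []
    i = 0
    while i < len(captions):
        n = rng.randint(*para_range)  # number of sentences in this paragraph
        sentences = []
        for caption in captions[i:i + n]:
            s = caption.strip().rstrip(".")
            # Capitalise the first letter and end with a full stop.
            sentences.append(s[:1].upper() + s[1:] + ".")
        paragraphs.append(" ".join(sentences))
        i += n

    chapters = []
    j = 0
    while j < len(paragraphs):
        m = rng.randint(*paras_per_chapter)  # number of paragraphs in this chapter
        chapters.append(paragraphs[j:j + m])
        j += m
    return chapters


def render(chapters):
    """Join chapters into plain text with numbered chapter headings."""
    parts = []
    for k, paragraphs in enumerate(chapters, start=1):
        parts.append(f"Chapter {k}\n\n" + "\n\n".join(paragraphs))
    return "\n\n\n".join(parts)
```

Changing `para_range` or `paras_per_chapter` alters the rhythm of the text without touching the captions themselves, and a different `seed` gives a different arrangement of the same material.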

Wow, that sounds cool. Good idea.

superMDguy · Oct 31 '16 17:10

I've ripped the DVD and extracted 5036 frames (0.6 frames per second). I've started a run through the neural net (I'm using https://github.com/karpathy/neuraltalk2); it looks like it will take three or four hours. The output is looking quite unvaried, so I may have to intervene a lot to keep it from being extremely boring.

barnoid · Nov 01 '16 20:11
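
The numbers in barnoid's comment can be sanity-checked with a little arithmetic: 5036 frames sampled at 0.6 frames per second (one frame roughly every 1.67 s) span about 140 minutes of film, and a three-to-four-hour run works out to roughly 2 to 3 seconds of captioning per frame. The helper names below are illustrative and not part of neuraltalk2; the timestamps assume sampling starts at t = 0.

```python
def frame_timestamps(n_frames, fps):
    """Times (in seconds) of n_frames sampled at a fixed rate, starting at t = 0."""
    return [i / fps for i in range(n_frames)]


def covered_minutes(n_frames, fps):
    """Approximate length of film represented by n_frames sampled at fps."""
    return n_frames / fps / 60


def seconds_per_frame(run_hours, n_frames):
    """Average captioning time per frame for a run lasting run_hours."""
    return run_hours * 3600 / n_frames


print(f"{covered_minutes(5036, 0.6):.1f} minutes of film")  # 139.9 minutes of film
print(f"{seconds_per_frame(3, 5036):.1f}-{seconds_per_frame(4, 5036):.1f} s per frame")  # 2.1-2.9 s per frame
```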

I saw this demo (https://twitter.com/kcimc/status/668094003791929344) of https://github.com/karpathy/neuraltalk2 when it was released last November and thought it'd be great for NaNoGenMo!

hugovk · Nov 01 '16 21:11

Perhaps segmenting each frame according to motion (by comparing it with adjacent frames) before running it through the neural net would help get more interesting output.

pointyointment · Nov 02 '16 11:11
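
One way pointyointment's suggestion could work is three-frame differencing: a pixel counts as moving if it differs from both the previous and the next frame by more than a threshold, and the bounding box of those pixels is cropped out before captioning. This is a minimal sketch, assuming grayscale frames stored as nested lists of 0-255 values (real footage would more likely go through NumPy or OpenCV); the threshold of 25 and the names `motion_bbox` and `crop` are hypothetical choices, not something from the thread.

```python
def motion_bbox(prev, curr, nxt, threshold=25):
    """Bounding box (top, left, bottom, right), inclusive, of the pixels in
    `curr` that differ from BOTH neighbouring frames by more than `threshold`
    (three-frame differencing). Frames are equal-sized 2D lists of grayscale
    values 0-255. Returns None when nothing moves."""
    ys, xs = [], []
    for y, (row_p, row_c, row_n) in enumerate(zip(prev, curr, nxt)):
        for x, (p, c, n) in enumerate(zip(row_p, row_c, row_n)):
            # Requiring a change against both neighbours suppresses the
            # "ghost" left where an object was in the previous frame only.
            if abs(c - p) > threshold and abs(c - n) > threshold:
                ys.append(y)
                xs.append(x)
    if not ys:
        return None
    return min(ys), min(xs), max(ys), max(xs)


def crop(frame, box):
    """Cut the inclusive (top, left, bottom, right) box out of a 2D frame."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]
```

Cropping to the box would hand the captioning net a smaller image centred on whatever is moving; whether that yields more varied captions would need trying out.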

This would be a lot of work, but I wonder if you could modify the neural net, or maybe just write some extra code to compare the descriptions of two consecutive frames and focus on the differences between them, so the text could talk more about movement.

superMDguy · Nov 02 '16 12:11
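
superMDguy's second idea, comparing the descriptions of consecutive frames, needs no change to the network itself. A minimal sketch using Python's standard `difflib` module, assuming captions are plain space-separated strings; `caption_diff` is a hypothetical name, and word-level diffing is just one simple choice of comparison:

```python
import difflib


def caption_diff(before, after):
    """Return (removed, added) word lists between two captions."""
    a, b = before.split(), after.split()
    removed, added = [], []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])  # words only in the earlier caption
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])  # words only in the later caption
    return removed, added


print(caption_diff("a man standing in a room", "a man sitting in a room"))
# (['standing'], ['sitting'])
```

The added words are the ones most likely to reflect a change between frames, so a generator could lean on them when writing about movement.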

It is done.

https://github.com/barnoid/AIAI/blob/master/aiai.pdf

Write-up here: https://github.com/barnoid/AIAI

barnoid · Nov 30 '16 19:11

A preview:

A regular view of a nighttime city landscape. A picture of a stately couple in the open distance, and a view of a room with a window and a window. A room with a bed and a table. A view of a building with a large clock on the side of it; a close up of a banana on a table. A stop sign with a sticker on it; a red and white sign with a sky background. A large red stop sign sitting in the center of the street.

A red and white sign with a sky background. Clean cars is seen while strange the brown and green jelly next to them.

hugovk · Nov 30 '16 20:11

I recently saw this, which is a neural network trained specifically to describe what's happening in videos. I haven't tried it out myself, but it seems pretty interesting. I don't know whether it's general enough to work on something as broad as a movie.

superMDguy · Dec 16 '16 02:12