2016
AI AI
I have an idea to feed 5000 frames of the movie AI to an image captioning neural net and see what comes out. I think 5000 frames should give at least 50000 words. I may put in some randomish paragraph breaks and simulate chapters somehow. This might take longer than a month to run though.
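For scale, 50000 words from 5000 frames works out to about 10 words per caption. A minimal sketch of the "randomish paragraph breaks" idea, assuming the captions already exist as a list of sentences; the function name `group_into_paragraphs` and the length bounds are hypothetical choices for illustration, not anything from the project:

```python
import random

def group_into_paragraphs(captions, min_len=3, max_len=8, seed=0):
    """Split a list of caption sentences into paragraphs of random length.

    min_len, max_len and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    paragraphs = []
    i = 0
    while i < len(captions):
        size = rng.randint(min_len, max_len)  # inclusive on both ends
        paragraphs.append(" ".join(captions[i:i + size]))
        i += size
    return paragraphs

# Placeholder captions standing in for real neural net output.
captions = [f"Caption {n}." for n in range(20)]
paragraphs = group_into_paragraphs(captions)
print("\n\n".join(paragraphs))
```

Chapters could probably be simulated the same way one level up, by grouping paragraphs instead of sentences.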
Wow that sounds cool. Good idea.
I've ripped the DVD and extracted 5036 frames (0.6 frames per second). I've started a run through the neural net (I'm using https://github.com/karpathy/neuraltalk2), and it looks like it will take three or four hours. The output is looking quite unvaried, so I may have to intervene a lot to make it not extremely boring.
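For anyone wanting to try the same sampling, here is one way it might be done. This is a sketch that assumes ffmpeg is the extraction tool (the post doesn't say what was actually used); the file names are hypothetical. It only builds the command, using ffmpeg's `fps` video filter, which at 0.6 gives roughly one frame every 1.67 seconds:

```python
import shlex

def frame_extraction_command(video_path, out_pattern, fps=0.6):
    """Build an ffmpeg command that samples `fps` frames per second.

    Assumes ffmpeg is installed; the paths are placeholders.
    """
    return [
        "ffmpeg",
        "-i", video_path,      # input video (hypothetical path)
        "-vf", f"fps={fps}",   # fps filter: resample to the given frame rate
        out_pattern,           # numbered output images, e.g. frame_00001.png
    ]

cmd = frame_extraction_command("movie.mkv", "frames/frame_%05d.png")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would actually run the extraction.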
I saw this demo https://twitter.com/kcimc/status/668094003791929344 of https://github.com/karpathy/neuraltalk2 when it was released last November and thought it'd be great for nanogenmo!
Perhaps segmenting each frame according to motion (by comparing with adjacent frames) prior to running the neural net would help with getting more interesting output.
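A rough sketch of what that motion segmentation could look like, assuming simple frame differencing on grayscale frames held as 2D lists. The threshold and the function names are made up for illustration, and a real pipeline would more likely use an image library than nested lists:

```python
def motion_bounding_box(prev_frame, next_frame, threshold=30):
    """Return (top, left, bottom, right) around changed pixels, or None.

    Frames are equally sized 2D lists of grayscale values (0-255).
    The threshold of 30 is an arbitrary assumption.
    """
    rows, cols = [], []
    for y, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for x, (a, b) in enumerate(zip(prev_row, next_row)):
            if abs(a - b) > threshold:  # pixel changed enough to count as motion
                rows.append(y)
                cols.append(x)
    if not rows:
        return None  # nothing moved between the two frames
    # bottom and right are exclusive so the box can be used directly for slicing
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)

def crop(frame, box):
    """Cut the region described by box out of a frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]

# Tiny synthetic example: two pixels change between frames.
prev_frame = [[0] * 6 for _ in range(4)]
next_frame = [row[:] for row in prev_frame]
next_frame[1][2] = 200
next_frame[2][4] = 200

box = motion_bounding_box(prev_frame, next_frame)
moving_region = crop(next_frame, box)
print(box, moving_region)
```

The cropped region, rather than the whole frame, would then be what gets captioned.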
This would be a lot of work, but I wonder if you could modify the neural net, or maybe just write some extra code to compare two descriptions and focus on the differences between frames, so you could talk more about movement.
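The "extra code" route might start as small as this: a sketch that compares two captions word by word and reports what appeared and what vanished. It doesn't touch the neural net at all, and `caption_changes` is a hypothetical helper name:

```python
def caption_changes(prev_caption, next_caption):
    """Return (appeared, vanished) word lists between two captions.

    A crude word-level comparison: case is ignored and a trailing
    full stop is stripped. Word order within each list follows the caption.
    """
    prev_words = prev_caption.lower().rstrip(".").split()
    next_words = next_caption.lower().rstrip(".").split()
    appeared = [w for w in next_words if w not in prev_words]
    vanished = [w for w in prev_words if w not in next_words]
    return appeared, vanished

appeared, vanished = caption_changes(
    "A room with a bed and a table.",
    "A close up of a banana on a table.",
)
print("appeared:", appeared)
print("vanished:", vanished)
```

Turning those word lists into sentences about movement would still need more work, for example filtering out filler words like "of" and "with".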
It is done.
https://github.com/barnoid/AIAI/blob/master/aiai.pdf
Write up here: https://github.com/barnoid/AIAI
A preview:
A regular view of a nighttime city landscape. A picture of a stately couple in the open distance, and a view of a room with a window and a window. A room with a bed and a table. A view of a building with a large clock on the side of it; a close up of a banana on a table. A stop sign with a sticker on it; a red and white sign with a sky background. A large red stop sign sitting in the center of the street.
A red and white sign with a sky background. Clean cars is seen while strange the brown and green jelly next to them.
I recently saw this, which is a neural network trained specifically to describe what's happening in videos. I haven't tried it out myself, but it seems pretty interesting. I don't know if it's generalized enough to work in something as broad as a movie.