
I Forced an AI to Watch Santa Claus Conquers the Martians

Open zachwhalen opened this issue 5 years ago • 9 comments

I have three ideas this year, and I'm making separate issues for each. This is one of them.

A little while ago I made a Twitter bot out of the text you get when you ask Microsoft Cognitive Services to describe what it sees in an image. There are some limitations to that API, but I bet it could work on a series of images to eventually produce 50K words of text. The trick will be figuring out a meaningful series of images.

It might work to feed it a graphic novel one panel at a time, but I don't think the AI works very well with drawn images. I feel like it's just going to say "a picture of a drawing" every time.

I have yet to try it, though, so maybe that will work or maybe I'll need to think of something else.

zachwhalen, Nov 01 '18
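For reference, here is a minimal sketch of the captioning call in Python. It assumes the Azure Computer Vision v3.2 `describe` REST endpoint, a placeholder resource URL and key (`ENDPOINT`, `API_KEY`), and the JSON shape that endpoint documents (`description.captions[].text`). The 2018 bot may well have used an older API version or a client library instead. `caption_from_response` returns `None` when no caption comes back, which matters for the paragraph-break rule used later in this thread.

```python
import json
import urllib.request
from typing import Optional

# Assumptions: placeholder Azure resource endpoint and key; v3.2 "describe" path.
ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
API_KEY = "YOUR-KEY"


def describe_image(image_bytes: bytes, max_candidates: int = 1) -> dict:
    """POST raw image bytes to the (assumed) describe endpoint; return the parsed JSON."""
    url = f"{ENDPOINT}/vision/v3.2/describe?maxCandidates={max_candidates}"
    request = urllib.request.Request(
        url,
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def caption_from_response(payload: dict) -> Optional[str]:
    """Turn the top caption into a sentence, or return None if there is no caption."""
    captions = payload.get("description", {}).get("captions", [])
    if not captions:
        return None
    text = captions[0].get("text", "").strip()
    if not text:
        return None
    # Captions typically come back lowercase ("a close up of a person"),
    # so capitalize the first letter and add a period.
    return text[0].upper() + text[1:] + "."
```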

I tried it out on some panels of Watchmen because that's what I had on hand. The results are better than I expected, but probably not good enough:

"A person jumping in the air"

"A close up of a person"

"A close up of a piece of paper"

"A close up of a book"

It seems panels with a lot of text are going to be challenging, and Watchmen is a very wordy comic.

zachwhalen, Nov 01 '18

That's quite good. If you feed it all the panels from a whole comic and replace "person" with a fleshed-out character, it may be interesting.

LuRsT, Nov 01 '18
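Here is a rough sketch of LuRsT's suggestion. It assumes you already know which character appears in each panel (say, from a hand-made list). The function name, the example name, and the choice of which generic nouns to replace are all hypothetical.

```python
import re

# Hypothetical rule: treat "a person", "a man", or "a woman" as the slot for a named character.
GENERIC = re.compile(r"\b[Aa] (?:person|man|woman)\b")


def name_characters(caption: str, name: str) -> str:
    """Replace the first generic person phrase in a caption with a character name."""
    # A function replacement keeps any backslashes in `name` from being
    # read as regex escape sequences.
    return GENERIC.sub(lambda _match: name, caption, count=1)
```

With this, "A person jumping in the air" becomes "Rorschach jumping in the air". Captions with no generic person, such as "A close up of a book", pass through unchanged.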

I shifted this a little bit, or at least I want to try a different source for my imagery. What follows is the result of passing frames of a video through Microsoft's image description API. Paragraph breaks happen when the API failed to produce a caption.

A star in the middle of the night sky. A star in the middle of the night sky. A star in the middle of the night sky. A star in the dark sky. A star in the middle of the night sky. A star in the middle of the night sky. A star in the middle of the night sky. A person in a dark sky. A close up of a tower.

A person in a dark room. A person in a dark room. A sign in the dark.

A close up of a computer. A screenshot of a video game. A close up of a computer. A view of a city at night. A close up of some water.

A close up of smoke.

A screenshot of a video game.

A glass display case. An aerial view of a city.

A view of a city at night. A view of a city at night. A view of a city. A view of a city at night. A view of a city at night. An aerial view of a city. A close up of a weapon. A close up of a weapon.

An aerial view of a city.

A close up of a weapon. A view of a city. A close up of a light. A blurry image of a person. A blurry image of a truck. A blurry photo of a fire. A blurry photo of a fire. A blurry photo of a fire. A blurry image of a street. A blurry image of a boat. A view of a city at night.

A close up of an engine. A close up of a person. A close up of an engine. A large room. A blurry image of a person. A close up of a cake. A close up of an engine. A close up of an engine. A close up of a light. A close up of an engine. A close up of an engine. A blurry image of a person.

A blurry photo of a tree.

A close up of a fire. A blurry image of a river.

A view of a plane.

A bird flying in the sky. A view of a plane. A close up of a brick wall. A close up of a plane. A blurry image of a plane. A sky view looking up at night. A star in the sky.

A close up of an engine.

A close up of a persons face.

A black and blue sky in the background. Fireworks in the night sky. A person with a sunset in the background. A close up of a plane in the sky. A tower with a mountain in the background. A view of a mountain. A view of a city at sunset. A view of a city at sunset. A large ship in the water. A close up of a light house at sunset.

A traffic light with a building in the background. A side view mirror of a car. A side view mirror of a car. A person standing next to a window. A blurry photo of a computer. A person in a dark room. A clock sitting in the dark. A man wearing glasses. A close up of a man. A blurry photo of a bus. A close up of a person. A very dark water. A very dark water. A bed covered in snow. A close up of a snow covered slope. A close up of a light. A close up of a computer. A close up of a snow covered slope. Smoke coming out of it. Smoke coming from it. A close up of smoke. Smoke coming from it. A train on a track with smoke coming out of it. Smoke coming out of the water. A close up of smoke. A very dark water. A group of people looking at a computer. A group of people in a room. A man wearing a suit and tie. A person sitting at a table in front of a mirror. A person sitting at a table in front of a mirror. A person sitting at a table in front of a mirror. A person sitting in front of a mirror posing for the camera. A man standing in front of a mirror posing for the camera. A woman standing in front of a mirror posing for the camera. A close up of a persons face. A close up of a computer. A close up of a computer.

A person standing in a room. A person sitting at a table in a room. A person standing in a room. A person standing in a room.

A view of a room. A man standing in a room. A blurry photo of a building. A person standing in front of a car. A person standing in front of a car. A person standing in front of a car. A blurry image of a car. A man wearing a suit and tie. A car driving on a city street. A group of people walking on the side of a building. A group of people walking down a street next to a tree. A person sitting in a dark room. A man standing in front of a mirror posing for the camera. A living room filled with furniture and a fire place. A person standing in front of a mirror.

John F. Kennedy et al. standing in front of a window. A man standing in front of a window. A man and a woman standing in front of a window. A man in a suit standing in front of a crowd. A man wearing a military uniform. A man wearing a military uniform. A group of people sitting at a table. A group of people standing around a table. A group of people sitting at a table. A group of people standing next to a person. A group of people looking at a phone. A man wearing a suit and tie. A man wearing a suit and tie. A man wearing a suit and tie. A man wearing a suit and tie. A group of people standing next to a person in a suit and tie. A group of people sitting at a table. A man flying through the sky. A group of people in a field. A man that is standing in the snow. A close up of an engine. A close up of an engine.

A close up of a fence. A close up of a metal object. A person wearing a suit and tie. A person standing in front of a mirror posing for the camera.

A fire pit with smoke coming out of it. A blurry photo of a fire. A view of a fireplace. A close up of a fire. A close up of a beverage.

A close up of food. A group of people posing for the camera. A man and a woman standing in a room. A person holding a phone.

A man carrying a suitcase. A man with smoke coming out of it. A person with smoke coming out of it. A man with smoke coming out of it. A group of people posing for the camera. A man wearing a suit and tie. A man wearing a suit and tie. Walter cronkite wearing a suit and tie. Smoke coming from it. A close up of smoke.

A cup of water.

Walter Cronkite wearing a suit and tie. A man wearing a suit and tie smiling at the camera. A star in the dark. A close up of a device. A close up of an eye. A blurry image of a person. A man in a dark sky with marfa lights in the background. A man sitting at a table. A man sitting at a table.

A train covered in snow. A close up of a snow covered mountain. A view of the earth from space. A group of people sitting at a desk. A group of people in a room. A helicopter flying in the sky. A close up of a person.

A close up of an engine. A close up of an engine. A close up of a bicycle. A close up of a barrel. A close up of a glass. A close up of a black background.

A group of people standing in front of a crowd. A group of people standing in front of a crowd. A close up of a persons face.

A blurry image of a man. A person looking at the camera. A close up of a screen. A close up of a light. A person sitting on a table.

A group of people sitting at a desk.

A man wearing a suit and tie. A man wearing a suit and tie. A man and a woman taking a selfie. A group of people sitting at a table.

A close up of a bird.

A close up of a person. Water next to the ocean. Water next to the ocean. A man wearing a suit and tie. A man wearing a suit and tie. A man wearing a suit and tie. A man wearing a suit and tie. A man wearing a suit and tie.

A blurry image of a person. A blurry image of a man.

Water next to the ocean. A close up of a person. A man wearing a suit and tie.

A person in a dark room. A man standing in a room. A person standing in front of a mirror posing for the camera. A man standing in a room. A man sitting at a desk. A person sitting at a table using a laptop.

A close up of a device. A man and a woman standing in a room. A man and a woman standing in a room. A man and a woman standing in a room. A man and a woman standing in a room. A man standing in front of a television. A man standing in front of a television. A man wearing glasses. A man wearing glasses. A person standing in front of a mirror. A man standing in front of a crowd. A group of people looking at each other. A man in a suit and tie. A man sitting at a table. A group of people in a room. A man talking on a cell phone. A man wearing a suit and tie. A man wearing a suit and tie.

A person wearing a costume. A person wearing a costume. A close up of a man looking at the camera. A blurry close up of a person. A man standing in front of a mirror. A man sitting at a table. A man sitting at a table. A man sitting at a table.

A person with a helmet on.

A person wearing a costume. A group of people posing for the camera.

A close up of a cake. A person wearing a suit and tie. A man wearing a suit and tie.

A messy room.

A pile of snow. A group of snow covered ground. A car covered in snow.

A pile of snow. A close up of a snow covered ground. A close up of a black background. A large waterfall.

A close up of a vehicle. A view of the earth from space.

A close up of an animal.

A close up of a person.

A close up of a fire. A close up of a fire. A close up of an engine.

A close up of an engine. A close up of a fire.

A close up of a helmet.

A man looking at the camera. A man looking at the camera. A man standing in front of a mirror posing for the camera. A man standing in front of a mirror posing for the camera. A man standing in front of a mirror posing for the camera. A man standing in front of a mirror posing for the camera.

A person in a dark room.

A swimming pool.

A close up of an animal. A glass of water. A close up of a motorcycle. A close up of a motorcycle. A close up of a motorcycle. A close up of a motorcycle engine. A close up of an engine. A close up of an engine. A close up of a lobster. A lobster on a table. A close up of an animal. A close up of a motorcycle. A close up of an engine. A swimming pool. A close up of an engine. A close up of an engine. A close up of an engine. A reflection of a mirror posing for the camera. A close up of a car. A close up of a car. A car engine. A man sitting in front of a mirror posing for the camera. A man wearing glasses and looking at the camera. A man wearing glasses and smiling at the camera. A man drinking from a glass. A man drinking from a glass. A person wearing a mask.

A person looking at the camera. A close up of a person.

Water next to the ocean.

Water next to the ocean. Water next to the ocean.

A group of people in a room. A group of people in a room. A person holding a microphone. A man holding a gun. A group of people in a room. A group of people in a room. A group of people looking at a screen. A man standing in front of a television. A person standing in front of a television. A person standing in front of a television. A person standing in front of a television. A person standing in front of a television. A person standing in front of a television. A man standing in front of a television. A man standing in front of a television. A screen shot of a man. A group of people looking at a screen.

Smoke coming from it. A close up of a logo. A large body of water. A body of water. A man riding a wave on top of a body of water. A blurry image of a person. A man driving a car. A person taking a selfie. A blurry close up of a man. A group of people in a room. A man standing next to a body of water. A person standing next to a body of water. A group of people on a boat in the water. A close up of a toy.

A person wearing a costume.

A man looking at the camera. A blurry image of a person.

A close up of a person. A close up of a busy city street. A group of people posing for the camera. A group of people standing in front of a crowd. A view of a city at night.

A group of people in a room. A close up of an engine. A close up of an engine. A close up of an engine.

A close up of a statue. A close up of a black bag. A car engine. A close up of a car. A close up of a car. A close up of a car. A close up of a car. A close up of a car.

A close up of an engine.

A close up of an engine.

A close up of an engine. A blurry image of a person.

A group of people in a dark room. A desk with a computer keyboard.

zachwhalen, Nov 12 '18
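The paragraphing rule described above (start a new paragraph whenever the API returns no caption) can be sketched as follows. It assumes captions arrive in frame order as strings, with `None` standing in for a failed caption. In this sketch, a run of several failures produces a single paragraph break.

```python
from typing import Iterable, List, Optional


def captions_to_paragraphs(captions: Iterable[Optional[str]]) -> List[str]:
    """Join consecutive captions into paragraphs; a missing caption (None) ends the paragraph."""
    paragraphs: List[str] = []
    current: List[str] = []
    for caption in captions:
        if caption is None:
            # Close the current paragraph, if there is one.
            if current:
                paragraphs.append(" ".join(current))
                current = []
        else:
            current.append(caption)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```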

I'm considering:

  • Instead of printing repeated phrases, printing each one once but making it bigger according to the number of repetitions.
  • Figuring out where the scene breaks are (maybe with i-frame detection) and calling those chapters.
  • Making some compound sentences out of these to give it some sense of flow.
  • Somehow using each frame's dominant colors in the printed text.

zachwhalen, Nov 12 '18
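The first idea amounts to run-length encoding of consecutive identical sentences. pointyointment makes the same observation below: only exact repeats collapse, so alternations like "night sky" / "dark sky" survive. Here is a sketch using `itertools.groupby`. The point sizes in `font_size` are an arbitrary, hypothetical scaling rule.

```python
from itertools import groupby
from typing import List, Tuple


def collapse_repeats(sentences: List[str]) -> List[Tuple[str, int]]:
    """Collapse runs of identical consecutive sentences into (sentence, run_length) pairs."""
    return [(sentence, len(list(run))) for sentence, run in groupby(sentences)]


def font_size(run_length: int, base_pt: float = 11.0, step_pt: float = 2.0, max_pt: float = 36.0) -> float:
    """Hypothetical rule: add step_pt points per extra repetition, capped at max_pt."""
    return min(base_pt + step_pt * (run_length - 1), max_pt)
```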

There is a flow of narrative, but the abstract nature cries out for specifics, however disjointed. Is there a way to feed each result from the image description API into something else to get text that fills in between? The OuLiPo had a technique called 'larding', in which a writer inserts a new sentence between two existing sentences, then inserts new sentences between those, until reaching the desired word count.

jlee50, Nov 13 '18
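Larding can be sketched as a loop that fills every gap on each pass, so the number of gaps roughly doubles each time. The `bridge` callable is a hypothetical stand-in for whatever produces the in-between sentence, for example a caption of the frame halfway between two captioned frames, or a text generator. The sketch only handles the insertion and the word-count stopping rule.

```python
from typing import Callable, List


def word_count(sentences: List[str]) -> int:
    return sum(len(s.split()) for s in sentences)


def lard(sentences: List[str], bridge: Callable[[str, str], str], target_words: int) -> List[str]:
    """Insert a bridge sentence between every adjacent pair, repeating until target_words is reached."""
    text = list(sentences)
    if len(text) < 2:
        raise ValueError("larding needs at least two sentences")
    while word_count(text) < target_words:
        before = word_count(text)
        larded = [text[0]]
        for previous, following in zip(text, text[1:]):
            larded.append(bridge(previous, following))
            larded.append(following)
        text = larded
        # Guard against a bridge that returns empty strings (the loop would never end).
        if word_count(text) <= before:
            raise ValueError("bridge added no words; larding would never finish")
    return text
```

With a placeholder bridge that always returns "Then something changes.", two six-word captions grow to five sentences (21 words) before passing a 20-word target.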

I like that output!

The first paragraph looks very repetitive, but not every sentence is actually the same. I kind of like that. Maybe just deduplicate and enlarge sentences that are actually identical, and when it alternates between two similar sentences, leave that alone for effect? (Actually, I guess that would be the natural outcome of simple deduplication—it would be more difficult to not do that.)

On the other hand, something that annoyed me as a grammar enthusiast (and doesn't seem to be your fault, because it's just the text the service gives you) is that the descriptions always say "A close up of…", when it should be "A closeup of…". (Chrome's grammar checker wants to change "closeup" to "close up" when I type it, too. I think Chrome is teaching people bad grammar. :angry:)

pointyointment, Nov 14 '18

I'm calling it done. I may tweak it some more, but for the record I definitely had a PDF of 50K+ words before midnight. I just had one last bug that took a few minutes to figure out.

I wanted to do this process on a different movie -- the sample above is from one of the Transformers movies -- but I didn't get around to finding a good digital copy of the full-length film. Instead, I went with something quicker, easier, and out of copyright: Santa Claus Conquers the Martians.

I extracted 14,635 frames, then fed them to Microsoft's Cognitive Services for automatically generated captions. Then I just added some syntax to help it flow a little better and created a more book-like layout.

Here's the code.

Here's the novel.

This is less ambitious than my other ideas, but I'm pretty happy with it. Unlike those other ideas, I actually got this one finished!

zachwhalen, Dec 01 '18
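The thread doesn't include the syntax rules themselves. However, the excerpt hugovk quotes in the next comment shows connectives such as "I see X just before Y", "That's definitely X", and "Now it's X". Below is a sketch of that kind of templating, assuming lowercase noun-phrase captions like "a red wall". The template lists and the 0.4 pairing probability are guesses, not the author's code.

```python
import random
from typing import List

# Guessed templates, modeled loosely on phrases visible in the finished novel.
SINGLE = ["I see {a}.", "There's {a}.", "Now it's {a}.", "That's definitely {a}.",
          "I guess that's just {a}.", "It's sort of like {a}.", "Or {a}."]
PAIR = ["I see {a} just before {b}.", "{A} followed by {b}.", "{A} and {b}.", "{A}; {b}."]


def add_syntax(captions: List[str], seed: int = 0, pair_chance: float = 0.4) -> str:
    """Wrap caption phrases in connective templates, sometimes joining two into one sentence."""
    rng = random.Random(seed)  # seeded so the same frames always yield the same text
    sentences: List[str] = []
    i = 0
    while i < len(captions):
        a = captions[i]
        fields = {"a": a, "A": a[:1].upper() + a[1:]}
        if i + 1 < len(captions) and rng.random() < pair_chance:
            fields["b"] = captions[i + 1]
            sentences.append(rng.choice(PAIR).format(**fields))
            i += 2  # both captions consumed
        else:
            sentences.append(rng.choice(SINGLE).format(**fields))
            i += 1
    return " ".join(sentences)
```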

:)

I guess that's just a person followed by a red curtain. That's definitely a red wall. I see a sign just before a red wall. That's definitely a sign. Or a red wall and a sign? There's a red wall. A sign with a red wall. I see a sign. I see a red wall; a sign. It's sort of like a red curtain. It's sort of like a red wall and a sign. Now it's a red wall. I guess that's just a red background and a red wall. I see a red and white sign. A sign; a red background. I see a red wall. There's a red background just before a red wall. There's a red background. A red curtain and a red background. Or a red and black text. A red wall. I see a red and black text just before a red wall. I see a red background. It's sort of like a red wall; a red background? Now it's a red wall. I see text on a red background followed by a red wall. I guess that's just a red background. Or a red wall. Or a red and black text followed by text on a red background. That's definitely a red and black text. It's sort of like a red wall and a red background. A red wall. I see a red and black text. There's a red wall. I see a red and black text just before a red wall. A red and black text. I see a red curtain and a curtain? That's definitely a logo. I see a piece of paper just before a logo. A piece of paper. A logo. Or a device. There's an umbrella. I guess that's just a logo.

I guess that's just a logo.

I see a logo.

I guess that's just a logo.

I see a logo.

A device. I see a logo. There's a device.

Now it's a device. A logo.

It's sort of like a device. There's a building. A computer. I guess that's just a device.

Or a device. Now it's a logo. I see a sky filled with water

hugovk, Dec 01 '18

First output was like Gertrude Stein improving Don DeLillo. Second is more straight Beckett.

I assembled a 5-page poem a while back solely out of captions from a stock photo site and it reads a lot like #1.

rchrdlln, Dec 02 '18