
Model usage in the wild and other observations

Open thoppe opened this issue 4 years ago • 2 comments

Thanks again for putting this together. After combing through about a dozen movies, I can say that the model works brilliantly on almost every scene. It struggles with action scenes, but that's an understandable failure, as the shot itself is sometimes dynamic. Amazing work!

I've been experimenting with watching only specific types of shots and seeing how this would alter the expression or dynamic of the film. I've put together a fun video of only the non-speaking medium close-up shots here:

https://www.youtube.com/watch?v=K0_O34eoC68&feature=youtu.be

BEFRAME is an AI-powered project that keeps ONLY the scenes where the character is framed and not speaking. They can just "be" in the frame. Explore the characters and directors' choices from Legally Blonde, The Exorcist, Fight Club, Pitch Perfect, Die Hard, Pretty Woman, The Princess Bride, and Requiem For a Dream. Each movie is first clipped by visual content, then analyzed for shot type. Only the Medium Close-up (MCU) shots are preserved. Google's speech detection is used to filter out any shots with detected words. Finally, the shots are strung back together in sequence.
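In rough pseudocode, the pipeline looks like this (a minimal sketch assuming PySceneDetect's newer detect()/split_video_ffmpeg convenience API; classify_shot, has_speech, and concatenate are placeholder names for the shot-type classifier, Google's speech detection, and an ffmpeg concat step, not the actual implementation):

```python
import glob

from scenedetect import detect, ContentDetector
from scenedetect.video_splitter import split_video_ffmpeg

# classify_shot(), has_speech(), and concatenate() are placeholders for the
# shot-type classifier, Google's speech detection, and an ffmpeg concat step.

def beframe(movie_path: str, output_path: str) -> None:
    # 1. Clip the movie by visual content (cut detection).
    scenes = detect(movie_path, ContentDetector())
    split_video_ffmpeg(movie_path, scenes)  # writes <name>-Scene-NNN.mp4 files

    # 2. Keep only the Medium Close-up (MCU) shots with no detected speech.
    kept = [
        shot for shot in sorted(glob.glob("*-Scene-*.mp4"))
        if classify_shot(shot) == "MCU" and not has_speech(shot)
    ]

    # 3. String the surviving shots back together in sequence.
    concatenate(kept, output_path)
```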

I think there's a lot more one could do with this! If you've got any feedback let me know, otherwise feel free to close this issue as it's just a comment.

thoppe · Oct 18 '19 15:10

@thoppe this is really interesting! The video is quite fun to watch, and a nice direction to explore. Perhaps as you add different filtering criteria -- CUs with dialogue, WS only, etc. -- you'll start to see some more interesting patterns.

I saw you used PySceneDetect to split the movie up. Interestingly, I've also looked into cut-detection techniques, and PySceneDetect definitely seemed like the best option to start with. It's a great tool and gets the job done fairly well, but ICYMI, it isn't perfect -- I opened #96 discussing this.
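For anyone following along, the basic cut-detection call with PySceneDetect's newer convenience API is just a couple of lines; the threshold below is only the library's illustrative default and usually needs tuning per film:

```python
from scenedetect import detect, ContentDetector

# ContentDetector flags a cut when the frame-to-frame content change
# exceeds the threshold (27.0 is the library default).
scene_list = detect("movie.mp4", ContentDetector(threshold=27.0))
for start, end in scene_list:
    print(f"Shot from {start.get_timecode()} to {end.get_timecode()}")
```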

There are deep learning approaches that work better, but they aren't in a user-friendly format yet. I intend to fill this gap soon by adapting them for cinema. You might find these interesting:

  1. Fast Video Shot Transition Localization with Deep Structured Models. Paper. Code.
  2. Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks. Paper.

Also, this.


> I think there's a lot more one could do with this!

I agree! In addition to exploring a similar approach with the different filtering criteria mentioned above, the next step could be combining other tools with this. For example, looking at all the shots of a particular character and how this varies based on the character's role in the film, or looking at the usage of colour schemes and how they vary across films. Does that make sense?
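To make the colour idea a bit more concrete, here's a tiny sketch (using OpenCV; purely illustrative, not an actual implementation) that computes the average colour of a shot, which could then be compared across characters or films:

```python
import cv2
import numpy as np


def mean_colour(shot_path: str) -> np.ndarray:
    """Average (B, G, R) colour over every frame of a shot."""
    cap = cv2.VideoCapture(shot_path)
    per_frame = []
    ok, frame = cap.read()
    while ok:
        per_frame.append(frame.mean(axis=(0, 1)))  # mean colour of this frame
        ok, frame = cap.read()
    cap.release()
    return np.mean(per_frame, axis=0)
```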

I did some work with color (my first coding project ever), but haven't released it yet. I'll share it here once I do.

I'll leave this issue open for discussion of potential directions we could take. I'm curious, what do you plan to do next with this? :)

For some inspiration: http://cinemetrics.fredericbrodbeck.de

rsomani95 · Oct 19 '19 11:10


Hey, can you share this model? The author doesn't reply.

1933874502 · Nov 09 '22 12:11