CLIP4Clip
What do Pair, L, T stand for in the code?
Hi, I'm a beginner and have a question: what do Pair, L, and T stand for in the code below, and what do they mean?
# Pair x L x T x 3 x H x W
# Note: the np.float alias was removed in NumPy 1.24; the builtin float is equivalent.
video = np.zeros((len(s), self.max_frames, 1,
                  3, self.rawVideoExtractor.size, self.rawVideoExtractor.size), dtype=float)
As far as I understand:
- Pair: number of samples (videos)
- L: length of video in frames
- T: time axis? It's always 1, which makes it practically unnecessary...
- 3: RGB channels
- H: height
- W: width
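To make the axes concrete, here is a minimal sketch that builds an array with the same layout as the snippet in the question and then drops the length-1 T axis. The sizes (`num_pairs`, `max_frames`, `frame_size`) are hypothetical values picked for illustration, not the repo's actual settings.

```python
import numpy as np

# Hypothetical sizes, for illustration only (not taken from the repo's configs).
num_pairs = 1      # "Pair": number of samples (videos)
max_frames = 12    # "L": frame slots per video
frame_size = 224   # "H" and "W": frame height and width

# Same axis order as the comment in the question: Pair x L x T x 3 x H x W
video = np.zeros((num_pairs, max_frames, 1, 3, frame_size, frame_size), dtype=float)
print(video.shape)

# T has length 1, so squeezing that axis changes the shape but loses no data.
frames = video.squeeze(axis=2)
print(frames.shape)
```

If this reading of the axes is right, the T axis only matters to code that expects a 6-D input.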
Happy to hear thoughts of other members who have been using this repo!
T is a scalar that identifies a set of frames extracted from the same one-second interval. At 30 fps, each group of 30 consecutive frames shares the same T value. I guess this plays a role in the frame extraction strategy.
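As a rough illustration of that grouping, the sketch below buckets frame indices by the one-second interval they fall into. It only shows the arithmetic described above; `second_index` and `group_by_second` are hypothetical helpers, not functions from the repo, and the repo's extractor may work differently.

```python
def second_index(frame_idx: int, fps: int = 30) -> int:
    """Return the index of the one-second interval containing frame_idx."""
    return frame_idx // fps


def group_by_second(num_frames: int, fps: int = 30) -> dict:
    """Map each one-second interval to the list of frame indices inside it."""
    groups = {}
    for i in range(num_frames):
        groups.setdefault(second_index(i, fps), []).append(i)
    return groups


# A 3-second clip at 30 fps: frames 0-29, 30-59, and 60-89 form three groups.
groups = group_by_second(90, fps=30)
print(len(groups))
print(groups[1][0], groups[1][-1])
```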
Thank you very much!