Jong Wook Kim

Results: 86 comments of Jong Wook Kim

@maxrmorrison Thanks for the context! I hadn't thought of it that way, i.e. interpreting the salience matrix as the observation probability distribution, but it does look like a better way to...
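For illustration, a minimal sketch of that interpretation using `librosa`'s generic Viterbi decoder; the random salience matrix, the 360-bin layout, and the local transition matrix are all assumptions for the example, not CREPE's actual decoding code:

```python
import numpy as np
import librosa

# placeholder salience: 100 frames x 360 pitch bins
salience = np.random.rand(100, 360)

# treat each frame's salience as an observation distribution over pitch bins
obs = (salience / salience.sum(axis=1, keepdims=True)).T  # (n_bins, n_frames)

# transition matrix favoring small pitch movements between frames
# (width=12 is an arbitrary choice for this sketch)
n_bins = obs.shape[0]
transition = librosa.sequence.transition_local(n_bins, width=12)

# most likely pitch bin per frame under the resulting HMM
path = librosa.sequence.viterbi(obs, transition)
```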

Hi, there can be numerical differences that we cannot fully control, e.g. different CUDA and driver versions, batch sizes, hardware, etc., which may cause the 0.5% difference in the evals. That...

The conditional code can be found in `clip.load()` here: https://github.com/openai/CLIP/blob/3482bb6ed319f70542094d1ed224c0db0b88c3a5/clip/clip.py#L138-L141 and `clip.load("clip_off_the_shelve.pt")` should work; please let me know if it doesn't.

By `clip_off_the_shelve.pt` I meant the models downloaded under `~/.cache/clip`. Let me know what the stack trace looks like if you see an error loading those models with `clip.load()`.
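For reference, a minimal sketch showing both ways of calling `clip.load()`; the model name and cache path below are just the defaults, used here for illustration:

```python
import os
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load by model name; this downloads the checkpoint to ~/.cache/clip if needed
model, preprocess = clip.load("ViT-B/32", device=device)

# equivalently, load directly from a downloaded checkpoint file
path = os.path.expanduser("~/.cache/clip/ViT-B-32.pt")
model, preprocess = clip.load(path, device=device)
```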

The paper reports FLOPs during a forward pass, and we used [fvcore's flop counting tool](https://github.com/facebookresearch/fvcore/blob/master/docs/flop_count.md) to get those numbers. The actual wall time might depend on various factors such as...
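As a sketch of the kind of measurement involved, here is how fvcore's `FlopCountAnalysis` can count FLOPs for one forward pass; the toy model and input shape are placeholders, not CLIP:

```python
import torch
from fvcore.nn import FlopCountAnalysis

# toy model standing in for the real network
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
)
dummy_input = torch.randn(1, 3, 224, 224)  # a single image for one forward pass

# note: fvcore counts one fused multiply-add as one FLOP by convention
flops = FlopCountAnalysis(model, dummy_input)
print(f"{flops.total() / 1e9:.3f} GFLOPs per forward pass")
```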

Hi, thanks for pointing out some of the details where we were cursory or missing; upon investigating, we found that: 1. Facial Emotion Recognition 2013: We noticed an error in the...

Hi, 1. Yes, but we later found that a PyTorch implementation works equivalently for linear probes. 2. Please see https://github.com/openai/CLIP/issues/64#issuecomment-804444364 for more details.
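A minimal sketch of what such a PyTorch linear probe could look like on frozen features; the features, labels, class count, and hyperparameters here are all placeholders:

```python
import torch
import torch.nn.functional as F

# stand-ins for frozen CLIP image features (N, dim) and their class labels (N,)
features = torch.randn(1000, 512)
labels = torch.randint(0, 10, (1000,))

# a single linear layer trained with cross-entropy is the probe
probe = torch.nn.Linear(features.shape[1], 10)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()
    loss = F.cross_entropy(probe(features), labels)
    loss.backward()
    optimizer.step()
```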

"accuracy" during training probably meant the proportion of the training examples that had correctly predicted the contrastive label, e.g.: contrastive_label = torch.arange(batch_size) image_loss = cross_entropy(image_logits, contrastive_label) text_loss = cross_entropy(text_logits, contrastive_label)...

Can you try reshaping the array to (batch_size * num_class, n_ctx) and feeding it to the model?
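A minimal sketch of that reshaping, assuming the array holds tokenized prompts of shape (batch_size, num_class, n_ctx); the model choice and prompt strings are placeholders:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

batch_size, num_class = 4, 10
# hypothetical per-class prompts for each item in the batch
prompts = [[f"a photo of class {c}" for c in range(num_class)] for _ in range(batch_size)]
tokens = torch.stack([clip.tokenize(p) for p in prompts]).to(device)  # (batch_size, num_class, n_ctx)

n_ctx = tokens.shape[-1]
flat = tokens.reshape(batch_size * num_class, n_ctx)  # merge batch and class dims
with torch.no_grad():
    text_features = model.encode_text(flat)           # (batch_size * num_class, dim)
text_features = text_features.reshape(batch_size, num_class, -1)  # split back out
```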

Do you have a local directory named `clip` or a file named `clip.py` in the same directory? If so, `import clip` would resolve to it instead of the installed package.