Jong Wook Kim

Results: 86 comments of Jong Wook Kim

@maxrmorrison Thanks for the context! I hadn't thought of it that way, i.e. interpreting the salience matrix as the observation probability distribution, but it does look like a better way to...
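For illustration, a minimal sketch of that interpretation using `librosa`'s generic Viterbi decoder; the random salience matrix, the 360-bin layout, and the local transition matrix are all assumptions for the example, not CREPE's actual decoding code:

```python
import numpy as np
import librosa

# placeholder salience: 100 frames x 360 pitch bins
salience = np.random.rand(100, 360)

# treat each frame's salience as an observation distribution over pitch bins
obs = (salience / salience.sum(axis=1, keepdims=True)).T  # (n_bins, n_frames)

# transition matrix favoring small pitch movements between frames
# (width=12 is an arbitrary choice for this sketch)
n_bins = obs.shape[0]
transition = librosa.sequence.transition_local(n_bins, width=12)

# most likely pitch bin per frame under the resulting HMM
path = librosa.sequence.viterbi(obs, transition)
```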

Hi, there can be numerical differences that we cannot fully control, e.g. different CUDA and driver versions, batch sizes, hardware, etc., which may cause the 0.5% difference in the evals. That...

The conditional code can be found in `clip.load()` here: https://github.com/openai/CLIP/blob/3482bb6ed319f70542094d1ed224c0db0b88c3a5/clip/clip.py#L138-L141 and `clip.load("clip_off_the_shelve.pt")` should work; please let me know if it doesn't.

By `clip_off_the_shelve.pt` I meant the models downloaded under `~/.cache/clip`. Let me know what the stack trace looks like if you see an error loading those models with `clip.load()`.
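For reference, a minimal sketch showing both ways of calling `clip.load()`; the model name and cache path below are just the defaults, used here for illustration:

```python
import os
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load by model name; this downloads the checkpoint to ~/.cache/clip if needed
model, preprocess = clip.load("ViT-B/32", device=device)

# equivalently, load directly from a downloaded checkpoint file
path = os.path.expanduser("~/.cache/clip/ViT-B-32.pt")
model, preprocess = clip.load(path, device=device)
```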

The paper reports FLOPs during a forward pass, and we used [fvcore's flop counting tool](https://github.com/facebookresearch/fvcore/blob/master/docs/flop_count.md) to get those numbers. The actual wall time might depend on various factors such as...
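As a sketch of the kind of measurement involved, here is how fvcore's `FlopCountAnalysis` can count FLOPs for one forward pass; the toy model and input shape are placeholders, not CLIP:

```python
import torch
from fvcore.nn import FlopCountAnalysis

# toy model standing in for the real network
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
)
dummy_input = torch.randn(1, 3, 224, 224)  # a single image for one forward pass

# note: fvcore counts one fused multiply-add as one FLOP by convention
flops = FlopCountAnalysis(model, dummy_input)
print(f"{flops.total() / 1e9:.3f} GFLOPs per forward pass")
```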

Hi, thanks for pointing out some of the details where we were cursory or missing; upon investigating, we found that: 1. Facial Emotion Recognition 2013: We noticed an error in the...

Hi, 1. Yes, but we later found that a PyTorch implementation works equivalently for linear probes. 2. Please see https://github.com/openai/CLIP/issues/64#issuecomment-804444364 for more details.
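A minimal sketch of what such a PyTorch linear probe could look like on frozen features; the features, labels, class count, and hyperparameters here are all placeholders:

```python
import torch
import torch.nn.functional as F

# stand-ins for frozen CLIP image features (N, dim) and their class labels (N,)
features = torch.randn(1000, 512)
labels = torch.randint(0, 10, (1000,))

# a single linear layer trained with cross-entropy is the probe
probe = torch.nn.Linear(features.shape[1], 10)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()
    loss = F.cross_entropy(probe(features), labels)
    loss.backward()
    optimizer.step()
```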

"accuracy" during training probably meant the proportion of the training examples that had correctly predicted the contrastive label, e.g.: contrastive_label = torch.arange(batch_size) image_loss = cross_entropy(image_logits, contrastive_label) text_loss = cross_entropy(text_logits, contrastive_label)...

Can you try reshaping the array to (batch_size * num_class, n_ctx) and feeding it to the model?
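A minimal sketch of that reshaping, assuming the array holds tokenized prompts of shape (batch_size, num_class, n_ctx); the model choice and prompt strings are placeholders:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

batch_size, num_class = 4, 10
# hypothetical per-class prompts for each item in the batch
prompts = [[f"a photo of class {c}" for c in range(num_class)] for _ in range(batch_size)]
tokens = torch.stack([clip.tokenize(p) for p in prompts]).to(device)  # (batch_size, num_class, n_ctx)

n_ctx = tokens.shape[-1]
flat = tokens.reshape(batch_size * num_class, n_ctx)  # merge batch and class dims
with torch.no_grad():
    text_features = model.encode_text(flat)           # (batch_size * num_class, dim)
text_features = text_features.reshape(batch_size, num_class, -1)  # split back out
```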

Do you have a local directory named `clip` or a file named `clip.py` in the same directory? If so, `import clip` would resolve to it instead of the installed package.