Xiongkun Linghu

Results 10 issues of Xiongkun Linghu

I ran the default scripts, but generating 100 samples takes about 15 hours. Is there any way to accelerate the process?

I have tried the FCN pyramid structure 5,2,1 from the paper, but the accuracy on miniImageNet is still 65.5%. Besides, the accuracy on tiered ImageNet is 70.6% (2% lower), 72.8% (2% lower) on CIFAR-FS,...

I pretrained the model and then used DeepEMD with the default settings, but the 5-way 1-shot accuracy was just 65.5%, which is 1% lower than reported in the paper.

I used the EDL loss to train on the mini-ImageNet dataset with 64 classes, but the loss does not converge and the accuracy is very low.

I want to calculate the perceptual distance for specific input images. Could you please provide the detailed implementation of this part?

The work is interesting. I want to train my model with your datasets. Could you please provide a more detailed description of the datasets used in Table 1 in the...

I want to generate samples interpolated between two images. Is this covered by the codebase?

Thanks for sharing the work. I notice that the model can output the coordinates of the 3D bounding boxes through numerical values. How can I access this data related to 3D grounding...

Thanks for your interesting work. I visualized the grounded scene caption data and noticed there is a key called 'all_phrases_positions'. What does it mean? I guess the numerical values represent...

I notice referent tokens are interleaved in the output. Can multiple referent tokens appear in a single text prompt, such as "Describe the table and the chair ."?