README of weights folder contains error
The following lines of the README are, I suppose, erroneous:
- hovernet_original_kumar_notype_pytorch.tar
Checkpoint trained using 'fast' model mode on Kumar dataset. This model does not perform classification. Checkpoint obtained by directly training using the PyTorch supported library.
- hovernet_original_kumar_notype_tf2pytorch.tar
Checkpoint trained using 'fast' model mode on Kumar dataset. This model does not perform classification. Checkpoint converted from model trained on TensorFlow to PyTorch compatible version.
The file names say 'original', but the descriptions say 'fast'. After testing them myself, these are indeed the original-mode weights, so the descriptions should be changed.
Thanks.
Apologies, I forgot to specify: it's the README of the Google Drive folder that contains the weights: here
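For anyone wanting to confirm which mode a checkpoint corresponds to, the simplest test is probably to run the repo's inference script once in each mode (`--model_mode=original` / `--model_mode=fast`) and see which produces sensible output. For a quick look at what is inside one of the `.tar` files, a minimal plain-PyTorch sketch is below; the file name/path is assumed, and the `"desc"` wrapper key is an assumption about how the checkpoints are packaged:

```python
import torch

# Path is hypothetical; point it at the downloaded checkpoint.
ckpt = torch.load("hovernet_original_kumar_notype_pytorch.tar", map_location="cpu")

# The checkpoints may wrap the weights in a dict (e.g. under a "desc" key);
# fall back to the loaded object itself if not.
state = ckpt.get("desc", ckpt) if isinstance(ckpt, dict) else ckpt

# Print a few parameter names/shapes to compare against the model built in each mode.
for name, value in list(state.items())[:5]:
    if isinstance(value, torch.Tensor):
        print(name, tuple(value.shape))
```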
Thanks for your reply. I also found weights trained on CoNSeP for which the inference results were good, but I can't achieve the same good results after following the settings on GitHub. At the same time, I tried training on the bigger Lizard dataset (this is the main problem) and also got similar results. Could you give me some advice? I trained on an NVIDIA 2080 Ti with nr_epoches = 50, all batch sizes set to 4, and I didn't change any other settings except type_info.json.
Hi Ntmac. Looking at the original publication, HoVerNet paper 2019, your training method differs from theirs.
HoVerNet training was conducted over two stages of 50 epochs each: only the decoder weights were trained in the first 50 epochs, then all weights were trained in the second 50 epochs. A stepped learning rate was used in both stages (LR = 10^-4 for 25 epochs, then LR = 10^-5 for the remaining 25 epochs). Quote from the methods section of the original paper:
> Regarding HoVer-Net, we initialised the model with pre-trained weights on the ImageNet dataset (Deng et al., 2009), trained only the decoders for the first 50 epochs, and then fine-tuned all layers for another 50 epochs. We train stage one for around 120 minutes and stage two for around 260 min. Therefore, the overall training time is around 380 min. Stage two takes longer to train because unfreezing the encoder utilises more memory and therefore a smaller batch size needs to be used. Specifically, we used a batch size of 8 and 4 on each GPU for stage one and two respectively. We used Adam optimisation with an initial learning rate of 10^-4 and then reduced it to a rate of 10^-5 after 25 epochs. This strategy was repeated for fine-tuning.
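For concreteness, here is a minimal, self-contained PyTorch sketch of that two-stage schedule. The tiny `DummyNet` and random data are placeholders and this is not the repo's training code; only the freeze/unfreeze split, the two 50-epoch stages, the 10^-4 to 10^-5 learning-rate drop at epoch 25, and the 8/4 batch sizes follow the paper:

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset

class DummyNet(nn.Module):
    """Stand-in with an 'encoder' and a 'decoder', just to show the freezing logic."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(8, 1, 1)
    def forward(self, x):
        return self.decoder(self.encoder(x))

def run_stage(model, loader, nr_epochs=50):
    """One 50-epoch stage: Adam at 1e-4, reduced to 1e-5 after 25 epochs."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = Adam(params, lr=1e-4)
    sched = StepLR(opt, step_size=25, gamma=0.1)  # 1e-4 -> 1e-5 at epoch 25
    loss_fn = nn.MSELoss()  # placeholder; HoVer-Net uses its own branch losses
    for _ in range(nr_epochs):
        for imgs, targets in loader:
            opt.zero_grad()
            loss_fn(model(imgs), targets).backward()
            opt.step()
        sched.step()

model = DummyNet()  # in practice: HoVer-Net with an ImageNet-pretrained encoder
data = TensorDataset(torch.randn(16, 3, 64, 64), torch.randn(16, 1, 64, 64))

# Stage 1: freeze the encoder, train only the decoder(s), batch size 8 per GPU.
for p in model.encoder.parameters():
    p.requires_grad = False
run_stage(model, DataLoader(data, batch_size=8))

# Stage 2: unfreeze all layers and fine-tune everything, batch size 4 per GPU.
for p in model.parameters():
    p.requires_grad = True
run_stage(model, DataLoader(data, batch_size=4))
```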
If you use the same starting weights and the same two-stage schedule as above, your results will likely be much closer to those of the provided weights (with slight random differences either way).
Finally, if comparing to the published performance (I know you aren't here, but in case the question arises in future): the original models were trained in TensorFlow, while this repo is PyTorch. The team have provided metric comparisons here, PyTorch vs TensorFlow HoVerNet, to give an idea of the performance differences to expect between the two frameworks.
Hope this helps,
Volodymyr