
Alpha in the equation and how to select parameters when training the type classifier

yifding opened this issue on Aug 13, 2019 · 1 comment

Hi Jonathan, I have trained a model on samples generated by:

```
python3 extraction/produce_windowed_h5_tsv.py \
    /data/datasets/wikipedia/en_train.tsv \
    /data/datasets/wikipedia/en_train.h5 \
    /data/datasets/wikipedia/en_dev.h5 \
    --window_size 10 --validation_start 1000000 --total_size 200500000
```

followed by:

```
python3 learning/train_type.py my_config.json \
    --cudnn --fused --hidden_sizes 200 200 --batch_size 256 \
    --max_epochs 10000 --name TypeClassifier --weight_noise 1e-6 \
    --save_dir my_great_model --anneal_rate 0.9999 \
    --device cpu --faux_cudnn
```

I tested the disambiguation on the blog example (tokenized with a simple split only): The man saw a Jaguar speed on the highway. The prey saw the jaguar cross the jungle.

The ranking score is based on #15, considering only the type classifier. The results I get are:

**The man saw a Jaguar speed on the highway.**

| Candidate | Without type | With type |
| --- | --- | --- |
| Jaguar Cars | 0.61 | 0.67 |
| Jaguar | 0.29 | 0.31 |
| SEPECAT Jaguar | 0.019 | 0.020 |

**The prey saw the jaguar cross the jungle.**

| Candidate | Without type | With type |
| --- | --- | --- |
| Jaguar Cars | 0.61 | 0.67 |
| Jaguar | 0.29 | 0.31 |
| SEPECAT Jaguar | 0.019 | 0.021 |

Compared to the blog post, the probabilities without type are very close to what's reported, but the probabilities with type are a little off. I don't know whether this comes from underfitting of the classifier model or from picking the wrong hyperparameters.
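For concreteness, here is a minimal sketch of one plausible way a scalar alpha could mix the type-classifier score into the link probability. The mixing form, the `score_with_type` name, and the `p_type` numbers are illustrative assumptions, not the exact scheme from #15:

```python
import numpy as np

def score_with_type(p_link, p_type, alpha):
    # Assumed mixing: alpha is the weight left for the "other" class,
    # so a candidate that mismatches the type keeps alpha of its mass;
    # alpha = 1 recovers the link-only ranking.
    s = p_link * (alpha + (1.0 - alpha) * p_type)
    return s / s.sum()

# Candidates for "The man saw a Jaguar speed on the highway."
p_link = np.array([0.61, 0.29, 0.019])  # Jaguar Cars, Jaguar, SEPECAT Jaguar
p_type = np.array([0.9, 0.4, 0.3])      # made-up type-classifier outputs
print(score_with_type(p_link, p_type, alpha=0.5))
```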

— yifding, Aug 13, 2019

Training is certainly stochastic / seed-dependent, so I think you'll see some variation in the exact probabilities you obtain. The alpha term is obtained by fitting it on the output of the model over a held-out set, e.g.:

$$\alpha^* = \operatorname{argmax}_{\alpha} \sum_{i=1}^{n} \mathrm{Prob}(\mathrm{label}_i, \mathrm{TypeProbs}(\mathrm{sent}_i, \alpha))$$

where alpha is the weight given to the other class. I don't think it matters how you solve for alpha: you can use gradient descent, or `np.linalg.solve`, since alpha enters linearly in the equation above. (Perhaps a smarter solution would be to predict alpha from the sentence, so that you can use a context-specific alpha.)
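A minimal sketch of that fit, assuming the same mixing form as in the sketch above, with a simple grid search standing in for gradient descent or a linear solve (`fit_alpha`, `examples`, and all the numbers are illustrative assumptions):

```python
import numpy as np

def combined_probs(p_link, p_type, alpha):
    # Assumed mixing of link prior and type score (same form as above).
    s = p_link * (alpha + (1.0 - alpha) * p_type)
    return s / s.sum()

def fit_alpha(examples, grid=np.linspace(0.0, 1.0, 101)):
    """Pick the alpha maximizing the total probability assigned to the
    gold entity over a held-out set.

    examples: (p_link, p_type, gold_index) triples, one per mention,
    where p_link and p_type are per-candidate arrays.
    """
    def objective(alpha):
        return sum(combined_probs(pl, pt, alpha)[gold]
                   for pl, pt, gold in examples)
    return max(grid, key=objective)

# Toy held-out set with made-up numbers:
examples = [
    (np.array([0.61, 0.29, 0.019]), np.array([0.9, 0.4, 0.3]), 0),
    (np.array([0.61, 0.29, 0.019]), np.array([0.2, 0.8, 0.3]), 1),
]
print(fit_alpha(examples))
```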

— JonathanRaiman, Jun 23, 2022