NeuroNLP2
Unable to find training data
Dear Max,
Thank you so much for making your code available. I am trying to run your stack-pointer network with the command below, but I cannot find the train/dev/test datasets:
CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 python -u parsing.py --mode train --config configs/parsing/stackptr.json --num_epochs 600 --batch_size 32 \
 --opt adam --learning_rate 0.001 --lr_decay 0.999997 --beta1 0.9 --beta2 0.9 --eps 1e-4 --grad_clip 5.0 \
 --loss_type token --warmup_steps 40 --reset 20 --weight_decay 0.0 --unk_replace 0.5 --beam 10 \
 --word_embedding sskip --word_path "data/sskip/sskip.eng.100.gz" --char_embedding random \
 --punctuation '.' '``' "''" ':' ',' \
 --train "data/PTB3.0/PTB3.0-Stanford_dep/ptb3.0-stanford.auto.cpos.train.conll" \
 --dev "data/PTB3.0/PTB3.0-Stanford_dep/ptb3.0-stanford.auto.cpos.dev.conll" \
 --test "data/PTB3.0/PTB3.0-Stanford_dep/ptb3.0-stanford.auto.cpos.test.conll" \
 --model_path "models/parsing/stackptr/"
Could you tell me where to find the dataset? Thank you!
@XuezheMax I should add that I have the Penn Treebank 3.0 data, but I am not sure how to convert it to the required format. Is there a straightforward way to do that?
I hope these links are helpful:
https://github.com/clulab/processors/wiki/Converting-from-Penn-Treebank-to-Basic-Stanford-Dependencies
https://nlp.stanford.edu/software/stanford-dependencies.shtml
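As a rough sketch of the conversion those pages describe, the Python below (a hypothetical helper, not part of NeuroNLP2) concatenates WSJ sections into the commonly used train/dev/test split (sections 02-21 / 22 / 23) and runs the Stanford converter class `edu.stanford.nlp.trees.EnglishGrammaticalStructure` with `-basic -conllx` to get basic Stanford dependencies in CoNLL-X format. The PTB3 directory layout, the jar path, the output file names, and the helper names (`collect_trees`, `converter_command`, `convert_split`, `read_conllx`) are assumptions for illustration. Also note that `auto.cpos` in the file names above suggests automatically predicted POS tags, whereas this sketch keeps the treebank's gold tags.

```python
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: the common WSJ split for English dependency parsing
# (sections 02-21 train, 22 dev, 23 test).
SPLITS: Dict[str, Iterable[int]] = {
    "train": range(2, 22),
    "dev": [22],
    "test": [23],
}


def collect_trees(wsj_root: Path, sections: Iterable[int]) -> List[Path]:
    """Return the .mrg files of the given sections, assuming the PTB3
    layout wsj_root/NN/wsj_NNxx.mrg (e.g. parsed/mrg/wsj/02/wsj_0201.mrg)."""
    files: List[Path] = []
    for sec in sections:
        files.extend(sorted((wsj_root / f"{sec:02d}").glob("*.mrg")))
    return files


def converter_command(tree_file: Path, classpath: str) -> List[str]:
    """Command line for the Stanford converter: basic dependencies, CoNLL-X output."""
    return [
        "java", "-mx1g", "-cp", classpath,
        "edu.stanford.nlp.trees.EnglishGrammaticalStructure",
        "-treeFile", str(tree_file), "-basic", "-conllx",
    ]


def convert_split(wsj_root: Path, split: str, classpath: str, out_dir: Path) -> Path:
    """Concatenate one split's constituency trees, then convert them to a .conll file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tree_file = out_dir / f"{split}.mrg"
    with tree_file.open("w") as out:
        for path in collect_trees(wsj_root, SPLITS[split]):
            out.write(path.read_text())
            out.write("\n")
    conll_file = out_dir / f"{split}.conll"
    with conll_file.open("w") as out:
        # The converter prints the CoNLL-X lines to stdout; redirect them to the file.
        subprocess.run(converter_command(tree_file, classpath), stdout=out, check=True)
    return conll_file


def read_conllx(text: str) -> List[List[List[str]]]:
    """Sanity check: split CoNLL-X text into sentences of 10-column token rows."""
    sentences: List[List[List[str]]] = []
    current: List[List[str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(cols)
    if current:
        sentences.append(current)
    return sentences
```

For example, `convert_split(Path("treebank_3/parsed/mrg/wsj"), "train", "stanford-parser.jar", Path("data/ptb-sd"))` would write `data/ptb-sd/train.conll`, which could then be passed to `--train`; running `read_conllx` over the result is a quick way to confirm every token line has the ten CoNLL-X columns.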