bran
bran copied to clipboard
Fail to generate the CTD dataset
[lsong10@bhg0031 bran]$ ./extract.sh
Downloading Pubtator dump
--2019-03-31 21:09:22-- ftp://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator/bioconcepts2pubtator_offsets.gz
=> ‘/home/lsong10/ws/exp.dep_forest/bran/data/ctd/bioconcepts2pubtator_offsets.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.13, 2607:f220:41e:250::7
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused.
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|2607:f220:41e:250::7|:21... failed: Network is unreachable.
Converting data from pubtator to tsv format
usage: process_CDR_data.py [-h] -i INPUT_FILE -d OUTPUT_DIR -f
OUTPUT_FILE_SUFFIX [-s MAX_SEQ] [-a FULL_ABSTRACT]
[-p PUBMED_FILTER] [-r RELATIONS]
[-w WORD_PIECE_CODES] [-t SHARDS]
[-x EXPORT_ALL_EPS] [-n EXPORT_NEGATIVES]
[-e ENCODING] [-m MAX_DISTANCE]
process_CDR_data.py: error: argument -a/--full_abstract: expected one argument
split: extra operand ‘up’
Try 'split --help' for more information.
map relations to smaller set
awk: cmd. line:1: fatal: cannot open file positive_0_genia' for reading (No such file or directory) seperate data into train dev test positive train 50 500 positive dev 50 500 positive test 50 500 negative train 50 500 awk: cmd. line:1: fatal: cannot open file
negative_0_genia' for reading (No such file or directory)
negative dev 50 500
awk: cmd. line:1: fatal: cannot open file negative_0_genia' for reading (No such file or directory) negative test 50 500 awk: cmd. line:1: fatal: cannot open file
negative_0_genia' for reading (No such file or directory)
Sorry for the delayed response. It looks like a network issue caused the download of the initial file to fail: "Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused.". This is causing all of the subsequent errors to print because each of the following steps require this initial file.