DocBank
Direct download in browser is unstable. Could you offer a method by which we can download the data from the command line, e.g. wget?
Hi,
Thank you for your datasets. Direct download in the browser is unstable. Could you offer a method by which we can download the data from the command line, e.g. with wget? @liminghao1630 @wolfshow @ranpox
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.001
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.002
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.003
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.004
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.005
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.006
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.007
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.008
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.009
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.010
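The ten commands above differ only in the part number, so they can be generated by a loop. This is a minimal sketch, assuming bash and GNU wget. The base URL and the part count are copied from the commands above. `part_urls` and `download_parts` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: fetch all ten parts with one loop instead of ten separate commands.
# BASE_URL and the part count (10) are copied from the commands above.
BASE_URL="https://layoutlm.blob.core.windows.net/docbank/dataset"
PREFIX="DocBank_500K_ori_img.zip"

# Print the URL of every part, zero-padded to three digits (.001 ... .010).
part_urls() {
  local i
  for i in $(seq 1 10); do
    printf '%s/%s.%03d\n' "$BASE_URL" "$PREFIX" "$i"
  done
}

# Download every part. -c makes wget continue a partially downloaded file
# instead of starting over, which helps on an unstable connection.
download_parts() {
  local url
  for url in $(part_urls); do
    wget -c "$url" || return 1
  done
}
```

After sourcing the script, call `download_parts`. If the connection drops, rerunning it lets wget continue the interrupted part rather than fetch it again from the beginning.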
Since I also had issues with corrupted files, I computed the MD5 checksums for comparison:
MD5 | File |
---|---|
c702b68b84642b289b6de7b87bf004eb | DocBank_500K_ori_img.zip.001 |
a2328a17e582db16611483f218f7fac2 | DocBank_500K_ori_img.zip.002 |
f534da5cc25004c79b055f894eeab2d3 | DocBank_500K_ori_img.zip.003 |
a1aeb655366d0b124aee39c50d72592d | DocBank_500K_ori_img.zip.004 |
acfbdc634765985f6916427c0475372d | DocBank_500K_ori_img.zip.005 |
fb99074b46c8046ade9bff4be29718d5 | DocBank_500K_ori_img.zip.006 |
fee26227cb26d684d94c312c35230aea | DocBank_500K_ori_img.zip.007 |
4992adec456221890266980911ff3ebd | DocBank_500K_ori_img.zip.008 |
3eac5561bb51bfef5cf6c9cff293b673 | DocBank_500K_ori_img.zip.009 |
698aabef3ee3c713f2595e1f22ff857b | DocBank_500K_ori_img.zip.010 |
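To compare against the table above without checking each hash by hand, the hashes can be written into a checksum file and checked with `md5sum -c`. This is a minimal sketch, assuming GNU coreutils `md5sum` and that the parts are in the current directory. The file name `DocBank_500K_ori_img.md5` and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the downloaded parts against the MD5 sums from the table above.
# Assumes GNU coreutils md5sum and that the parts are in the current directory.
# CHECKSUM_FILE and the function names are hypothetical, chosen for this example.
CHECKSUM_FILE="DocBank_500K_ori_img.md5"

# Write the table in the format md5sum -c expects: "<hash>  <file name>".
write_checksums() {
  cat > "$CHECKSUM_FILE" <<'EOF'
c702b68b84642b289b6de7b87bf004eb  DocBank_500K_ori_img.zip.001
a2328a17e582db16611483f218f7fac2  DocBank_500K_ori_img.zip.002
f534da5cc25004c79b055f894eeab2d3  DocBank_500K_ori_img.zip.003
a1aeb655366d0b124aee39c50d72592d  DocBank_500K_ori_img.zip.004
acfbdc634765985f6916427c0475372d  DocBank_500K_ori_img.zip.005
fb99074b46c8046ade9bff4be29718d5  DocBank_500K_ori_img.zip.006
fee26227cb26d684d94c312c35230aea  DocBank_500K_ori_img.zip.007
4992adec456221890266980911ff3ebd  DocBank_500K_ori_img.zip.008
3eac5561bb51bfef5cf6c9cff293b673  DocBank_500K_ori_img.zip.009
698aabef3ee3c713f2595e1f22ff857b  DocBank_500K_ori_img.zip.010
EOF
}

# Print OK or FAILED for each part. The exit status is non-zero if any part
# is missing or does not match, so only the bad parts need re-downloading.
verify_parts() {
  write_checksums
  md5sum -c "$CHECKSUM_FILE"
}
```

Running `verify_parts` lists each part with OK or FAILED.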
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.001?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.002?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.003?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.004?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.005?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.006?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.007?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.008?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.009?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" &&
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.010?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D"
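The URL needs to be placed inside quotation marks, because it contains "&" characters that the shell would otherwise interpret as control operators, cutting the URL short.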
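The signed commands above can also be written as a loop. This is a minimal sketch, assuming bash and GNU wget. The query string is copied verbatim from those commands, where it is identical for all ten parts. Single quotes keep its "&" characters away from the shell. `-O` stores each part under its plain name so the files match the MD5 table. `part_name`, `signed_url` and `download_signed_parts` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: the signed downloads above as one loop.
# SAS is copied verbatim from those commands; it is the same for every part.
BASE_URL="https://layoutlm.blob.core.windows.net/docbank/dataset"
PREFIX="DocBank_500K_ori_img.zip"
# Single quotes stop the shell from treating each "&" as an operator.
SAS='sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D'

# Local file name of part $1 (1-10), e.g. DocBank_500K_ori_img.zip.001
part_name() {
  printf '%s.%03d' "$PREFIX" "$1"
}

# Full signed URL of part $1; SAS is passed as an argument, not as part
# of the printf format, so its "%3D" is printed literally.
signed_url() {
  printf '%s/%s?%s' "$BASE_URL" "$(part_name "$1")" "$SAS"
}

# Download all parts. -O saves each under its plain name; without it, wget
# keeps the query string in the local file name.
download_signed_parts() {
  local i
  for i in $(seq 1 10); do
    wget -O "$(part_name "$i")" "$(signed_url "$i")" || return 1
  done
}
```

Calling `download_signed_parts` fetches the ten parts in order and stops at the first failed download.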