Direct download in the browser is unstable. Could you offer a way to download the data from the command line, e.g. wget?

Open johnson-magic opened this issue 3 years ago • 2 comments

Hi:

Thank you for your datasets. Direct download in the browser is unstable. Could you offer a way to download the data from the command line, e.g. wget? @liminghao1630 @wolfshow @ranpox

johnson-magic, Nov 08 '21 09:11

wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.001
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.002
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.003
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.004
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.005
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.006
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.007
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.008
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.009
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.010

Since I also had issues with files being corrupted, I computed the MD5 sums for comparison.

MD5 File
c702b68b84642b289b6de7b87bf004eb DocBank_500K_ori_img.zip.001
a2328a17e582db16611483f218f7fac2 DocBank_500K_ori_img.zip.002
f534da5cc25004c79b055f894eeab2d3 DocBank_500K_ori_img.zip.003
a1aeb655366d0b124aee39c50d72592d DocBank_500K_ori_img.zip.004
acfbdc634765985f6916427c0475372d DocBank_500K_ori_img.zip.005
fb99074b46c8046ade9bff4be29718d5 DocBank_500K_ori_img.zip.006
fee26227cb26d684d94c312c35230aea DocBank_500K_ori_img.zip.007
4992adec456221890266980911ff3ebd DocBank_500K_ori_img.zip.008
3eac5561bb51bfef5cf6c9cff293b673 DocBank_500K_ori_img.zip.009
698aabef3ee3c713f2595e1f22ff857b DocBank_500K_ori_img.zip.010
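
The comparison against the table above can be automated with md5sum -c. This is only a minimal sketch, assuming GNU coreutils md5sum is installed and the ten parts sit in the current directory; the function names write_checksums and verify_parts and the list file name are made up for illustration.

```shell
#!/usr/bin/env bash
# write_checksums FILE: write the MD5 table from this thread to FILE in the
# "hash, two spaces, file name" layout that `md5sum -c` reads.
# (hypothetical helper name, not part of DocBank)
write_checksums() {
    cat > "$1" <<'EOF'
c702b68b84642b289b6de7b87bf004eb  DocBank_500K_ori_img.zip.001
a2328a17e582db16611483f218f7fac2  DocBank_500K_ori_img.zip.002
f534da5cc25004c79b055f894eeab2d3  DocBank_500K_ori_img.zip.003
a1aeb655366d0b124aee39c50d72592d  DocBank_500K_ori_img.zip.004
acfbdc634765985f6916427c0475372d  DocBank_500K_ori_img.zip.005
fb99074b46c8046ade9bff4be29718d5  DocBank_500K_ori_img.zip.006
fee26227cb26d684d94c312c35230aea  DocBank_500K_ori_img.zip.007
4992adec456221890266980911ff3ebd  DocBank_500K_ori_img.zip.008
3eac5561bb51bfef5cf6c9cff293b673  DocBank_500K_ori_img.zip.009
698aabef3ee3c713f2595e1f22ff857b  DocBank_500K_ori_img.zip.010
EOF
}

# verify_parts FILE: check the files named in FILE (relative to the current
# directory). md5sum -c reports each file as OK or FAILED and returns
# non-zero if any file is missing or does not match.
# (hypothetical helper name)
verify_parts() {
    md5sum -c "$1"
}
```

After sourcing the functions, running write_checksums docbank.md5 followed by verify_parts docbank.md5 in the download directory should show which parts, if any, need to be fetched again.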

yweweler, May 23 '23 06:05

wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.001?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.002?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.003?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.004?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.005?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.006?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.007?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.008?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.009?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.010?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D"

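The URL needs to be placed inside quotation marks. Otherwise the shell treats each "&" in the query string as a command separator, and wget receives only the part of the URL before the first "&".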
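
The ten commands above could also be written as a loop. The following is a rough sketch rather than an official download script: the function names part_url and download_parts are invented here, and the query string is passed in as an argument instead of being hard-coded. It assumes that wget accepts -c (resume a partial file) together with -O (choose the local file name), which should help with unstable connections and avoids the query string ending up in the saved file name.

```shell
#!/usr/bin/env bash
BASE_URL="https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip"

# part_url N QUERY: print the URL of part N (1..10) with QUERY appended,
# e.g. N=3 gives "...DocBank_500K_ori_img.zip.003?QUERY".
# (hypothetical helper name)
part_url() {
    printf '%s.%03d?%s\n' "$BASE_URL" "$1" "$2"
}

# download_parts QUERY: fetch all ten parts in order and stop at the first
# failure. (hypothetical helper name)
download_parts() {
    local query="$1"
    local i
    for i in 1 2 3 4 5 6 7 8 9 10; do
        # The double quotes keep the shell from interpreting "&" in the URL.
        # -c resumes an interrupted download; -O saves under the plain part name.
        wget -c -O "$(printf 'DocBank_500K_ori_img.zip.%03d' "$i")" \
            "$(part_url "$i" "$query")" || return 1
    done
}
```

To use it, call download_parts with the query string from the commands above (everything after the "?") wrapped in single quotes, so that the "&" and "%" characters reach wget unchanged.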

zzhanghub, Apr 09 '24 06:04