create_cisTarget_databases
The 'zsync' files for the database files might be incorrect.
I'm sorry for submitting an issue here. I tried to download these databases using zsync.
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Pay attention to the SHA-1 checksum.
$ sha1sum hg38_screen_v10_clust.regions_vs_motifs.scores.feather
57b58cbc57002e2b96f4b51d6a9fec0e831abd29 hg38_screen_v10_clust.regions_vs_motifs.scores.feather
$ wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync
--2024-05-09 09:16:55-- https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync
Resolving resources.aertslab.org (resources.aertslab.org)... 198.18.0.18
Connecting to resources.aertslab.org (resources.aertslab.org)|198.18.0.18|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61006451 (58M)
Saving to: ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync’
hg38_screen_v10_clust.reg 100%[===================================>] 58.18M 11.1MB/s in 7.2s
2024-05-09 09:17:03 (8.11 MB/s) - ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync’ saved [61006451/61006451]
$ head hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync
Blocksize: 2048
Filename: hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Hash-Lengths: 2,3,6
Length: 13882267648
MTime: Thu, 07 Jul 2022 14:31:02 +0000
SHA-1: 57b58cbc57002e2b96f4b51d6a9fec0e831abd29
URL: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
zsync: 2.0.0-alpha-1
[binary data omitted]
$ wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt
--2024-05-09 09:25:49-- https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt
Resolving resources.aertslab.org (resources.aertslab.org)... 198.18.0.18
Connecting to resources.aertslab.org (resources.aertslab.org)|198.18.0.18|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 97 [text/plain]
Saving to: ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt’
hg38_screen_v10_clust.reg 100%[===================================>] 97 --.-KB/s in 0s
2024-05-09 09:25:50 (76.9 MB/s) - ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt’ saved [97/97]
$ cat hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt
07b5e527d2ed082e081e439e68dffa77b5f6129c hg38_screen_v10_clust.regions_vs_motifs.scores.feather
As you can see, the downloaded file's SHA-1 value matches the one recorded in the 'zsync' file's header, but differs from the one recorded in 'sha1sum.txt'.
I hope it's not my fault, as redownloading is a bit of a hassle.
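For anyone who wants to repeat this comparison without the shell tools, here is a minimal Python sketch. The function names and the chunk size are illustrative assumptions, not part of ctxcore or of the zsync tooling; the digests in the usage note are the ones quoted above.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha1_of_file(path: Union[Path, str], chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte databases.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha1sum_line(line: str) -> str:
    """Extract the digest from a 'DIGEST FILENAME' line as written by sha1sum."""
    return line.split()[0].lower()
```

Calling sha1_of_file on the local scores database and comparing the result with parse_sha1sum_line applied to the contents of the '.sha1sum.txt' file should reproduce the mismatch: 57b58cbc... locally versus 07b5e527... in 'sha1sum.txt'.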
The rankings database was downloaded as well:
$ cat hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.sha1sum.txt
1688a925f22d312769798258d990f13866bb4924 hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
$ head hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.zsync
Blocksize: 2048
Filename: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
Hash-Lengths: 2,3,6
Length: 35192956928
MTime: Thu, 07 Jul 2022 14:35:59 +0000
SHA-1: 95c823ee1e19f68ce0c82f79042cdc1007018ddb
URL: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
zsync: 2.0.0-alpha-1
[binary data omitted]
$ sha1sum hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
95c823ee1e19f68ce0c82f79042cdc1007018ddb hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
An error occurred:
ValueError: "/m/tutor/database/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather" is not a cisTarget Feather database in Feather v1 or v2 format.
ctxcore/ctdb.py:
......
def is_feather_v1_or_v2(feather_filename: Union[Path, str]) -> Optional[int]:
    """
    Check if the passed filename is a Feather v1 or v2 file.

    :param feather_filename: Feather v1 or v2 filename.
    :return: 1 (for Feather version 1), 2 (for Feather version 2) or None.
    """

    with open(feather_filename, "rb") as fh_feather:
        # Read first 6 and last 6 bytes to see if we have a Feather v2 file.
        fh_feather.seek(0, 0)
        feather_v2_magic_bytes_header = fh_feather.read(6)
        fh_feather.seek(-6, 2)
        feather_v2_magic_bytes_footer = fh_feather.read(6)

        if feather_v2_magic_bytes_header == feather_v2_magic_bytes_footer == b"ARROW1":
            # Feather v2 file.
            return 2

        # Read first 4 and last 4 bytes to see if we have a Feather v1 file.
        feather_v1_magic_bytes_header = feather_v2_magic_bytes_header[0:4]
        feather_v1_magic_bytes_footer = feather_v2_magic_bytes_footer[2:]

        if feather_v1_magic_bytes_header == feather_v1_magic_bytes_footer == b"FEA1":
            # Feather v1 file.
            return 1

    # Some other file format.
    return None
......
$ head -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1
$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
��
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
00176-
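The head/tail check above can also be done from Python. This is only a sketch built on the same magic-byte idea as the ctxcore excerpt; read_magic_bytes and looks_truncated_feather_v2 are hypothetical helper names, not functions provided by ctxcore.

```python
from pathlib import Path
from typing import Tuple, Union


def read_magic_bytes(path: Union[Path, str], n: int = 6) -> Tuple[bytes, bytes]:
    """Return the first and the last n bytes of a file."""
    with open(path, "rb") as fh:
        header = fh.read(n)
        fh.seek(0, 2)                 # jump to the end to learn the size
        size = fh.tell()
        fh.seek(max(size - n, 0), 0)  # guard against files shorter than n
        footer = fh.read(n)
    return header, footer


def looks_truncated_feather_v2(path: Union[Path, str]) -> bool:
    """True if the file starts with the Feather v2 magic but does not end with it.

    That pattern is what a download cut off before the final bytes looks like.
    """
    header, footer = read_magic_bytes(path)
    return header == b"ARROW1" and footer != b"ARROW1"
```

For both files shown above this would return True: the header is ARROW1 but the footer is not, which is why is_feather_v1_or_v2 falls through to None and the ValueError is raised.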
The local file sizes are incorrect: both files are smaller than the Content-Length the server reports.
$ stat hg38_screen_v10_clust.regions_vs_motifs.*.feather
File: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
Size: 35192956928 Blocks: 68736272 IO Block: 4096 regular file
Device: 807h/2055d Inode: 18643438 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1001/ charles) Gid: ( 1001/ charles)
Access: 2024-05-09 10:10:29.183467890 +0800
Modify: 2022-07-07 14:35:59.000000000 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
Birth: 2024-05-08 21:57:40.146629410 +0800
File: hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Size: 13882267648 Blocks: 27113824 IO Block: 4096 regular file
Device: 807h/2055d Inode: 18643440 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1001/ charles) Gid: ( 1001/ charles)
Access: 2024-05-09 10:48:38.146833263 +0800
Modify: 2024-05-08 23:28:39.283831255 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
Birth: 2024-05-08 21:57:43.862621727 +0800
$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:52:22 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:35:59 GMT
ETag: "831a9eca2-5e338010f31c0"
Accept-Ranges: bytes
Content-Length: 35192958114
X-Frame-Options: sameorigin
$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:56:51 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:31:02 GMT
ETag: "33b729822-5e337ef5b5580"
Accept-Ranges: bytes
Content-Length: 13882267682
X-Frame-Options: sameorigin
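So the 'zsync' files are incorrect: the Length recorded in each header is shorter than the Content-Length the server reports (by 1186 bytes for the rankings file and 34 bytes for the scores file).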
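The size arithmetic can be checked quickly. The numbers below are copied from the zsync headers and the curl -I output above; the variable and function names are just illustrative.

```python
BLOCK_SIZE = 2048  # "Blocksize" from the zsync headers

# (zsync "Length" header, server "Content-Length") for each database
SIZES = {
    "rankings": (35192956928, 35192958114),
    "scores": (13882267648, 13882267682),
}


def size_report(zsync_length: int, content_length: int, block_size: int = BLOCK_SIZE) -> dict:
    """Summarise how the zsync-recorded length relates to the real file size."""
    return {
        "missing_bytes": content_length - zsync_length,
        "zsync_length_mod_block": zsync_length % block_size,
        "content_length_mod_block": content_length % block_size,
    }


for name, (zsync_length, content_length) in SIZES.items():
    print(name, size_report(zsync_length, content_length))
```

In both cases the zsync Length is an exact multiple of the 2048-byte block size, and the missing bytes equal the remainder of the real size modulo 2048. That is consistent with the recorded length being cut down to a whole number of blocks, although the thread does not confirm the cause.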
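I fixed it using 'curl -C -', which resumes each download at the end of the local file and fetches only the missing trailing bytes: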
$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
** Resuming transfer from byte position 35192956928
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1186 100 1186 0 0 989 0 0:00:01 0:00:01 --:--:-- 989
$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
** Resuming transfer from byte position 13882267648
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 34 100 34 0 0 29 0 0:00:01 0:00:01 --:--:-- 29
$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*feather
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1
It looks like it's working now.
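The same resume-from-offset idea can be sketched in Python with an HTTP Range request, which the server appears to support given the "Accept-Ranges: bytes" header above. This is a rough sketch using only the standard library, not a replacement for curl; build_resume_request and resume_download are hypothetical names.

```python
import os
import urllib.request


def build_resume_request(url: str, local_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for bytes after the local file's end."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        request.add_header("Range", f"bytes={offset}-")
    return request


def resume_download(url: str, local_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Append the missing tail of url to local_path and return the bytes written.

    If the local file is already complete, the server is expected to answer
    416 and urlopen raises urllib.error.HTTPError; callers should handle that.
    """
    request = build_resume_request(url, local_path)
    written = 0
    with urllib.request.urlopen(request) as response:
        # 206: the range was honoured, so append. 200: full body, so overwrite.
        mode = "ab" if response.status == 206 else "wb"
        with open(local_path, mode) as fh:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                fh.write(chunk)
                written += len(chunk)
    return written
```

After a resume like this, it is worth re-running sha1sum and comparing against the '.sha1sum.txt' file, since matching ARROW1 footers only show that the tail is now present.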
To summarize, the 'zsync' files are incorrect.
Best wishes
The zsync files have been removed for now, as zsync has had issues with large files (larger than 2 GB) for a long time.
It looks like the zsync2 bug (https://github.com/AppImageCommunity/zsync2/issues/31) might finally be resolved in a fork of zsync2: https://github.com/NiLuJe/zsync2/commit/a8e2d68e3f03315835f6d6fb9f74a26c3ea000b9