filecoin-plus-large-datasets

[DataCap Application] Speedium - NIH NCBI Sequence Read Archive [NON FIL-E]

Open cryptowhizzard opened this issue 1 year ago • 58 comments

Data Owner Name

NIH - National Institutes of Health

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.nih.gov/

Social Media

https://www.facebook.com/nih.gov/

Total amount of DataCap being requested

15PiB

Expected size of single dataset (one copy)

13.14 PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB
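The sizing answers above can be related with simple arithmetic. The following is a minimal sketch, assuming DataCap is consumed one-for-one by the bytes stored in deals (an assumption for illustration, not something stated in the application). The function names are hypothetical.

```python
# Figures copied from the application form above (all in PiB).
requested_pib = 15.0     # total DataCap requested
single_copy_pib = 13.14  # expected size of one copy of the dataset
replicas = 10            # number of replicas to store
weekly_pib = 1.0         # weekly allocation requested


def full_copies_covered(requested: float, single_copy: float) -> float:
    """How many complete copies the requested DataCap could cover."""
    return requested / single_copy


def datacap_for_replicas(single_copy: float, n_replicas: int) -> float:
    """DataCap needed to store n_replicas complete copies."""
    return single_copy * n_replicas


def weeks_to_consume(requested: float, weekly: float) -> float:
    """Weeks needed to use the request at the stated weekly rate."""
    return requested / weekly


copies = full_copies_covered(requested_pib, single_copy_pib)
needed = datacap_for_replicas(single_copy_pib, replicas)
weeks = weeks_to_consume(requested_pib, weekly_pib)

print(f"Copies covered by this request: {copies:.2f}")
print(f"DataCap for {replicas} replicas: {needed:.1f} PiB")
print(f"Weeks at the weekly rate: {weeks:.0f}")
```

Under that assumption, 15 PiB covers roughly 1.14 complete copies, while 10 replicas of 13.14 PiB would total 131.4 PiB.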

On-chain address for first allocation

f1k22mdjvcqhuvrflxrftp4f5oyvgzvnt6rr3kyyi

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • [ ] Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

Since its launch, the Filecoin network has become an important player in the decentralised storage space, offering a secure and transparent alternative to traditional data storage solutions.

We as Speedium / DCENT have been engaged with storing real and valuable datasets on the Filecoin network since Slingshot 2.6 and have been actively developing tools to improve the process. We are always on the lookout for new and useful client data to onboard.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

NIH NCBI Sequence Read Archive (SRA) on AWS
The Sequence Read Archive (SRA), produced by the [National Center for Biotechnology Information (NCBI)](https://www.ncbi.nlm.nih.gov/) at the [National Library of Medicine (NLM)](http://nlm.nih.gov/) at the [National Institutes of Health (NIH)](http://www.nih.gov/), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (City and Country)

Our datacenter is situated in Heerhugowaard, a town located just north of Amsterdam in the Netherlands.

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

The dataset's information originates from Amazon AWS, and it is processed using the Singularity software. This specialized tool is designed to cater to large-scale clients who require onboarding of PB-scale data onto the Filecoin network. We have allocated a substantial pool capacity of 13 PiB to securely store a complete copy of this dataset until it is fully distributed. Once the distribution is complete, we will provide hot copies to entities involved in repairing or restoring the dataset.

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

This dataset has not been previously stored; we are the pioneering entity attempting to do so.

Please share a sample of the data

https://registry.opendata.aws/ncbi-sra/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • [X] I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Monthly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

| MinerID | City | Continent | Business/Entity |
| --- | --- | --- | --- |
| `f01944347` | Oregon | USA | Jenny, Dabai |
| `f01952350` | Oregon | USA | Jenny, Dabai |
| `f01972364` | Oregon | USA | Jenny, Dabai |
| `f01972376` | Oregon | USA | Jenny, Dabai |
| `f02000937` | Chengdu | CN | MTY |
| `f01915033` | Chengdu | CN | MTY |
| `f0120****` | Melbourne | AU | HOLON |
| `f0115****` | Melbourne | AU | HOLON |
| `f01199430` | Heerhugowaard | EU | DCENT |
| `f01786387` | Heerhugowaard | EU | DCENT |
| `f01201327` | Heerhugowaard | EU | DCENT |
| `f01937642` | Heerhugowaard | EU | DCENT |
| `f0198****` | Dallas | USA | GREATERHEAT |
| `f0188****` | Singapore | AS | GREATERHEAT |
| `f01091851` | Omaha | USA | DLTx |
| `f01736668` | Omaha | USA | DLTx |
| `f01820744` | Omaha | USA | DLTx |
| `f0855584` | Omaha | USA | DLTx |
| `f01794610` | Omaha | USA | DLTx |
| `f01838599` | Kansas City | USA | DLTx |
| `f01845552` | Kansas City | USA | DLTx |
| `f01274011` | Sydney | AU | DSS |
| `f01746964` | Sydney | AU | DSS |
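A provider table like the one above can be tallied by region and by entity to get a quick view of the planned distribution. This is a minimal sketch that parses a few rows copied from the table; the helper name `parse_rows` is hypothetical, and masked IDs are kept exactly as they appear.

```python
from collections import Counter

# A subset of rows copied from the provider table above.
TABLE = """\
| MinerID | City | Continent | Business/Entity |
| --- | --- | --- | --- |
| `f01944347` | Oregon | USA | Jenny, Dabai |
| `f02000937` | Chengdu | CN | MTY |
| `f0120****` | Melbourne | AU | HOLON |
| `f01199430` | Heerhugowaard | EU | DCENT |
| `f01786387` | Heerhugowaard | EU | DCENT |
| `f01091851` | Omaha | USA | DLTx |
"""


def parse_rows(table: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in table.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the --- separator row
        cells = [c.strip().strip("`") for c in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_rows(TABLE)
by_region = Counter(r["Continent"] for r in rows)
by_entity = Counter(r["Business/Entity"] for r in rows)

print(by_region.most_common())
print(by_entity.most_common())
```

Counting per entity as well as per region helps because several miner IDs in the table belong to the same operator.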

How do you plan to make deals to your storage providers

Boost client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

cryptowhizzard, Dec 16 '23 01:12

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

KYC

This user’s identity has been verified through filplus.storage

data-programs, Dec 16 '23 01:12

Datacap Request Trigger

Total DataCap requested

15PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1k22mdjvcqhuvrflxrftp4f5oyvgzvnt6rr3kyyi

Sunnyiscoming, Dec 17 '23 12:12

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1k22mdjvcqhuvrflxrftp4f5oyvgzvnt6rr3kyyi

DataCap allocation requested

512TiB

Id

e4687e48-5600-40be-b036-9f27b456877f

SP List provided: [{"providerID":"f01944347","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f01952350","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f01972364","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f01972376","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f02000937","City":"Chengdu","Country":"CN","SPOrg":"MTY"}, {"providerID":"f01915033","City":"Chengdu","Country":"CN","SPOrg":"MTY"}, {"providerID":"f0120****","City":"Melbourne","Country":"AU","SPOrg":"HOLON"}, {"providerID":"f0115****","City":"Melbourne","Country":"AU","SPOrg":"HOLON"}, {"providerID":"f01199430","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f01786387","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f01201327","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f01937642","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f0198****","City":"Dallas","Country":"USA","SPOrg":"GREATERHEAT"}, {"providerID":"f0188****","City":"Singapore","Country":"AS","SPOrg":"GREATERHEAT"}, {"providerID":"f01091851","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01736668","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01820744","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f0855584","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01794610","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01838599","City":"KansasCity","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01845552","City":"KansasCity","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01274011","City":"Sydney","Country":"AU","SPOrg":"DSS"}, {"providerID":"f01746964","City":"Sydney","Country":"AU","SPOrg":"DSS"}]

Sunnyiscoming, Dec 17 '23 12:12

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebwoze5uuxnwpcq6a6sakkp3yq4jf6zzuac3oq4fpeilozyvgdir4

Address

f1k22mdjvcqhuvrflxrftp4f5oyvgzvnt6rr3kyyi

Datacap Allocated

512.00TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

e4687e48-5600-40be-b036-9f27b456877f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebwoze5uuxnwpcq6a6sakkp3yq4jf6zzuac3oq4fpeilozyvgdir4

psh0691, Dec 21 '23 16:12

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedof5mue44xzh2fgoi7rga7363v424vtlirtchl4v5fz7x5dudjwq

Address

f1k22mdjvcqhuvrflxrftp4f5oyvgzvnt6rr3kyyi

Datacap Allocated

512.00TiB

Signer Address

f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

Id

e4687e48-5600-40be-b036-9f27b456877f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedof5mue44xzh2fgoi7rga7363v424vtlirtchl4v5fz7x5dudjwq

mjroddy, Dec 21 '23 23:12

checker:manualTrigger

herrehesse, Dec 24 '23 12:12

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

checker:manualTrigger

herrehesse, Dec 25 '23 09:12

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

checker:manualTrigger

herrehesse, Dec 28 '23 19:12

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 99.94% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

1: There is no backup of the data: every piece is stored as a single independent copy, which does not comply with the rules.

2: CIDs are shared with LDN #2111, which is in the same category.

3: The sealing report for #2111 is highly non-compliant, with serious CID sharing and an insufficient number of copies.

Tom-OriginStorage, Dec 29 '23 09:12

https://github.com/data-preservation-programs/filplus-checker-assets/blob/main/filecoin-project/filecoin-plus-large-datasets/issues/2111/1701851208265.md

Tom-OriginStorage, Dec 29 '23 09:12

image

Tom-OriginStorage, Dec 29 '23 09:12

@Sunnyiscoming @galen-mcandrew @Kevin-FF-USA @Filplus-govteam
Please check this LDN against the standards and close it.

Tom-OriginStorage, Dec 29 '23 09:12

SP List provided: [{"providerID":"f01944347","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f01952350","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f01972364","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f01972376","City":"Oregon","Country":"USA","SPOrg":"Jenny,Dabai"}, {"providerID":"f02000937","City":"Chengdu","Country":"CN","SPOrg":"MTY"}, {"providerID":"f01915033","City":"Chengdu","Country":"CN","SPOrg":"MTY"}, {"providerID":"f0120****","City":"Melbourne","Country":"AU","SPOrg":"HOLON"}, {"providerID":"f0115****","City":"Melbourne","Country":"AU","SPOrg":"HOLON"}, {"providerID":"f01199430","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f01786387","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f01201327","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f01937642","City":"Heerhugowaard","Country":"EU","SPOrg":"DCENT"}, {"providerID":"f0198****","City":"Dallas","Country":"USA","SPOrg":"GREATERHEAT"}, {"providerID":"f0188****","City":"Singapore","Country":"AS","SPOrg":"GREATERHEAT"}, {"providerID":"f01091851","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01736668","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01820744","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f0855584","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01794610","City":"Omaha","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01838599","City":"KansasCity","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01845552","City":"KansasCity","Country":"USA","SPOrg":"DLTx"}, {"providerID":"f01274011","City":"Sydney","Country":"AU","SPOrg":"DSS"}, {"providerID":"f01746964","City":"Sydney","Country":"AU","SPOrg":"DSS"}]

image

@Sunnyiscoming @galen-mcandrew @Kevin-FF-USA @Filplus-govteam

The nodes that sealed the data do not match the nodes listed in the application. Please close it.

Tom-OriginStorage, Dec 29 '23 09:12

Hello,

Please note that we have multiple issues open under the same wallet address for this LDN, mostly because of technical difficulties with the bot. At the request of the governance team, we created a new one.

To get the whole picture, you also need to take into account the other issues for this project, which use the other wallet.

Thanks

cryptowhizzard, Dec 29 '23 23:12

The LDNs sharing CIDs are #2111 and #339. There is another one whose link cannot be opened. These three LDNs are simply different projects with different addresses. Don’t quibble; it is a fact.

Tom-OriginStorage, Dec 30 '23 16:12

> The LDNs sharing CIDs are #2111 and #339. There is another one whose link cannot be opened. These three LDNs are simply different projects with different addresses. Don’t quibble; it is a fact.

It seems that you have difficulty understanding given that you are biased and probably emotional because things did not work out as expected.

However, we have only one NIH project, a 15 PiB public and retrievable dataset, and we have an old and a new wallet, the latter at the request of the Fil+ gov team. Both belong to one project, and there is nothing wrong.

Thanks

cryptowhizzard, Dec 30 '23 17:12

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1k22mdjvcqhuvrflxrftp4f5oyvgzvnt6rr3kyyi

DataCap allocation requested

512TiB

Id

ef0efb76-6081-4585-b562-a5f73c4517f9

> > The LDNs sharing CIDs are #2111 and #339. There is another one whose link cannot be opened. These three LDNs are simply different projects with different addresses. Don’t quibble; it is a fact.
>
> It seems that you have difficulty understanding given that you are biased and probably emotional because things did not work out as expected.
>
> However, we have only one NIH project, a 15 PiB public and retrievable dataset, and we have an old and a new wallet, the latter at the request of the Fil+ gov team. Both belong to one project, and there is nothing wrong.
>
> Thanks

Hello, I don’t have any bias. The question is very clear, and you did not answer it directly. #339 is not the same address and is not the same project as #339. Don't accuse others of bias just because you cannot answer for your own mistakes. You don't seem to realize that you are the one who has always been biased in attacking others.

@herrehesse You can check it according to your standards

Tom-OriginStorage, Jan 02 '24 04:01

> The nodes that sealed the data do not match the nodes listed in the application. Please close it.

@Tom-OriginStorage this is low, even for you as a known abuser of this program. As @cryptowhizzard stated above, it's all known and clear. Nothing to add.

herrehesse, Jan 05 '24 09:01

@herrehesse

1: About 3 PiB of the data has only one copy, and more than 700 TiB has only two copies, which is a large proportion.

2: Why do you maliciously attack other people's projects without any evidence, based only on your own speculation, while claiming that the projects your company applied for have only minor violations?

Tom-OriginStorage, Jan 08 '24 05:01

Hello,

These issues all belong to this wallet: f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues?q=is%3Aissue+f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy
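The checker footnotes in this thread describe a form of the trigger comment that takes additional addresses, `checker:manualTrigger <other_address_1> <other_address_2> ...`, so that deals from related wallets are combined into one report. The following is a minimal sketch of building such a comment text for the wallet named above. The helper name `build_trigger_comment` is hypothetical and is not part of any checker tooling.

```python
def build_trigger_comment(*other_addresses: str) -> str:
    """Build the comment text in the form given by the checker footnote.

    With no addresses this is the plain trigger; with addresses, each one
    is appended after the keyword, separated by single spaces.
    """
    cleaned = [addr.strip() for addr in other_addresses if addr.strip()]
    return " ".join(["checker:manualTrigger", *cleaned])


# The other wallet mentioned in this comment.
old_wallet = "f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy"

print(build_trigger_comment())
print(build_trigger_comment(old_wallet))
```

Posting the second form as an issue comment is what the footnote describes; whether the resulting report resolves the replication warnings depends on the deals actually made from both wallets.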

cryptowhizzard, Jan 08 '24 14:01

checker:manualTrigger

cryptowhizzard, Jan 08 '24 14:01

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.
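The "Deal Data Replication" line in these reports gives the percentage of deals whose data sits with fewer than a threshold number of storage providers (2 in the earlier reports, 3 in this one). The thread does not show how the checker computes it, so the following is only one plausible reading, sketched over hypothetical deal records. The function name and the sample data are assumptions.

```python
from collections import defaultdict


def share_below_threshold(deals: list[tuple[str, str]], threshold: int) -> float:
    """Percentage of deals whose piece is held by fewer than `threshold`
    distinct providers. Each deal is a (piece_id, provider_id) pair."""
    if not deals:
        return 0.0
    providers_per_piece: dict[str, set[str]] = defaultdict(set)
    for piece, provider in deals:
        providers_per_piece[piece].add(provider)
    under = sum(
        1 for piece, _ in deals if len(providers_per_piece[piece]) < threshold
    )
    return 100.0 * under / len(deals)


# Hypothetical deals: piece-a is stored with two providers, the others with one.
deals = [
    ("piece-a", "f01199430"),
    ("piece-a", "f01786387"),
    ("piece-b", "f01199430"),
    ("piece-c", "f01091851"),
]

print(share_below_threshold(deals, threshold=2))
print(share_below_threshold(deals, threshold=3))
```

The same set of deals scores worse when the threshold rises, which is one reason a figure can move from 99.94% back to 100.00% between reports without any deals being lost. A real checker may weight deals by size rather than count them, which this sketch does not do.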

@Tom-OriginStorage

As I said, these wallets should be combined. We are storing the biggest dataset on the network at the moment (15 PiB). It is inevitable that the distribution is not yet even, and it will take some time for everyone to catch up. It's not a set of a few TiB that can be done easily.

cryptowhizzard, Jan 08 '24 14:01

No need to argue. The facts are there, and I have already made them very clear for everyone to see.

Tom-OriginStorage, Jan 09 '24 08:01