filecoin-plus-large-datasets

[DataCap Application] <Edwardext> - <SDSS Dataset >

Open edwardext opened this issue 2 years ago • 77 comments

Data Owner Name

Sloan Digital Sky Survey

Data Owner Country/Region

United States

Data Owner Industry

Environment

Website

https://www.sdss.org

Social Media

https://twitter.com/sdssurveys
https://www.youtube.com/user/sdssurveys

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

600TiB

On-chain address for first allocation

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • [ ] Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

I am a researcher and developer in blockchain technology. In 2020, I participated in the Filecoin Space Race as a member of a technical support team, providing technical assistance to other storage providers (SPs).
I am deeply interested in Filecoin's storage system and have a solid understanding of mining technology, along with extensive industry experience. Through system, hardware-matching, and process optimization, I have developed a Lotus-based mining system suitable for large-scale production.
I have a team of three developers in China, and we are currently developing an ecosystem application based on FVM.
Because I value Filecoin's goal of storing valid data, I would like to store this public dataset as valid data and recommend it to the SPs that I know or work with.
I will supervise and guide my collaborating SPs to strictly follow the valid-data sealing rules established by the community.

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

issue 1827/1841

Describe the data being stored onto Filecoin

The SDSS (Sloan Digital Sky Survey) dataset is a publicly available dataset of astronomical observations of the night sky. It has been collected over the years by the Sloan Foundation Telescope located in New Mexico, USA.
The SDSS dataset covers more than 500 million astronomical objects, such as stars, galaxies, quasars, and asteroids, from five survey phases, and is about 407T in size. The data include the positions, brightness, and colors of celestial objects. The dataset also includes spectra of about 4 million objects, which provide information about their chemical composition, distance, and velocity.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

The SDSS dataset is stored in multiple locations: AWS and the SDSS Science Archive Server (SAS).
It can be publicly accessed and downloaded from the SDSS website.

How do you plan to prepare the dataset

others/custom tool

If you answered "other/custom tool" in the previous question, enter the details here

Self-developed tool

Please share a sample of the data

https://www.sdss4.org/dr17
https://www.sdss.org/dr18/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • [X] I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, Africa, North America, South America, Europe, Australia (continent), Antarctica

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives, Others

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

f01980952	CN
f01943941	CN
f0150816	CN
f02031264	SGP
f02052244	US
f02052252	US

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

edwardext avatar Apr 03 '23 09:04 edwardext

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

600TiB

Client address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

Sunnyiscoming avatar Apr 03 '23 12:04 Sunnyiscoming

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

DataCap allocation requested

256TiB

Id

6c2a5269-259f-455b-80d5-016f4e36f8cc

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

DataCap allocation requested

256TiB

Id

d7568554-f3d3-4ef6-b98e-f2f9b841c7e9

DataCap and CID Checker Report[^1]

No application info found for this issue on https://filplus.d.interplanetary.one/clients.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

Does downloading from https://www.sdss.org require permission? What data do you plan to download? Can you explain?

a1991car avatar Apr 04 '23 03:04 a1991car

Hi @a1991car, thank you for your attention and questions. All the data provided for download on SDSS are publicly available.

The data submitted this time covers celestial object data and spectra.

edwardext avatar Apr 04 '23 03:04 edwardext

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedroq2zsrstbazejdwbrvr2d3vemvtdhgum7lks3ih4mwuer26ra2

Address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

Datacap Allocated

256.00TiB

Signer Address

f1qnumecdypgrbaebtkdfjnwt5ndacadcuas3deiq

Id

d7568554-f3d3-4ef6-b98e-f2f9b841c7e9

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedroq2zsrstbazejdwbrvr2d3vemvtdhgum7lks3ih4mwuer26ra2

a1991car avatar Apr 04 '23 10:04 a1991car

DataCap and CID Checker Report[^1]

No application info found for this issue on https://filplus.d.interplanetary.one/clients.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceb63qubp4pgw64cfpob6be4tudays7f642gbmzuaqn6oehxzcacui

Address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

Datacap Allocated

256.00TiB

Signer Address

f16karfxq7lxdy7izqrzrk75jf3not34k6sg6zvcy

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb63qubp4pgw64cfpob6be4tudays7f642gbmzuaqn6oehxzcacui

NewHuoPool avatar Apr 07 '23 07:04 NewHuoPool

checker:manualTrigger

cryptowhizzard avatar Apr 12 '23 12:04 cryptowhizzard

DataCap and CID Checker Report[^1]

No application info found for this issue on https://filplus.d.interplanetary.one/clients.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

@NewHuoPool

Can you let us (the community) know what due diligence has been done for this client? What is the data onboarding plan for this client? Where are they going to store the data, and who are they? What is their internet capacity? On what basis did you approve this application, and why?

From our own experience, I know that downloading this dataset is a pain. I wonder what data is going to be stored, where, and when. Can you enlighten us, @edwardext?

cryptowhizzard avatar Apr 12 '23 12:04 cryptowhizzard

We have been downloading this dataset since submitting the application. As the SDSS official website shows, the dataset has gone through five phases and 18 data releases. The data we plan to store for this project exceeds 700T, and we plan to store five copies. Allowing for sealing redundancy, we applied for 5P of DataCap. We will download the dataset in five stages, since downloading and transfer take time. Finally, we will send the data to each SP on hard disks. We expect to start sealing by the end of April at the earliest, and in May at the latest.
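
For readers checking the sizing in this comment, here is a minimal sketch of the arithmetic. It assumes "700T" means 700 TiB and "5P" means 5 PiB (binary units, matching the DataCap units used elsewhere in this thread); the function and variable names are illustrative only and do not come from any Fil+ tool.

```python
TIB_PER_PIB = 1024


def total_for_copies(dataset_tib: int, copies: int) -> int:
    """Raw volume in TiB needed to store `copies` replicas of a dataset."""
    return dataset_tib * copies


dataset_tib = 700                 # assumption: "700T" read as 700 TiB
copies = 5                        # five copies, as stated in the comment
requested_tib = 5 * TIB_PER_PIB   # assumption: "5P" read as 5 PiB

raw_tib = total_for_copies(dataset_tib, copies)
headroom_tib = requested_tib - raw_tib

print(f"raw replicated volume: {raw_tib} TiB ({raw_tib / TIB_PER_PIB:.2f} PiB)")
print(f"headroom within the request: {headroom_tib} TiB")
```

Under these assumptions, five copies account for roughly 3.42 PiB of the 5 PiB requested; the comment attributes the rest to redundancy.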

edwardext avatar Apr 17 '23 07:04 edwardext

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

DataCap allocation requested

512TiB

Id

0edafe77-d669-46ad-a72d-90256c925f20

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

Rule to calculate the allocation request amount

10% of total dc amount requested

DataCap allocation requested

512TiB

Total DataCap granted for client so far

256TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.75PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 256TiB | null | 44.87TiB |
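
The figures in this bot comment can be reproduced with a short calculation. This is a sketch under stated assumptions: it applies only the rule quoted above ("10% of total dc amount requested") and treats 1 PiB as 1024 TiB; the helper name `tranche_tib` is hypothetical and not part of any Fil+ tooling.

```python
TIB_PER_PIB = 1024


def tranche_tib(total_requested_tib: int, percent: int) -> int:
    """Allocation size in TiB as a whole-number percentage of the total request."""
    return total_requested_tib * percent // 100


total_tib = 5 * TIB_PER_PIB   # 5PiB requested in total
granted_so_far_tib = 256      # first allocation, already granted

second_tranche_tib = tranche_tib(total_tib, 10)
remaining_pib = (total_tib - granted_so_far_tib) / TIB_PER_PIB

print(f"second allocation: {second_tranche_tib} TiB")
print(f"still to be granted: {remaining_pib} PiB")
```

The results match the 512TiB allocation and the 4.75PiB remainder reported above.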

checker:manualTrigger

DaYouGroup avatar May 18 '23 13:05 DaYouGroup

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
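
To illustrate what a replication warning like the one above measures, here is a minimal sketch. It is not the checker's actual implementation, which is not shown in this thread. It assumes each deal is a `(piece_cid, provider_id, size)` tuple and weights the percentage by deal size; the sample deals and provider names are invented for illustration.

```python
from collections import defaultdict


def low_replication_share(deals, min_providers=4):
    """Percent of deal bytes whose piece is held by fewer than min_providers distinct SPs."""
    providers_by_piece = defaultdict(set)
    for piece_cid, provider, _size in deals:
        providers_by_piece[piece_cid].add(provider)

    total = sum(size for _, _, size in deals)
    if total == 0:
        return 0.0

    low = sum(
        size
        for piece_cid, _, size in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100.0 * low / total


# Hypothetical sample: every piece is stored with at most 3 distinct providers.
deals = [
    ("pieceA", "sp-1", 32),
    ("pieceA", "sp-2", 32),
    ("pieceA", "sp-3", 32),
    ("pieceB", "sp-1", 32),
]
share = low_replication_share(deals)
print(f"{share:.2f}% of deal data is replicated across fewer than 4 providers")
```

With only three providers in play, no piece can reach four replicas, so a first round spread across three SPs would be flagged at 100% under this definition.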

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebfstxu4dedrp7omhhcdv6qpt6twwask6kpwd6aongnn6o4wwc6no

Address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

Datacap Allocated

512.00TiB

Signer Address

f1nwjsd2mc6hu4qrwnmd6ukrfkuu4h5fhs7u3exii

Id

0edafe77-d669-46ad-a72d-90256c925f20

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebfstxu4dedrp7omhhcdv6qpt6twwask6kpwd6aongnn6o4wwc6no

DaYouGroup avatar May 18 '23 13:05 DaYouGroup

checker:manualTrigger

newwebgroup avatar May 19 '23 04:05 newwebgroup

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

The CID Checker results look acceptable for a first round, in which the data is distributed among only 3 SPs; I hope to see more SPs in subsequent rounds. The checker currently reports: ⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

A few randomly selected CIDs were retrieved and verified with the CID tool, and all succeeded. https://filecoin.tools/f02114878

newwebgroup avatar May 19 '23 07:05 newwebgroup

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceadebdkrnuoo7uhack2r5apzjiiilhezf3cwltjs4ismslh4w6kxk

Address

f15dqakgac2j2keky2up6oz2qidxhm3fssqnbghoy

Datacap Allocated

512.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

0edafe77-d669-46ad-a72d-90256c925f20

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceadebdkrnuoo7uhack2r5apzjiiilhezf3cwltjs4ismslh4w6kxk

newwebgroup avatar May 19 '23 07:05 newwebgroup

checker:manualTrigger

edwardext avatar May 25 '23 02:05 edwardext

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

checker:manualTrigger

edwardext avatar May 29 '23 02:05 edwardext