filecoin-plus-large-datasets

[DataCap Application] DSPA-Asia The National Oceanic and Atmospheric Administration

Open AlanGreaterheat opened this issue 2 years ago • 143 comments

Data Owner Name

DSPA-Asia The National Oceanic and Atmospheric Administration

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

United States

Data Owner Industry

Environment

Website

https://www.noaa.gov/

Social Media

https://twitter.com/NOAA https://www.youtube.com/usnoaagov https://www.facebook.com/NOAA

Total amount of DataCap being requested

100 PiB

Expected size of single dataset (one copy)

10 PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1 PiB
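
The requested figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the form; the variable names are illustrative, and the "weeks" figure assumes the weekly rate stays constant, which the application does not state.

```python
# Figures copied from the application form above.
single_copy_pib = 10   # expected size of one copy of the dataset
replicas = 10          # number of replicas to store
weekly_rate_pib = 1    # weekly DataCap allocation requested

# Total DataCap needed to store every replica once.
total_pib = single_copy_pib * replicas

# Assumption: the weekly rate is held constant for the whole onboarding period.
weeks_to_exhaust = total_pib / weekly_rate_pib

print(f"total DataCap: {total_pib} PiB")
print(f"weeks at {weekly_rate_pib} PiB/week: {weeks_to_exhaust:.0f}")
```

Under these assumptions the total matches the 100 PiB requested, and onboarding at the stated weekly rate would take roughly 100 weeks.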

On-chain address for first allocation

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • [x] Use Custom Multisig

Identifier

efil

Share a brief history of your project and organization

DSPA-Asia is an eco-project established with the support of PL, with the main vision to help Asian storage providers transform their FIL+ business and continuously expand the effective data storage capacity of the Filecoin eco-system. The mission of DSPA-Asia is to help 30 Filecoin CC storage providers create 1EiB QAP for the Filecoin network within 6 months.

DSPA-Asia is willing to contribute to the sustainable development of all mankind by storing data from NOAA as the data owner of public datasets, including weather, satellite, climate, etc.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

n/a

Describe the data being stored onto Filecoin

  • NOAA High-Resolution Rapid Refresh (HRRR) Model
  • NOAA Global Ensemble Forecast System (GEFS)
  • NOAA Global Forecast System (GFS)
  • NOAA Geostationary Operational Environmental Satellites (GOES) 16 & 17
  • NOAA Global Ensemble Forecast System (GEFS) Re-forecast
  • NOAA Climate Forecast System (CFS)

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

n/a

How do you plan to prepare the dataset

lotus

If you answered "other/custom tool" in the previous question, enter the details here

n/a

Please share a sample of the data

  • NOAA High-Resolution Rapid Refresh (HRRR) Model
  • NOAA Global Ensemble Forecast System (GEFS)
  • NOAA Global Forecast System (GFS)
  • NOAA Geostationary Operational Environmental Satellites (GOES) 16 & 17
  • NOAA Global Ensemble Forecast System (GEFS) Re-forecast
  • NOAA Climate Forecast System (CFS)

Confirm that this is a public dataset that can be retrieved by anyone on the Network

Yes

If you chose not to confirm, what was the reason

n/a

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose SP

Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

n/a

If you already have a list of storage providers to work with, fill out their names and provider IDs below

n/a

How do you plan to make deals to your storage providers

Boost client

If you answered "Others/custom tool" in the previous question, enter the details here

n/a

Can you confirm that you will follow the Fil+ guideline

Yes

Application created via filplus.storage

AlanGreaterheat avatar Jun 01 '23 03:06 AlanGreaterheat

This application requests a total of 100 PiB, so it’s labeled efil+

data-programs avatar Jun 01 '23 03:06 data-programs

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Hello @AlanGreaterheat I'll ask you to fill out the E-Fil Registration Form to help prepare all details for this application. LINK

kevzak avatar Jun 01 '23 13:06 kevzak

> Hello @AlanGreaterheat I'll ask you to fill out the E-Fil Registration Form to help prepare all details for this application. LINK

@kevzak The NOAA dataset is a public dataset, and the E-Fil Registration Form has been completed. Please continue to the next step of the process. Thank you very much!

AlanGreaterheat avatar Jun 02 '23 08:06 AlanGreaterheat

E-Fil+ Upfront check is complete.

kevzak avatar Jun 02 '23 10:06 kevzak

Datacap Request Trigger

Total DataCap requested

100PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

kevzak avatar Jun 02 '23 10:06 kevzak

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

DataCap allocation requested

512TiB

Id

38c9153b-3273-452b-bf62-0288bfe32cda

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceck2bxtcuava65vv2fqz6u35ag5vt7ipesvgjshrhz7nntpzlo56i

Address

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

Datacap Allocated

512.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

38c9153b-3273-452b-bf62-0288bfe32cda

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceck2bxtcuava65vv2fqz6u35ag5vt7ipesvgjshrhz7nntpzlo56i

laurarenpanda avatar Jun 05 '23 03:06 laurarenpanda

Alan (@AlanGreaterheat) has talked to me about DSPA before. As an Asian notary, I would like to support this project.

liyunzhi-666 avatar Jun 05 '23 05:06 liyunzhi-666

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecjm6j5to2xronf6di3obumgd2oqxtuk2ujcqu6e56s4htlokkpdi

Address

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

Datacap Allocated

512.00TiB

Signer Address

f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq

Id

38c9153b-3273-452b-bf62-0288bfe32cda

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecjm6j5to2xronf6di3obumgd2oqxtuk2ujcqu6e56s4htlokkpdi

liyunzhi-666 avatar Jun 05 '23 05:06 liyunzhi-666

I don't understand why an applicant with a bad record was allowed to apply for such a large amount. And this application has already been approved and signed for DataCap! Here is their previous application: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1068

lilisy90 avatar Jun 07 '23 07:06 lilisy90

There are two significant issues that need to be addressed here.

Firstly, all of the mentioned data sets are already stored on Filecoin multiple times across more than 125 applications. This excessive duplication is completely unacceptable as it diminishes the value of each piece of data. It is crucial to avoid clients repeatedly storing the same data indefinitely. (https://github.com/filecoin-project/filecoin-plus-large-datasets/issues?q=is%3Aissue+NOAA)

Secondly, it has been established and agreed upon by the trust and transparency FIL+ working group that applications are not allowed to merge multiple datasets into one application. In this case, the application combines six separate datasets, namely:

  1. NOAA High-Resolution Rapid Refresh (HRRR) Model
  2. NOAA Global Ensemble Forecast System (GEFS)
  3. NOAA Global Forecast System (GFS)
  4. NOAA Geostationary Operational Environmental Satellites (GOES) 16 & 17
  5. NOAA Global Ensemble Forecast System (GEFS) Re-forecast
  6. NOAA Climate Forecast System (CFS)

Furthermore, the previous datacap allocations granted to a member of Greater Heat (@beck-8) have been largely unsuccessful and remain unresolved. They have stored all data on their own miners without making any progress on distribution, as evidenced here: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1068. Please let us know if we can help with proper distribution, the application has been idle for many months now.

To address these concerns, I will initiate a dispute and advise against any notary progressing with the signing of this application until we have a meaningful discussion with GreaterHeat. If they require a significant amount of datacap for the DSPA program, we should find a suitable solution, but not through the current approach.

herrehesse avatar Jun 07 '23 09:06 herrehesse

> They have stored all data on their own miners without making any progress on distribution, as evidenced here: #1068. Please let us know if we can help with proper distribution, the application has been idle for many months now.

Hi @herrehesse, why did you say that they store on their own miners?

Does it mean that all SPs of #1068 belong to Greaterheat? Is there any proof?

lilisy90 avatar Jun 08 '23 06:06 lilisy90

> > They have stored all data on their own miners without making any progress on distribution, as evidenced here: #1068. Please let us know if we can help with proper distribution, the application has been idle for many months now.
>
> Hi @herrehesse, why did you say that they store on their own miners?
>
> Does it mean that all SPs of #1068 belong to Greaterheat? Is there any proof?

@lilisy90 please let me explain. Some of herrehesse's information is not clear. #1068 is my application, and I have fixed the uneven distribution; no further distributions have followed. Some of the nodes there belong to GH. I am a DP, and I am indeed an employee of GH. I did not want to store the data with GH's SPs at the beginning, but I could not find other SPs, so I sent data to GH's nodes; no one cared about the data on BDE. I admit that during my distribution the proportion of GH nodes was too large, but it did not exceed the upper limit for a single entity. At the beginning of planning we wanted to distribute one copy at a time, but later found that this did not work, so I changed the strategy in time and distributed a copy of the data to multiple SPs at the same time. It is true, though, that the uneven proportions can be seen in the bot's report.

beck-8 avatar Jun 08 '23 07:06 beck-8

@herrehesse can you share which nodes are from Greaterheat? I would like to look into it.

Carohere avatar Jun 08 '23 08:06 Carohere

checker:manualTrigger

Sunnyiscoming avatar Jun 10 '23 01:06 Sunnyiscoming

DataCap and CID Checker Report[^1]

No active deals found for this client.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

Thanks for your attention. As I posted earlier in #fil-plus-trust-wg, our response is as follows:

We have some ideas below:

  1. As far as we know, merging datasets is not formally prohibited by any working group consensus; it depends on whether the notary supports it. At the time of application we explicitly listed the different datasets, all from the same organization.

  2. According to past LDN analysis, storing the same dataset again does not fall outside the scope of Fil+. Filecoin's core storage goal is to store open, transparent public datasets that are meaningful to humanity, so even if the same data is stored multiple times, it is still meaningful and truly distributed.

  3. As a DSPA-Asia project, the 100 PiB of DataCap applied for serves the public interest: it helps PL and the community grow the Filecoin ecosystem together, which benefits all parties involved in Filecoin. The community should support this non-profit public welfare project.

AlanGreaterheat avatar Jun 15 '23 01:06 AlanGreaterheat

Hello @alangreaterheat - there is now one additional step as part of E-Fil+ application process: To validate your applicant GitHub ID, we ask you to complete the KYC check (a third party ID verification process).

Steps:

  • Go to Filplus.storage
  • Login with your GitHub information
  • You'll see a link to "get verified". Click and complete the steps to verify your identification (avoid using Firefox browser).
  • Once complete, your application will show a "kyc verified" label for all to see.

Also note:

  • The KYC check process is completed by Togggle.
  • During the user data collection process, all gathered information is securely encrypted using the individual user's personal key. This ensures that the data remains confidential and inaccessible, even to Togggle.
  • Applicant personal information will never show on GitHub applications, only the kyc verified label.
  • At any time, applicants can request to have verification information removed from the system; just let the Governance team know.

Let me know if you have any issues or questions.

kevzak avatar Jun 22 '23 04:06 kevzak

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] avatar Jul 03 '23 01:07 github-actions[bot]

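> Hello @AlanGreaterheat - there is now one additional step as part of the E-Fil+ application process: to validate your applicant GitHub ID, we ask you to complete the KYC check (a third-party ID verification process).
>
> Steps:
>
>   • Go to Filplus.storage
>   • Log in with your GitHub information
>   • You'll see a link to "get verified". Click it and complete the steps to verify your identification (avoid using the Firefox browser).
>   • Once complete, your application will show a "kyc verified" label for all to see.
>
> Also note:
>
>   • The KYC check process is completed by Togggle.
>   • During the user data collection process, all gathered information is securely encrypted using the individual user's personal key. This ensures that the data remains confidential and inaccessible, even to Togggle.
>   • Applicant personal information will never show on GitHub applications, only the kyc verified label.
>   • At any time, applicants can request to have verification information removed from the system by letting the Governance team know.
>
> Let me know if you have any issues or questions.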

Sorry for the late reply. DSPA-Asia has recently been busy with the preparation and wrap-up of the Hong Kong event. I will finish the KYC by tomorrow at the latest. Thanks very much!

AlanGreaterheat avatar Jul 03 '23 02:07 AlanGreaterheat

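> Hello @AlanGreaterheat - there is now one additional step as part of the E-Fil+ application process: to validate your applicant GitHub ID, we ask you to complete the KYC check (a third-party ID verification process).
>
> Steps:
>
>   • Go to Filplus.storage
>   • Log in with your GitHub information
>   • You'll see a link to "get verified". Click it and complete the steps to verify your identification (avoid using the Firefox browser).
>   • Once complete, your application will show a "kyc verified" label for all to see.
>
> Also note:
>
>   • The KYC check process is completed by Togggle.
>   • During the user data collection process, all gathered information is securely encrypted using the individual user's personal key. This ensures that the data remains confidential and inaccessible, even to Togggle.
>   • Applicant personal information will never show on GitHub applications, only the kyc verified label.
>   • At any time, applicants can request to have verification information removed from the system by letting the Governance team know.
>
> Let me know if you have any issues or questions.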

Hi @kevzak, I have completed the Togggle KYC as requested, but it looks like the label is not showing up on the GitHub application yet. Feel free to contact me if you have any questions. Thanks!

AlanGreaterheat avatar Jul 04 '23 03:07 AlanGreaterheat

checker:manualTrigger

ndnccs avatar Jul 12 '23 10:07 ndnccs

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 93.50%
  • Overall HTTP retrieval success rate: 0.00%
  • Overall Bitswap retrieval success rate: 0.00%

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 90% of total datacap - f02129771: 93.08%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.
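
The "Storage Provider Distribution" warning above boils down to a per-provider share calculation. The following is a minimal sketch of that idea, not the checker bot's actual implementation: the function names, the sample deals, and the provider IDs `f0example1` and `f0example2` are hypothetical, and the 90% threshold simply mirrors the wording of the warning.

```python
from collections import defaultdict


def provider_shares(deals):
    """Return each provider's share of the total deal size, in percent.

    `deals` is an iterable of (provider_id, size) pairs; the unit of `size`
    does not matter as long as it is consistent.
    """
    totals = defaultdict(int)
    for provider, size in deals:
        totals[provider] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {p: 100.0 * s / grand_total for p, s in totals.items()}


def over_threshold(shares, threshold_pct):
    """Return only the providers whose share exceeds threshold_pct."""
    return {p: pct for p, pct in shares.items() if pct > threshold_pct}


# Hypothetical sample data (provider ID, deal size in GiB); NOT from the report.
sample_deals = [("f0example1", 32)] * 28 + [("f0example2", 32)] * 2

shares = provider_shares(sample_deals)
flagged = over_threshold(shares, threshold_pct=90)

for provider, pct in flagged.items():
    print(f"{provider} holds {pct:.2f}% of sealed data")
```

Passing the threshold as a parameter keeps the sketch usable with whatever limit the governance rules actually set.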

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

DataCap allocation requested

512TiB

Id

e4f5c9e0-7d73-4087-94c5-7c6dd183d197

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

Rule to calculate the allocation request amount

100% weekly > 0.5PiB, requesting 0.5PiB

DataCap allocation requested

512TiB

Total DataCap granted for client so far

512TiB

Datacap to be granted to reach the total amount requested by the client (100 PiB)

99.5PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 12271 | 5 | 512TiB | 26.15 | 119.23TiB |
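
The allocation figures in this request can be reproduced with a short calculation. The sketch below reads the rule string "100% weekly > 0.5PiB, requesting 0.5PiB" as "take 100% of the weekly rate, capped at 0.5 PiB"; that reading is an assumption based on the bot's output, and the function and variable names are illustrative rather than part of any real tool.

```python
TIB_PER_PIB = 1024


def tranche_size_tib(weekly_rate_pib, weekly_fraction, cap_pib):
    """Size of one allocation in TiB: a fraction of the weekly rate, capped."""
    uncapped_pib = weekly_rate_pib * weekly_fraction
    return min(uncapped_pib, cap_pib) * TIB_PER_PIB


# Figures taken from the request above.
total_requested_pib = 100
granted_so_far_tib = 512

# Assumed reading of the rule: 100% of the 1 PiB weekly rate, capped at 0.5 PiB.
request_tib = tranche_size_tib(weekly_rate_pib=1, weekly_fraction=1.0, cap_pib=0.5)

# DataCap still to be granted to reach the total requested.
remaining_pib = total_requested_pib - granted_so_far_tib / TIB_PER_PIB

print(f"requested this round: {request_tib:.0f} TiB")
print(f"still to be granted: {remaining_pib} PiB")
```

With these inputs the sketch yields the 512 TiB request and the 99.5 PiB remainder shown in the bot's summary.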

checker:manualTrigger

sxxfuture-official avatar Jul 20 '23 02:07 sxxfuture-official

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 90.82%
  • Overall HTTP retrieval success rate: 0.00%
  • Overall Bitswap retrieval success rate: 0.00%

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

Compared with the previous report, the "Storage Provider Distribution" aspect has been improved, and the retrieval rate looks very good. I support this project.

sxxfuture-official avatar Jul 20 '23 02:07 sxxfuture-official

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceacfmzovr77dwl7otybbhqevdra35dyho7sj7mht7wbh3r55b7zfc

Address

f1a6gqgju3wote6qui3v2gxh2gk3gpqlltw4ipq7y

Datacap Allocated

512.00TiB

Signer Address

f1foiomqlmoshpuxm6aie4xysffqezkjnokgwcecq

Id

e4f5c9e0-7d73-4087-94c5-7c6dd183d197

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceacfmzovr77dwl7otybbhqevdra35dyho7sj7mht7wbh3r55b7zfc

sxxfuture-official avatar Jul 20 '23 02:07 sxxfuture-official