
Add Cloud Masking/Detection Algorithm

Open jacobbieker opened this issue 1 year ago • 14 comments

There is a paper here: https://www.meteoswiss.admin.ch/services-and-publications/publications/scientific-publications/2013/the-heliomont-surface-solar-radiation-processing.html that describes how MeteoSwiss derives surface solar radiation with HelioMont. It also covers detecting different types of clouds and creating a cloud mask from SEVIRI imagery.

Detailed Description

Context

This could be quite useful for producing our own cloud masks, or cloud-type classifications, from the raw imagery. The paper also describes an interesting way of correcting for the satellite's orbital maneuvers to realign the imagery, which might be very helpful.

Possible Implementation

The paper is quite detailed, so we could probably implement it in Satip directly from its description.

jacobbieker, Feb 27 '24

Could you assign this issue to me and give me a brief overview of what needs to be done and how, so that I can work on it? Regards

vikasgrewal16, Feb 29 '24

Hi, the details are in the paper linked from that website; it describes an approach to cloud masking that should work here. To add it to Satip, you could create a cloud_mask file containing the cloud masking algorithm, and add some tests that run it on the public Zarrs to see how well it works.

jacobbieker, Mar 01 '24

I have read those details, but while building the project and downloading the data with the EUMETSAT API, I ran into the error below. Could you please provide some information or a solution for it?

File "/home/grewal/Satip/venv/lib/python3.10/site-packages/botocore/auth.py", line 418, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

vikasgrewal16, Mar 06 '24

Could I get your input on this issue?

vikasgrewal16, Mar 07 '24

Hi, sorry for the delay. It seems you need to log in to AWS to get those credentials: it looks like you are running app.py, which currently uploads to S3 by default. For this issue, you should be able to use the public Google Cloud dataset here to get the raw data for the cloud masking algorithm.

jacobbieker, Mar 12 '24

Also, @vikasgrewal16, I would recommend focusing on a single issue if possible. There are quite a few potential GSoC contributors who want to work on different good first issues. I've seen you also commented on #231; are you more interested in this one, that one, or a different one?

jacobbieker, Mar 12 '24

Thank you @jacobbieker for reaching out and pointing out the importance of focusing on a single issue. I appreciate your guidance in streamlining the efforts.

Regarding your question on my preferences, I have a keen interest in both GIS and ML, which is why I am actively contributing to this project. Through my involvement I aim not only to learn about open source but also to become a valuable part of the community. GSoC is a means to this end, and I see it as an excellent opportunity to contribute substantively.

As for specific issues, for now I will be focusing mostly on #231.

Looking forward to your advice and direction.

Best regards, @vikasgrewal16

vikasgrewal16, Mar 12 '24

Hi @jacobbieker !

I've read the details of the SPARC cloud masking algorithm described in the reference you provided. I'm currently looking through the properties of the raw data in the shared data bucket and would like to work on implementing the algorithm. I would appreciate it if you could assign this issue to me. Thank you!

Surya-29, Mar 12 '24

Hi @Surya-29, that sounds great!

jacobbieker, Mar 12 '24

I'm having some trouble finding the attributes needed to calculate the SPARC score (used for the cloud mask). Specifically, the clear-sky (cloud-free) brightness temperature $T_{cf}$ and the background reflectance $\rho_{cf}$ aren't available in the SEVIRI dataset provided. They can either be calculated (Section 6, Clear Sky Compositing [1]) or retrieved from other datasets provided by EUMETSAT (All Sky Radiances [2]). Could I go with the latter option, since calculating these attributes might involve fitting a model to the diurnal course? However, the ASR dataset is only available on the EUMETSAT Data Center (where it has to be ordered) and not on the Data Store, so downloading it via the API isn't possible, right? @jacobbieker How should I approach this?
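To make the structure of an aggregate-score mask concrete, here is a minimal NumPy sketch of how $T_{cf}$ and $\rho_{cf}$ would feed into it. It assumes, as the name "aggregate score" suggests, that per-pixel component scores are summed and that a positive total means cloudy. The linear form of each component, every `offset`/`scale` value, and the function names are hypothetical placeholders for illustration, not the coefficients or formulas from the paper.

```python
import numpy as np


def temperature_score(bt, bt_cf, offset=2.0, scale=4.0):
    """Hypothetical T score: positive when a pixel is colder than the
    clear-sky brightness temperature T_cf by more than `offset` kelvin."""
    bt = np.asarray(bt, dtype=float)
    bt_cf = np.asarray(bt_cf, dtype=float)
    return (bt_cf - bt - offset) / scale


def brightness_score(rho, rho_cf, offset=0.05, scale=0.1):
    """Hypothetical B score: positive when a pixel is brighter than the
    background reflectance rho_cf by more than `offset`."""
    rho = np.asarray(rho, dtype=float)
    rho_cf = np.asarray(rho_cf, dtype=float)
    return (rho - rho_cf - offset) / scale


def aggregate_cloud_mask(*scores):
    """Sum the available component scores and flag pixels whose total is > 0.

    np.nansum treats a NaN component as 0, so a missing score (for example,
    no reflectance at night) simply drops out of that pixel's total.
    """
    total = np.nansum(np.stack(scores), axis=0)
    return total, total > 0
```

For example, a pixel 10 K colder than $T_{cf}$ and much brighter than $\rho_{cf}$ gets a clearly positive total and is flagged as cloudy, while a pixel close to both clear-sky references gets a negative total. Because components are simply summed, a missing component weakens the decision rather than blocking it.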

Surya-29, Mar 23 '24

Ah okay, I would have thought that information would be in the attributes of the native files. Yeah, for a first pass at getting this in, I think ordering some data from the Data Center and using that is probably the right way to go for now. We can always add calculating the values ourselves later, since the Data Center can be quite slow to deliver data. You are right that there is unfortunately no API access to the Data Center. Another, less ideal, option would be to find an average value, either for the whole year or per month, that we could use instead, but I'm not sure whether such values are published anywhere.
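If no published averages turn up, a per-month background reflectance could be computed directly from the imagery. This is a minimal xarray sketch, assuming the reflectance array has a datetime `time` dimension and that pixels already judged cloudy have been set to NaN beforehand (otherwise bright clouds bias the average upward). The function name is hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_background_reflectance(reflectance: xr.DataArray) -> xr.DataArray:
    """Average reflectance per pixel for each calendar month.

    Assumes `reflectance` has a datetime `time` dimension plus spatial
    dimensions. For float data, xarray's mean skips NaN values by default,
    so pixels pre-flagged as cloudy (NaN) are ignored.
    """
    return reflectance.groupby("time.month").mean("time")


# Tiny example: two January days and two February days for a 1x1 image,
# with one January value flagged as cloudy (NaN).
times = pd.date_range("2024-01-30", periods=4, freq="D")
values = np.array([0.1, np.nan, 0.5, 0.7]).reshape(4, 1, 1)
vis = xr.DataArray(values, dims=("time", "y", "x"), coords={"time": times})
rho_cf = monthly_background_reflectance(vis)  # indexed by a new `month` coordinate
```

`rho_cf.sel(month=1)` then gives the January background for every pixel. A yearly value would just be `reflectance.mean("time")`.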

jacobbieker, Mar 23 '24

Yes, I'll probably go with averaging for the background reflectance $\rho_{cf}$. As for the brightness temperature $T_{cf}$, I would prefer to implement the model described in the paper, if possible, since the final $sparc_{score}$ requires at least $T_{score}$ to be calculated. That said, because this cloud masking algorithm aggregates several scores, it could compensate for other attributes missing from the $sparc_{score}$ calculation.
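One simple way to model the diurnal course of $T_{cf}$ is a least-squares fit of a few daily harmonics to brightness temperatures from pixels believed to be clear. The sketch below assumes that form; the model actually described in the paper may differ, and the helper names and the `n_harmonics` default are hypothetical.

```python
import numpy as np

OMEGA = 2.0 * np.pi / 24.0  # angular frequency of the daily cycle, rad per hour


def _design_matrix(hours, n_harmonics):
    """Columns: 1, cos(k*w*t), sin(k*w*t) for k = 1..n_harmonics."""
    hours = np.asarray(hours, dtype=float)
    columns = [np.ones_like(hours)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(k * OMEGA * hours))
        columns.append(np.sin(k * OMEGA * hours))
    return np.column_stack(columns)


def fit_diurnal_cycle(hours, clear_sky_bt, n_harmonics=2):
    """Least-squares harmonic fit of clear-sky brightness temperature (K)
    against local solar time in hours. Returns the coefficient vector."""
    design = _design_matrix(hours, n_harmonics)
    target = np.asarray(clear_sky_bt, dtype=float)
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


def evaluate_diurnal_cycle(coeffs, hours):
    """Evaluate a fitted cycle at the given hours to estimate T_cf."""
    n_harmonics = (len(coeffs) - 1) // 2
    return _design_matrix(hours, n_harmonics) @ coeffs
```

Since undetected clouds show up as cold outliers, a practical variant (again an assumption, not taken from the paper) would refit after discarding samples that fall well below the first fitted curve.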

Surya-29, Mar 23 '24

I've made progress on implementing the cloud masking algorithm and have committed the changes to my remote repository (changes). Should I open a PR even though the code is only partially functional?

  • Where should I add the cloud_mask.py file? Would it be appropriate to create a subpackage in Satip, or should I add it directly under Satip? (A subpackage would be better, since you've mentioned possibly extending the architecture to include other algorithms.)
  • How should I handle the data? For convenience I've only been using the raw data values (a NumPy array), but the final result should be an xarray.DataArray that includes the attribute information, etc., just like what you get from the EUMETSAT Cloud Mask dataset, right?
  • Lastly, is there a specific Area of Interest? Are we focusing only on the European region?

Surya-29, Mar 31 '24

Awesome! I would open it as a draft PR even though it's incomplete, and just keep adding to it.

Yeah, a subpackage would be really good to have.

Yes, the output should be in an xarray format, primarily to keep the coordinates and satellite attribute information. You could probably just swap the data values in the xarray satellite image for the cloud mask data, and it would be good to go.
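Swapping the values while keeping coordinates and metadata could look like the sketch below. It assumes the input is a single-channel slice of the satellite image with the same shape as the NumPy mask; the function name, the output name `cloud_mask`, and the `long_name` text are hypothetical choices, and dropping `units` is a judgment call since the input channel's units no longer describe the mask.

```python
import numpy as np
import xarray as xr


def mask_like(image: xr.DataArray, mask: np.ndarray, name: str = "cloud_mask") -> xr.DataArray:
    """Wrap a NumPy cloud mask in the coordinates and attributes of `image`.

    Assumes `image` is a single-channel slice whose shape matches `mask`.
    """
    mask = np.asarray(mask)
    if mask.shape != image.shape:
        raise ValueError(f"mask shape {mask.shape} does not match image shape {image.shape}")
    out = image.copy(data=mask)  # same dims and coords, new values and dtype
    # Keep the satellite metadata, but drop units that described the input channel.
    out.attrs = {k: v for k, v in image.attrs.items() if k != "units"}
    out.attrs["long_name"] = "cloud mask (1 = cloudy, 0 = clear)"
    out.name = name
    return out
```

`DataArray.copy(data=...)` requires the new data to have the same shape as the original, which is why the explicit shape check raises a clearer error first.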

If it is easier, focusing on the European area of interest is fine for now, but we would want to extend it to work over Africa and with the Indian Ocean imagery as well.
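Starting with one area of interest and adding Africa and the Indian Ocean imagery later can be kept generic with a bounding-box crop over the image's 1-D projected coordinates. This sketch assumes monotonic 1-D x/y coordinates; the coordinate names and the box bounds for each region are left as parameters because the thread does not specify them, and converting latitude/longitude bounds into the geostationary projection (for example with pyproj) is not shown.

```python
import numpy as np
import xarray as xr


def crop_to_box(da, x_name, y_name, x_bounds, y_bounds):
    """Crop `da` to a rectangle given in the units of its own x/y coordinates.

    `x_bounds` and `y_bounds` are (low, high) pairs. Handles both ascending
    and descending 1-D coordinates, since label-based slicing with .sel must
    follow the coordinate's order.
    """
    def _ordered_slice(coord, lo, hi):
        values = coord.values
        ascending = values[0] <= values[-1]
        return slice(lo, hi) if ascending else slice(hi, lo)

    return da.sel({
        x_name: _ordered_slice(da[x_name], *x_bounds),
        y_name: _ordered_slice(da[y_name], *y_bounds),
    })
```

Keeping the region as plain parameters means each new area of interest is just another pair of bounds rather than a code change.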

jacobbieker, Mar 31 '24