examples

Modernize carbon flux

Open · Azaya89 opened this issue 1 year ago · 2 comments

Modernizing an example checklist

Preliminary checks

  • [x] Look for open PRs and issues that reference the project you are updating. Previous unmerged work in a PR may be reusable for modernizing the project. Comment on these PRs and issues when appropriate; hopefully we will be able to close some of them after your modernization work.

Change ‘anaconda-project.yml’ to use the latest working versions of packages

  • [x] Pin python=3.11
  • [x] Remove the upper pins of all other dependencies (e.g. hvplot<0.9 to hvplot, panel>=0.12,<1.0 to panel>=0.12). Removing upper pins may require code changes in the notebooks to fix errors that appear in the updated environment. If this becomes complex or time-consuming, document the issues for team discussion on whether to re-pin specific packages or explore other solutions.
  • [x] Add or update the lower pin of all other dependencies (e.g. hvplot to hvplot>=0.9.2, hvplot>=0.8 to hvplot>=0.9.2). Usually, the new lower pin of a dependency will be the version resolved after anaconda-project prepare has been run. Execute !conda list in a notebook, or anaconda-project run conda list in the terminal, to display the version of each dependency installed in the environment. Adjusting the lower pin helps ensure that the locks produced for each platform (linux-64, win-64, osx-64, osx-arm64) rely on the tested dependencies and not on older versions.
  • [x] If the channels include conda-forge or pyviz, ask Maxime whether that channel can be removed.
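The pin changes described in this section can be sketched as a small Python helper. This is a minimal illustration, not part of anaconda-project or the examples tooling: it assumes the dependencies are plain conda-style spec strings (as in the examples above) and that the installed versions have already been read from conda list. The names package_name and modernize_pins are hypothetical, and the panel version in the usage line is illustrative only.

```python
import re

# A conda-style spec starts with the package name, optionally followed by
# version constraints, e.g. "hvplot<0.9" or "panel>=0.12,<1.0".
_NAME = re.compile(r"^\s*([A-Za-z0-9_.\-]+)")


def package_name(spec: str) -> str:
    """Return the package name at the start of a dependency spec."""
    match = _NAME.match(spec)
    if match is None:
        raise ValueError(f"cannot parse dependency spec: {spec!r}")
    return match.group(1)


def modernize_pins(specs: list[str], installed: dict[str, str]) -> list[str]:
    """Rewrite each spec as '<name>>=<installed version>' (hypothetical helper).

    This drops any upper pin and raises the lower pin to the version that was
    resolved in the environment. The python pin is kept as-is, because the
    checklist pins it explicitly (python=3.11).
    """
    updated = []
    for spec in specs:
        name = package_name(spec)
        if name == "python":
            updated.append(spec.strip())
        else:
            updated.append(f"{name}>={installed[name]}")
    return updated


# Illustrative versions only; in practice they come from `conda list`.
print(modernize_pins(["python=3.11", "hvplot<0.9", "panel>=0.12,<1.0"],
                     {"hvplot": "0.9.2", "panel": "1.4.4"}))
# ['python=3.11', 'hvplot>=0.9.2', 'panel>=1.4.4']
```

Keeping python as an exact pin mirrors the first checklist item; every other dependency ends up with a lower bound only, so the per-platform locks resolve to the tested versions or newer.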

Plot API updates (discussed on a per-example basis)

  • [x] Generally, try to replace HoloViews usage with hvPlot. At a certain point of complexity, such as with the use of ‘.select’, it might be better to stick with HoloViews. Additional examples of ‘complexity boundaries’ should be documented in this document.
  • [x] Almost always, try to replace the use of datashade with rasterize (read this page). Essentially, rasterize allows Bokeh to handle the colormapping instead of Datashader.

Interactivity API updates (discussed on a per-example basis)

  • [x] Remove all pn.interact usage
  • [x] Avoid .param.watch() usage. This is a fairly low-level and verbose approach and should not be used in Examples unless it is required or the Example specifically demonstrates its usage in an advanced workflow.
  • [x] Prefer using pn.bind(). Read this page for an explanation.
  • [x] For apps built with a class-based approach that define a view() method and call it directly, update the class to inherit from pn.viewable.Viewer and replace view() with __panel__(). Here is an example.

Panel App updates (discussed on a per-example basis)

  • [x] If the project never creates a Panel app, consider creating one. It can be as simple as wrapping a plot in pn.Column, or more complicated, incorporating widgets, etc. Make the final app .servable().
  • [x] If the project creates an app in a notebook but doesn’t deploy it (i.e. there is no command: dashboard declaration in the anaconda-project.yml file), try adding it.
  • [x] If the project already deploys an app but doesn’t wrap it in a nice template, consider wrapping it in a template.
  • [x] If the project deploys an app wrapped in a template, customize the template a little so that the apps don’t all look the same (e.g. change the header background color). This doesn’t need to be discussed.
  • [x] If you are building the application in a single cell, you can construct a template explicitly, like template = pn.template.BootstrapTemplate(), but if you are building up an app across multiple cells, it is probably cleaner to declare the template at the top with pn.extension(template='bootstrap'). See the how-to guide on setting a template.

General code quality updates

  • [x] If the notebook disables warnings (e.g. with warnings.simplefilter('ignore') somewhere at the start of the notebook), remove this line. Try to update the code to remove the warnings, if any. If updating the code to remove the warnings takes a significant amount of time and effort, bring it up for discussion and we may decide to disable warnings again.
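Instead of silencing every warning for the whole notebook, one option is to record the warnings a specific piece of code emits so they can be fixed one by one. The following is a minimal sketch using only the standard library warnings module; collect_warnings and noisy_step are hypothetical names, with noisy_step standing in for a notebook cell.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Run func and return (result, list of warning messages).

    The filter change is scoped to the catch_warnings block, so warnings are
    not hidden for the rest of the notebook.
    """
    with warnings.catch_warnings(record=True) as caught:
        # "always" records repeated warnings instead of showing them only once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [f"{w.category.__name__}: {w.message}" for w in caught]
    return result, messages


def noisy_step(x):
    # Hypothetical stand-in for notebook code that triggers a warning.
    warnings.warn("this argument is deprecated", DeprecationWarning)
    return x * 2


result, messages = collect_warnings(noisy_step, 21)
print(result)    # 42
print(messages)  # ['DeprecationWarning: this argument is deprecated']
```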

Text content

  • [x] Edit the text content anywhere and everywhere that it can be improved for clarity.
  • [x] Check that the links are valid, and update old links (e.g. http -> https, xyz.pyviz.org -> xyz.holoviz.org)
  • [x] Remove instructions to install packages inside an example
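Part of the link update can be automated with regular expressions. This is a minimal sketch, assuming the links appear as plain URLs in the notebook text; the helper name modernize_links is hypothetical. It only rewrites subdomains such as hvplot.pyviz.org, leaving the bare pyviz.org site alone, and each rewritten link should still be checked by hand, since not every old page has a holoviz.org counterpart.

```python
import re

# Only subdomains such as "hvplot.pyviz.org" are rewritten; the bare
# "pyviz.org" site (and "www.pyviz.org") is left untouched.
_PYVIZ_SUBDOMAIN = re.compile(r"\b(?!www\.)([A-Za-z0-9-]+)\.pyviz\.org\b")
_HTTP = re.compile(r"\bhttp://")


def modernize_links(text: str) -> str:
    """Upgrade http:// to https:// and move xyz.pyviz.org to xyz.holoviz.org."""
    text = _HTTP.sub("https://", text)
    return _PYVIZ_SUBDOMAIN.sub(r"\1.holoviz.org", text)


print(modernize_links("See http://hvplot.pyviz.org/user_guide and https://pyviz.org"))
# See https://hvplot.holoviz.org/user_guide and https://pyviz.org
```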

Visual appearance - Example

  • [x] Check that the titles/headings make sense and are succinct.
  • [x] Check that the text content blocks are easily readable; revise into additional paragraphs if needed.
  • [x] Check that the code blocks are easily readable; revise as needed. (e.g. add spaces after commas in a list if there are none, wrap long lines, etc.)
  • [x] Check image and plot sizes. If possible, making them responsive is highly recommended.
  • [x] Check the appearance on a smartphone (most browsers' web developer tools can display pages as they would appear on a smartphone; search online for how to do this in your browser). This is not a top priority for all examples, but if a few easy and straightforward changes can improve the experience, let’s make them.
  • [x] Compare the updated notebook with the original notebook

Visual appearance - Gallery

  • [x] Check that the thumbnail is visually appealing
  • [x] Check that the project title is well formatted (e.g. Ml Annotators to ML Annotators); if not, add or update the examples_config.title field in anaconda-project.yml
  • [x] Check that the project description is appropriate; if not, update the description field in anaconda-project.yml

Workflow (after you have made the changes above)

  • [x] Successfully run doit validate:<projectname>
  • [x] Successfully run doit test:<projectname>
  • [x] Successfully run doit doc_one --name <projectname>. When building the docs, it’s better if the project notebook(s) are saved with their outputs (but be sure to clear outputs before committing to the examples repo!). Then open ./builtdocs/index.html in your browser and check how the site looks.
  • [x] If you’re happy with all the above, open a PR. Reminder, clear notebook outputs before pushing to the PR.

Azaya89 · Aug 06 '24 20:08

This is still a WIP. Not ready for review yet.

Azaya89 · Aug 06 '24 20:08

Bug Report on this example notebook: Inconsistency with the usage of intake

These are the current issues preventing the complete modernization of this notebook:

  1. Version Compatibility: Although the recommendation is to pin intake<2, only version 0.6.2 runs without errors. For example, with other versions, executing metadata = cat.fluxnet_metadata().read() raises the following error:
Traceback
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 metadata = cat.fluxnet_metadata().read()
      2 metadata.sample(5)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:190, in CSVSource.read(self)
    186     return self._dask_df.compute()
    188 import pandas as pd
--> 190 self._get_schema()
    191 return pd.concat([self._get_partition(i) for i in range(len(self.files()))])

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:142, in CSVSource._get_schema(self)
    140 nrows = self._csv_kwargs.get("nrows")
    141 self._csv_kwargs["nrows"] = 10
--> 142 df = self._get_partition(0)
    143 if nrows is None:
    144     del self._csv_kwargs["nrows"]

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:160, in CSVSource._get_partition(self, i)
    157     return self._dask_df.get_partition(i).compute()
    159 url_part = self.files()[i]
--> 160 return self._read_pandas(url_part, i)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:166, in CSVSource._read_pandas(self, url_part, i)
    163 import pandas as pd
    165 if self.pattern is None:
--> 166     return pd.read_csv(url_part, storage_options=self._storage_options, **self._csv_kwargs)
    168 drop_path_column = "include_path_column" not in self._csv_kwargs
    169 path_column = self._path_column()

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
   1013 kwds_defaults = _refine_defaults_read(
   1014     dialect,
   1015     delimiter,
   (...)
   1022     dtype_backend=dtype_backend,
   1023 )
   1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
    617 _validate_names(kwds.get("names", None))
    619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
    622 if chunksize or iterator:
    623     return parser

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
   1617     self.options["has_index_names"] = kwds["has_index_names"]
   1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
   1878     if "b" not in mode:
   1879         mode += "b"
-> 1880 self.handles = get_handle(
   1881     f,
   1882     mode,
   1883     encoding=self.options.get("encoding", None),
   1884     compression=self.options.get("compression", None),
   1885     memory_map=self.options.get("memory_map", False),
   1886     is_text=is_text,
   1887     errors=self.options.get("encoding_errors", "strict"),
   1888     storage_options=self.options.get("storage_options", None),
   1889 )
   1890 assert self.handles is not None
   1891 f = self.handles.handle

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/common.py:728, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    725     codecs.lookup_error(errors)
    727 # open URLs
--> 728 ioargs = _get_filepath_or_buffer(
    729     path_or_buf,
    730     encoding=encoding,
    731     compression=compression,
    732     mode=mode,
    733     storage_options=storage_options,
    734 )
    736 handle = ioargs.filepath_or_buffer
    737 handles: list[BaseBuffer]

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/common.py:453, in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    445     return IOArgs(
    446         filepath_or_buffer=file_obj,
    447         encoding=encoding,
   (...)
    450         mode=fsspec_mode,
    451     )
    452 elif storage_options:
--> 453     raise ValueError(
    454         "storage_options passed with file object or non-fsspec file path"
    455     )
    457 if isinstance(filepath_or_buffer, (str, bytes, mmap.mmap)):
    458     return IOArgs(
    459         filepath_or_buffer=_expand_user(filepath_or_buffer),
    460         encoding=encoding,
   (...)
    463         mode=mode,
    464     )

ValueError: storage_options passed with file object or non-fsspec file path

Pinning intake=0.6.2 resolves this issue without any traceback errors.
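For context, the final ValueError in this traceback comes from pandas itself (pandas/io/common.py in the frames above): pd.read_csv rejects a non-empty storage_options when the path is a local file rather than an fsspec URL such as s3://... The sketch below reproduces that behavior and shows one defensive wrapper. It assumes pandas is installed; read_csv_maybe_remote is a hypothetical helper, not an intake or pandas API, and its "://" test is a simplification of how pandas detects URLs.

```python
import os
import tempfile

import pandas as pd


def read_csv_maybe_remote(path, storage_options=None, **kwargs):
    """Hypothetical helper: only forward storage_options for URL-like paths.

    pandas raises "storage_options passed with file object or non-fsspec
    file path" when storage_options is non-empty and the path is local.
    """
    if storage_options and "://" in str(path):
        return pd.read_csv(path, storage_options=storage_options, **kwargs)
    return pd.read_csv(path, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "metadata.csv")
    pd.DataFrame({"site": ["A", "B"], "lat": [1.0, 2.0]}).to_csv(csv_path, index=False)

    # Plain pandas: local path plus storage_options fails.
    try:
        pd.read_csv(csv_path, storage_options={"anon": True})
        error_message = None
    except ValueError as err:
        error_message = str(err)
    print(error_message)

    # The wrapper drops storage_options for the local path and succeeds.
    df = read_csv_maybe_remote(csv_path, storage_options={"anon": True})
    print(len(df))  # 2
```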

  2. Inconsistency in File Downloads: The cell responsible for downloading the full fluxnet files behaves inconsistently:
s3 = S3FileSystem(anon=True)
s3_paths = s3.glob('earth-data/carbon_flux/nee_data_fusion/FLX*')

datasets = []
skipped = []
used = []

for i, s3_path in enumerate(s3_paths):
    sys.stdout.write(f'\r{i+1}/{len(s3_paths)}')
    
    try:
        dd = cat.fluxnet_daily(s3_path=s3_path).to_dask()
    except FileNotFoundError:
        try:
            dd = cat.fluxnet_daily(s3_path=s3_path.split('/')[-1]).to_dask()
        except FileNotFoundError:
            continue
    site = dd['site'].cat.categories.item()
    
    if not set(dd.columns) >= set(data_columns):
        skipped.append(site)
        continue

    datasets.append(clean_data(dd))
    used.append(site)

print()
print(f'Found {len(used)} fluxnet sites with enough data to use - skipped {len(skipped)}')

This cell sometimes produces the following output:

Traceback
1/209
/Users/mac/Documents/development/examples/carbon_flux/envs/default/lib/python3.11/site-packages/dask_expr/_collection.py:4160: UserWarning: 
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
  Before: .apply(func)
  After:  .apply(func, meta=(None, 'object'))

  warnings.warn(meta_warning(meta))
/Users/mac/Documents/development/examples/carbon_flux/envs/default/lib/python3.11/site-packages/dask_expr/_collection.py:4160: UserWarning: 
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
  Before: .apply(func)
  After:  .apply(func, meta=('TIMESTAMP', 'object'))

  warnings.warn(meta_warning(meta))

This warning is repeated for every file, up to 209/209.

The circumstances under which this happens are unclear. A temporary workaround, found with the help of @hoxbro, is to remove the local version of intake and re-download it using anaconda-project run. This usually resolves the issue. However, restarting the kernel and running the notebook from top to bottom may bring the Traceback error back.

  3. Cell [20] Error: The following code in cell [20] raises an error when the full data is not downloaded properly (as in problem 2):
partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]

Traceback:

TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
      2 partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]

TypeError: DataFrame.any() takes 1 positional argument but 2 were given

Using any(axis=1) resolves this error. However, if problem 2 does not occur, this cell runs without the TypeError.
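The fix can be checked with a small pandas sketch. As the traceback shows, the installed pandas version only accepts axis as a keyword for DataFrame.any, so .any(1) raises a TypeError while .any(axis=1) works. The column names and values below are made-up stand-ins for soil_data_columns and the FLUXNET data.

```python
import pandas as pd

nan = float("nan")

# Made-up stand-ins: "soil_1" and "soil_2" play the role of soil_data_columns,
# and the site names are placeholders.
soil_data_columns = ["soil_1", "soil_2"]
df = pd.DataFrame(
    {
        "site": ["site_a", "site_a", "site_b"],
        "soil_1": [nan, 30.5, nan],
        "soil_2": [nan, nan, 8.4],
    }
)

# Keep rows where at least one soil column has a value.
# axis is passed as a keyword, which is what the newer pandas signature requires.
partial_soil_data = df[df[soil_data_columns].notnull().any(axis=1)]
print(sorted(partial_soil_data["site"].unique()))  # ['site_a', 'site_b']
```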

@maximlt @droumis

Azaya89 · Aug 08 '24 14:08

  1. I have completely rewritten the notebook to remove all usage of intake.

  2. The .csv files are downloaded locally via awscli by running anaconda-project run download_fluxnet_daily. Downloading all the files takes about a minute, and they are saved in the same folder as the .txt file.

  3. Some of the cells are now failing the tests, and I don't know why yet. I will investigate that later.
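Without intake, the loading loop from problem 2 can read the downloaded CSV files directly with pandas. The sketch below is an assumption about how such a loop might look, not the notebook's actual code: load_sites is a hypothetical helper, the FLX*.csv pattern and column names are illustrative, the file name is used as the site label (the notebook reads the site from the data), and the notebook's clean_data step is left out.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sites(data_dir, required_columns, pattern="FLX*.csv"):
    """Read every matching CSV in data_dir, skipping files that lack any of
    required_columns. Returns (datasets, used, skipped)."""
    datasets, used, skipped = [], [], []
    for path in sorted(Path(data_dir).glob(pattern)):
        site = path.stem  # hypothetical: file name used as the site label
        # Read only the header row first, so incomplete files are skipped cheaply.
        header = pd.read_csv(path, nrows=0).columns
        if not set(header) >= set(required_columns):
            skipped.append(site)
            continue
        datasets.append(pd.read_csv(path))
        used.append(site)
    return datasets, used, skipped


with tempfile.TemporaryDirectory() as tmp:
    # Two made-up files: one has all required columns, one does not.
    pd.DataFrame({"TIMESTAMP": [20000101], "NEE": [1.5]}).to_csv(
        Path(tmp) / "FLX_site_a.csv", index=False)
    pd.DataFrame({"TIMESTAMP": [20000101]}).to_csv(
        Path(tmp) / "FLX_site_b.csv", index=False)
    datasets, used, skipped = load_sites(tmp, ["TIMESTAMP", "NEE"])

print(f"Found {len(used)} fluxnet sites with enough data to use - skipped {len(skipped)}")
# Found 1 fluxnet sites with enough data to use - skipped 1
```

Reading from local files also avoids the S3 and intake code paths that produced the inconsistent behavior described in problem 2.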

Otherwise, I think this is ready for review now.

@hoxbro

Azaya89 · Oct 18 '24 16:10

I have pushed a fix that will make the test pass. I'm unsure why it doesn't work when you scatter the index.

The doc build is failing; @Azaya89, can you try and see if you can fix this?

hoxbro · Nov 04 '24 11:11

Arf @Azaya89 I see we're still having some issues. The error we encounter looks very similar to the one reported here https://github.com/aws/aws-cli/issues/8988. Digging more into this direction should hopefully give us a solution. This for instance looks promising https://github.com/aws/aws-cli/issues/5623#issuecomment-801240811, this too https://stackoverflow.com/questions/64992288/s3-sync-issue-running-in-azure-devops-pipeline-on-linux.

maximlt · Nov 05 '24 18:11

Arf @Azaya89 I see we're still having some issues. The error we encounter looks very similar to the one reported here aws/aws-cli#8988. Digging more into this direction should hopefully give us a solution. This for instance looks promising aws/aws-cli#5623 (comment), this too https://stackoverflow.com/questions/64992288/s3-sync-issue-running-in-azure-devops-pipeline-on-linux.

Thank you. Let me try this out...

Azaya89 · Nov 05 '24 18:11

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.

github-actions[bot] · Nov 05 '24 19:11

The doc build is failing; @Azaya89, can you try and see if you can fix this?

Fixed. I think it is ready for final review now @hoxbro

Azaya89 · Nov 05 '24 19:11

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.

github-actions[bot] · Nov 07 '24 12:11

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.

github-actions[bot] · Nov 13 '24 21:11

Another run has replaced the dev docs site. I want to make sure you checked if everything looked good before it was replaced.

LGTM!

Azaya89 · Nov 13 '24 21:11

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.

github-actions[bot] · Nov 14 '24 14:11