Modernize carbon flux
Modernizing an example checklist
Preliminary checks
- [x] Look for open PRs and issues that reference the project you are updating. Previous unmerged work in a PR may be reusable for modernizing the project. Comment on these PRs and issues when appropriate; ideally, some of them can be closed after your modernization work.
Change `anaconda-project.yml` to use the latest workable version of packages
- [x] Pin python=3.11
- [x] Remove the upper pin of all other dependencies (e.g. `hvplot<0.9` to `hvplot`, `panel>=0.12,<1.0` to `panel>=0.12`). Removing the upper pins of dependencies could necessitate code revisions in the notebooks to address any errors encountered in the updated environment. Should complexities or extensive time requirements arise, document issues for team discussion on whether to re-pin specific packages or explore other solutions.
- [x] Add/update the lower pin of all other dependencies (e.g. `hvplot` to `hvplot>=0.9.2`, `hvplot>=0.8` to `hvplot>=0.9.2`). Usually, the new/updated lower pin of a dependency will be the version resolved after `anaconda-project prepare` has been run. Execute `!conda list` in a notebook, or `anaconda-project run conda list` in the terminal, to display the version of each dependency installed in the environment. Adjusting the lower pin helps ensure that the locks produced for each platform (linux-64, win-64, osx-64, osx-arm64) rely on the tested dependencies and not on some older versions.
- [x] If one of the channels includes conda-forge or pyviz, ask Maxime if it can be removed.
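As a sketch, the pin updates in `anaconda-project.yml` might look like this (package names and version numbers are illustrative, not taken from the actual project file):

```yaml
# Before: upper pins block newer, tested versions
packages:
- python=3.11
- hvplot>=0.8,<0.9
- panel>=0.12,<1.0

# After: upper pins removed, lower pins raised to the versions
# resolved after `anaconda-project prepare`
packages:
- python=3.11
- hvplot>=0.9.2
- panel>=1.3
```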
Plot API updates (discussed on a per-example basis)
- [x] Generally, try to replace HoloViews usage with hvPlot. At a certain point of complexity, such as with the use of `.select`, it might be better to stick with HoloViews. Additional examples of "complexity boundaries" should be documented in this document.
- [x] Almost always, try to replace the use of `datashade` with `rasterize` (read this page). Essentially, `rasterize` allows Bokeh to handle the colormapping instead of Datashader.
Interactivity API updates (discussed on a per-example basis)
- [x] Remove all `pn.interact` usage.
- [x] Avoid `.param.watch()` usage. This is a pretty low-level and verbose approach and should not be used in Examples unless required, or unless an Example is specifically trying to demo its usage in an advanced workflow.
- [x] Prefer using `pn.bind()`. Read this page for an explanation.
- [x] For apps built using a class approach, when they create a `view()` method and call it directly, update the class by inheriting from `pn.viewable.Viewer` and replace `view()` with `__panel__()`. Here is an example.
Panel App updates (discussed on a per-example basis)
- [x] If the project doesn't at any point create a Panel app at all, consider creating one. It can be as simple as wrapping a plot in `pn.Column`, or more complicated to incorporate widgets, etc. Make the final app `.servable()`.
- [x] If the project creates an app in a notebook but doesn't deploy it (i.e. there is no `command: dashboard` declaration in the `anaconda-project.yml` file), try adding it.
- [x] If the project already deploys an app but doesn't wrap it in a nice template, consider wrapping it in a template.
- [x] If the project deploys an app wrapped in a template, customize the template a little so all the apps don’t look similar (e.g. change the header background color). This doesn’t need to be discussed.
- [x] If you are building the application in a single cell, you can construct a template explicitly, like `template = pn.template.BootstrapTemplate(...)`, but if building up an app across multiple cells, it is probably cleaner to declare the template at the top with `pn.extension(template='bootstrap')`. See the how-to guide on setting a template.
General code quality updates
- [x] If the notebook disables warnings (e.g. with `warnings.simplefilter('ignore')` somewhere at the start of the notebook), remove this line. Try to update the code to remove the warnings, if any. If updating the code to remove the warnings takes a significant amount of time and effort, bring it up for discussion and we may decide to disable warnings again.
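If a specific upstream warning genuinely cannot be fixed, a narrowly scoped filter is preferable to a blanket `simplefilter('ignore')`. A stdlib-only sketch (the `legacy_call` function is a hypothetical stand-in):

```python
import warnings

def legacy_call():
    # Stand-in for a library call that emits a known, unavoidable warning
    warnings.warn('old API', FutureWarning)
    return 42

with warnings.catch_warnings():
    warnings.simplefilter('error')  # surface any unexpected warnings
    # Silence only the one known warning, by message and category
    warnings.filterwarnings('ignore', message='old API', category=FutureWarning)
    result = legacy_call()
```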
Text content
- [x] Edit the text content anywhere and everywhere that it can be improved for clarity.
- [x] Check the links are valid, and update old links (e.g. http -> https, xyz.pyviz.org -> xyz.holoviz.org)
- [x] Remove instructions to install packages inside an example
Visual appearance - Example
- [x] Check that the titles/headings make sense and are succinct.
- [x] Check that the text content blocks are easily readable; revise into additional paragraphs if needed.
- [x] Check that the code blocks are easily readable; revise as needed. (e.g. add spaces after commas in a list if there are none, wrap long lines, etc.)
- [x] Check image and plot sizes. If possible, making them responsive is highly recommended.
- [x] Check the appearance on a smartphone (use your browser's web developer tools to emulate a smartphone display). This is not a top priority for all examples, but if a few easy and straightforward changes can improve the experience, let's do it.
- [x] Compare the updated notebook with the original notebook.
Visual appearance - Gallery
- [x] Check the thumbnail is visually appealing
- [x] Check the project title is well formatted (e.g. `Ml Annotators` to `ML Annotators`); if not, add/update the `examples_config.title` field in `anaconda-project.yml`.
- [x] Check the project description is appropriate; if not, update the `description` field in `anaconda-project.yml`.
Workflow (after you have made the changes above)
- [x] Run successfully `doit validate:<projectname>`.
- [x] Run successfully `doit test:<projectname>`.
- [x] Run successfully `doit doc_one --name <projectname>`. It's better if the project notebook(s) are saved with their outputs when building the docs (but be sure to clear outputs before committing to the examples repo!). Then open `./builtdocs/index.html` in your browser and check how the site looks.
- [x] If you're happy with all the above, open a PR. Reminder: clear notebook outputs before pushing to the PR.
This is still a WIP. Not ready for review yet.
Bug Report on this example notebook: Inconsistency with the usage of `intake`
These are the current issues preventing the complete modernization of this notebook:
- Version Compatibility: Although it is recommended to pin `intake` to `<2`, only version `0.6.2` runs without errors. For example, executing `metadata = cat.fluxnet_metadata().read()` results in the following traceback error with other versions:
Traceback
```
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 metadata = cat.fluxnet_metadata().read()
      2 metadata.sample(5)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:190, in CSVSource.read(self)
    186         return self._dask_df.compute()
    188 import pandas as pd
--> 190 self._get_schema()
    191 return pd.concat([self._get_partition(i) for i in range(len(self.files()))])

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:142, in CSVSource._get_schema(self)
    140 nrows = self._csv_kwargs.get("nrows")
    141 self._csv_kwargs["nrows"] = 10
--> 142 df = self._get_partition(0)
    143 if nrows is None:
    144     del self._csv_kwargs["nrows"]

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:160, in CSVSource._get_partition(self, i)
    157     return self._dask_df.get_partition(i).compute()
    159 url_part = self.files()[i]
--> 160 return self._read_pandas(url_part, i)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:166, in CSVSource._read_pandas(self, url_part, i)
    163 import pandas as pd
    165 if self.pattern is None:
--> 166     return pd.read_csv(url_part, storage_options=self._storage_options, **self._csv_kwargs)
    168 drop_path_column = "include_path_column" not in self._csv_kwargs
    169 path_column = self._path_column()

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
   1013 kwds_defaults = _refine_defaults_read(
   1014     dialect,
   1015     delimiter,
   (...)
   1022     dtype_backend=dtype_backend,
   1023 )
   1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
    617 _validate_names(kwds.get("names", None))
    619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
    622 if chunksize or iterator:
    623     return parser

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
   1617     self.options["has_index_names"] = kwds["has_index_names"]
   1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
   1878     if "b" not in mode:
   1879         mode += "b"
-> 1880 self.handles = get_handle(
   1881     f,
   1882     mode,
   1883     encoding=self.options.get("encoding", None),
   1884     compression=self.options.get("compression", None),
   1885     memory_map=self.options.get("memory_map", False),
   1886     is_text=is_text,
   1887     errors=self.options.get("encoding_errors", "strict"),
   1888     storage_options=self.options.get("storage_options", None),
   1889 )
   1890 assert self.handles is not None
   1891 f = self.handles.handle

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/common.py:728, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    725     codecs.lookup_error(errors)
    727 # open URLs
--> 728 ioargs = _get_filepath_or_buffer(
    729     path_or_buf,
    730     encoding=encoding,
    731     compression=compression,
    732     mode=mode,
    733     storage_options=storage_options,
    734 )
    736 handle = ioargs.filepath_or_buffer
    737 handles: list[BaseBuffer]

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/common.py:453, in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    445     return IOArgs(
    446         filepath_or_buffer=file_obj,
    447         encoding=encoding,
    (...)
    450         mode=fsspec_mode,
    451     )
    452 elif storage_options:
--> 453     raise ValueError(
    454         "storage_options passed with file object or non-fsspec file path"
    455     )
    457 if isinstance(filepath_or_buffer, (str, bytes, mmap.mmap)):
    458     return IOArgs(
    459         filepath_or_buffer=_expand_user(filepath_or_buffer),
    460         encoding=encoding,
    (...)
    463         mode=mode,
    464     )

ValueError: storage_options passed with file object or non-fsspec file path
```
Pinning `intake=0.6.2` resolves this issue without any traceback errors.
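The root cause is visible at the pandas layer: newer `intake` versions end up calling `pd.read_csv` with an already-open file object together with `storage_options`, which pandas rejects. A minimal reproduction of that pandas check, independent of `intake`:

```python
import io
import pandas as pd

# A file object rather than an fsspec-style URL
buf = io.StringIO('a,b\n1,2\n')

# pandas refuses storage_options when given a file object, which is
# exactly the ValueError shown in the traceback above.
try:
    pd.read_csv(buf, storage_options={'anon': True})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```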
- Inconsistency in File Downloads: The cell responsible for downloading the full `fluxnet` files shows inconsistent behavior:
```python
s3 = S3FileSystem(anon=True)
s3_paths = s3.glob('earth-data/carbon_flux/nee_data_fusion/FLX*')

datasets = []
skipped = []
used = []
for i, s3_path in enumerate(s3_paths):
    sys.stdout.write(f'\r{i+1}/{len(s3_paths)}')
    try:
        dd = cat.fluxnet_daily(s3_path=s3_path).to_dask()
    except FileNotFoundError:
        try:
            dd = cat.fluxnet_daily(s3_path=s3_path.split('/')[-1]).to_dask()
        except FileNotFoundError:
            continue
    site = dd['site'].cat.categories.item()
    if not set(dd.columns) >= set(data_columns):
        skipped.append(site)
        continue
    datasets.append(clean_data(dd))
    used.append(site)
print()
print(f'Found {len(used)} fluxnet sites with enough data to use - skipped {len(skipped)}')
```
This cell sometimes generates the following traceback:
Traceback
```
1/209
/Users/mac/Documents/development/examples/carbon_flux/envs/default/lib/python3.11/site-packages/dask_expr/_collection.py:4160: UserWarning:
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
  Before: .apply(func)
  After:  .apply(func, meta=(None, 'object'))
  warnings.warn(meta_warning(meta))
/Users/mac/Documents/development/examples/carbon_flux/envs/default/lib/python3.11/site-packages/dask_expr/_collection.py:4160: UserWarning:
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
  Before: .apply(func)
  After:  .apply(func, meta=('TIMESTAMP', 'object'))
  warnings.warn(meta_warning(meta))
```
This warning is repeated for all the cells up to 209/209.
The circumstances under which this error occurs are unclear. A temporary solution, discovered with the help of @hoxbro, involves removing the local version of `intake` and re-downloading it using `anaconda-project run`. This typically resolves the issue. However, restarting the kernel and running the notebook from the top down might bring back the traceback error.
- Cell [20] Error: The following code in Cell [20] generates a traceback error when the full data is not downloaded properly (as in problem 2):

```python
partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]
```
Traceback:
```
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
      2 partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]

TypeError: DataFrame.any() takes 1 positional argument but 2 were given
```
Using `any(axis=1)` resolves this error. However, if problem 2 does not occur, this cell runs without the TypeError.
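The keyword fix is version-proof: pandas 2.x removed the positional `axis` argument from `DataFrame.any`, so `any(axis=1)` works on both old and new versions. A toy reproduction (the column names and values here are hypothetical):

```python
import numpy as np
import pandas as pd

soil_data_columns = ['swc', 'ts']
df = pd.DataFrame({
    'site': ['A', 'B'],
    'swc': [0.3, np.nan],
    'ts': [np.nan, np.nan],
})

# df[...].notnull().any(1) raises TypeError on pandas >= 2.0;
# spelling out axis= keeps rows with at least one non-null soil column.
partial_soil_data = df[df[soil_data_columns].notnull().any(axis=1)]
```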
@maximlt @droumis
- I have completely rewritten the notebook to remove all usage of `intake`.
- The `.csv` files are downloaded locally via `awscli` by running `anaconda-project run download_fluxnet_daily`. This takes about a minute to download all the files, which are saved in the same folder as the `.txt` file.
- Some of the cells are failing the test now and I don't know why. I will investigate that later.

Otherwise, I think this is ready for review now.
@hoxbro
I have pushed a fix that will make the test pass. I'm unsure why it doesn't work when you scatter the index.
The doc build is failing; @Azaya89, can you try and see if you can fix this?
Arf @Azaya89 I see we're still having some issues. The error we encounter looks very similar to the one reported here https://github.com/aws/aws-cli/issues/8988. Digging more into this direction should hopefully give us a solution. This for instance looks promising https://github.com/aws/aws-cli/issues/5623#issuecomment-801240811, this too https://stackoverflow.com/questions/64992288/s3-sync-issue-running-in-azure-devops-pipeline-on-linux.
Thank you. Let me try this out...
Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.
The doc build is failing; @Azaya89, can you try and see if you can fix this?
Fixed. I think it is ready for final review now @hoxbro
Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.
Another run has replaced the dev docs site. I want to make sure you checked if everything looked good before it was replaced.
LGTM!
Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.