
Nightly Build Failure 2024-04-27

Open zaneselvans opened this issue 9 months ago • 3 comments

Overview

  • The etl_full.yml settings for the NREL ATB included the years 2019 and 2020, which aren't working yet (see #3576), so Pydantic's validation of the settings correctly failed. I removed those years from the settings file on main so we can attempt another build tonight.
  • However, this failure doesn't happen locally when I try to run the full ETL with the old settings, which is weird.
  • While investigating this I was confused by the NREL ATB extraction, which doesn't seem to make any use of the settings or datastore. So maybe this is only working because it's relying on defaults that aren't informed by the ETL settings at all?
  • The raw_nrelatb__data asset claims to require the datastore and dataset_settings resources, but doesn't actually make use of them.
  • The NREL ATB Extractor claims to require a Datastore as input, but doesn't receive one.
  • But looking at the other tabular extractors, they also don't seem to use any resources (even though they obviously must) so maybe there is a bunch of magic happening in the background? Can we document what is going on?
  • It looks like there's a bit of stale documentation in the extraction system, with a mix of references to Excel and CSV files in places where they are not appropriate.
class Extractor(ParquetExtractor):
    """Extractor for NREL ATB."""

    def __init__(self, *args, **kwargs):
        """Initialize the module.

        Args:
            ds (:class:datastore.Datastore): Initialized datastore.
        """
        self.METADATA = GenericMetadata("nrelatb")
        super().__init__(*args, **kwargs)


raw_nrelatb__all_dfs = raw_df_factory(Extractor, name="nrelatb")


@asset(
    required_resource_keys={"datastore", "dataset_settings"},
)
def raw_nrelatb__data(raw_nrelatb__all_dfs):
    """Extract raw NREL ATB data from annual parquet files to one dataframe.

    Returns:
        An extracted NREL ATB dataframe.
    """
    return Output(value=raw_nrelatb__all_dfs["data"])

zaneselvans • Apr 27 '24 15:04

Thanks for catching the non-working partitions in the full settings! I'm also confused about why the validations didn't fail for me locally. After changing the working partitions in sources, I was able to re-run the full extraction and only get the working years. That's weird for sure.

A lot of the magic happens via extract.extractor.raw_df_factory, which runs extract.extractor.partition_extractor_factory, which in turn uses the datastore and the dataset_settings. I was mirroring the EIA 176 extraction, which requires those two resources on the asset but doesn't pass them around; instead they are accessed within raw_df_factory.
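
To make the pattern described above concrete, here is a minimal sketch in plain Python of a factory that builds an extraction function which pulls its resources off a context object. It is not the actual pudl.extract.extractor code and does not use Dagster: FakeContext, FakeExtractor, and raw_df_factory_sketch are hypothetical stand-ins, and the shapes of the datastore and dataset_settings resources are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeContext:
    """Hypothetical stand-in for the context object an orchestrator passes in."""

    resources: dict = field(default_factory=dict)


class FakeExtractor:
    """Hypothetical extractor: takes a datastore, returns one page per partition."""

    def __init__(self, ds: dict):
        self.ds = ds

    def extract(self, year: int) -> dict:
        # One "page" named "data", holding a single record for this year.
        return {"data": [f"{self.ds['name']}:{year}"]}


def raw_df_factory_sketch(extractor_cls, name: str) -> Callable[[FakeContext], dict]:
    """Build an extraction function that reads its resources from the context."""

    def extract_all(context: FakeContext) -> dict:
        # The resources are looked up here, inside the factory-built function,
        # so the code that consumes the result never handles them directly.
        ds = context.resources["datastore"]
        years = context.resources["dataset_settings"][name]["years"]
        extractor = extractor_cls(ds)
        merged: dict = {}
        for year in years:
            for page, rows in extractor.extract(year).items():
                merged.setdefault(page, []).extend(rows)
        return merged

    return extract_all


raw_all = raw_df_factory_sketch(FakeExtractor, name="nrelatb")
context = FakeContext(
    resources={
        "datastore": {"name": "local-cache"},
        "dataset_settings": {"nrelatb": {"years": [2021, 2022]}},
    }
)
result = raw_all(context)
```

In this sketch the code that consumes result only ever sees the merged pages, which mirrors why the downstream asset appears not to touch the datastore or the settings even though the extraction depends on both.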

I agree in general that the extractor setup needs some documentation cleanup and maybe some higher level explanation somewhere.

cmgosnell • Apr 29 '24 12:04

Tangible outcome here is:

  • replicate being able to run ATB with bogus settings, then figure out why the bogus settings aren't breaking the ATB run.

It should have failed on import, but that wasn't happening.
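
For anyone trying to replicate this, the behaviour in question can be sketched as a settings object that validates its partitions when it is constructed. The sketch uses a standard-library dataclass as a stand-in for the Pydantic model, and WORKING_YEARS, YearSettingsSketch, and try_settings are hypothetical names, not PUDL's. If an object like this is built at module level, the error would surface at import time, which is presumably why a failure on import was expected.

```python
from dataclasses import dataclass

# Hypothetical set of partitions that are known to work for this dataset.
WORKING_YEARS = frozenset({2021, 2022, 2023, 2024})


@dataclass(frozen=True)
class YearSettingsSketch:
    """Hypothetical settings object that validates itself on construction."""

    years: tuple[int, ...]

    def __post_init__(self) -> None:
        # Reject any requested year that is not a working partition.
        bogus = sorted(set(self.years) - WORKING_YEARS)
        if bogus:
            raise ValueError(f"Years {bogus} are not working partitions.")


def try_settings(years):
    """Return the validation error message, or None if the settings are valid."""
    try:
        YearSettingsSketch(years=tuple(years))
    except ValueError as err:
        return str(err)
    return None


ok = try_settings([2021, 2022])
bad = try_settings([2019, 2020, 2021])
```

If the bogus years pass silently in a real run, one thing to check is whether the settings object being validated is the one built from the settings file or a default built elsewhere.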

jdangerx • May 06 '24 19:05

@e-belfer might deal with this incidentally as part of integrating the new ATB.

jdangerx • Jul 01 '24 19:07

Unless I'm missing something, I can't reproduce this error locally. Trying to extract just the raw_nrelatb asset group locally with the following Dagster config failed as expected. As discussed, there is a fair bit happening in the pudl.extract.extractor module. I'm all for adding more documentation to clarify what's going on there, but as far as I can tell nothing is fundamentally broken, so I'm inclined to close this issue.

ops: {}
resources:
  dataset_settings:
    config:
      eia:
        eia176:
          disabled: false
          years: [1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
            2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
        eia191:
          disabled: false
          years: [2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
        eia757a:
          disabled: false
          years: [2012, 2014, 2017]
        eia860:
          all_eia860m_year_months: [2015-07, 2015-08, 2015-09, 2015-10, 2015-11, 2015-12,
            2016-01, 2016-02, 2016-03, 2016-04, 2016-05, 2016-06, 2016-07, 2016-08, 2016-09, 2016-10, 2016-11, 2016-12,
            2017-01, 2017-02, 2017-03, 2017-04, 2017-05, 2017-06, 2017-07, 2017-08, 2017-09, 2017-10, 2017-11, 2017-12,
            2018-01, 2018-02, 2018-03, 2018-04, 2018-05, 2018-06, 2018-07, 2018-08, 2018-09, 2018-10, 2018-11, 2018-12,
            2019-01, 2019-02, 2019-03, 2019-04, 2019-05, 2019-06, 2019-07, 2019-08, 2019-09, 2019-10, 2019-11, 2019-12,
            2020-01, 2020-02, 2020-03, 2020-04, 2020-05, 2020-06, 2020-07, 2020-08, 2020-09, 2020-10, 2020-11, 2020-12,
            2021-01, 2021-02, 2021-03, 2021-04, 2021-05, 2021-06, 2021-07, 2021-08, 2021-09, 2021-10, 2021-11, 2021-12,
            2022-01, 2022-02, 2022-03, 2022-04, 2022-05, 2022-06, 2022-07, 2022-08, 2022-09, 2022-10, 2022-11, 2022-12,
            2023-01, 2023-02, 2023-03, 2023-04, 2023-05, 2023-06, 2023-07, 2023-08, 2023-09, 2023-10, 2023-11, 2023-12,
            2024-01, 2024-02, 2024-03]
          disabled: false
          eia860m: true
          eia860m_year_months: [2024-03]
          years: [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012,
            2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023]
        eia860m:
          disabled: false
          year_months: [2015-07, 2015-08, 2015-09, 2015-10, 2015-11, 2015-12,
            2016-01, 2016-02, 2016-03, 2016-04, 2016-05, 2016-06, 2016-07, 2016-08, 2016-09, 2016-10, 2016-11, 2016-12,
            2017-01, 2017-02, 2017-03, 2017-04, 2017-05, 2017-06, 2017-07, 2017-08, 2017-09, 2017-10, 2017-11, 2017-12,
            2018-01, 2018-02, 2018-03, 2018-04, 2018-05, 2018-06, 2018-07, 2018-08, 2018-09, 2018-10, 2018-11, 2018-12,
            2019-01, 2019-02, 2019-03, 2019-04, 2019-05, 2019-06, 2019-07, 2019-08, 2019-09, 2019-10, 2019-11, 2019-12,
            2020-01, 2020-02, 2020-03, 2020-04, 2020-05, 2020-06, 2020-07, 2020-08, 2020-09, 2020-10, 2020-11, 2020-12,
            2021-01, 2021-02, 2021-03, 2021-04, 2021-05, 2021-06, 2021-07, 2021-08, 2021-09, 2021-10, 2021-11, 2021-12,
            2022-01, 2022-02, 2022-03, 2022-04, 2022-05, 2022-06, 2022-07, 2022-08, 2022-09, 2022-10, 2022-11, 2022-12,
            2023-01, 2023-02, 2023-03, 2023-04, 2023-05, 2023-06, 2023-07, 2023-08, 2023-09, 2023-10, 2023-11, 2023-12,
            2024-01, 2024-02, 2024-03]
        eia861:
          disabled: false
          years: [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011,
            2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
        eia923:
          disabled: false
          years: [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012,
            2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
        eia930:
          disabled: false
          half_years: [2015half2, 2016half1, 2016half2, 2017half1, 2017half2, 2018half1,
            2018half2, 2019half1, 2019half2, 2020half1, 2020half2, 2021half1,
            2021half2, 2022half1, 2022half2, 2023half1, 2023half2, 2024half1]
        eiaaeo:
          disabled: false
          years: [2023]
      epacems:
        disabled: false
        year_quarters: [1995q1, 1995q2, 1995q3, 1995q4, 1996q1, 1996q2, 1996q3, 1996q4,
          1997q1, 1997q2, 1997q3, 1997q4, 1998q1, 1998q2, 1998q3, 1998q4,
          1999q1, 1999q2, 1999q3, 1999q4, 2000q1, 2000q2, 2000q3, 2000q4,
          2001q1, 2001q2, 2001q3, 2001q4, 2002q1, 2002q2, 2002q3, 2002q4,
          2003q1, 2003q2, 2003q3, 2003q4, 2004q1, 2004q2, 2004q3, 2004q4,
          2005q1, 2005q2, 2005q3, 2005q4, 2006q1, 2006q2, 2006q3, 2006q4,
          2007q1, 2007q2, 2007q3, 2007q4, 2008q1, 2008q2, 2008q3, 2008q4,
          2009q1, 2009q2, 2009q3, 2009q4, 2010q1, 2010q2, 2010q3, 2010q4,
          2011q1, 2011q2, 2011q3, 2011q4, 2012q1, 2012q2, 2012q3, 2012q4,
          2013q1, 2013q2, 2013q3, 2013q4, 2014q1, 2014q2, 2014q3, 2014q4,
          2015q1, 2015q2, 2015q3, 2015q4, 2016q1, 2016q2, 2016q3, 2016q4,
          2017q1, 2017q2, 2017q3, 2017q4, 2018q1, 2018q2, 2018q3, 2018q4,
          2019q1, 2019q2, 2019q3, 2019q4, 2020q1, 2020q2, 2020q3, 2020q4,
          2021q1, 2021q2, 2021q3, 2021q4, 2022q1, 2022q2, 2022q3, 2022q4,
          2023q1, 2023q2, 2023q3, 2023q4, 2024q1]
      ferc1:
        disabled: false
        years: [1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
          2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
      ferc714:
        disabled: false
        years: [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
      glue:
        eia: true
        ferc1: true
      gridpathratoolkit:
        daily_weather: true
        disabled: false
        parts: []
        processing_levels: [extended]
        technology_types: [wind, solar]
      nrelatb:
        disabled: false
        years: [2019, 2020, 2021, 2022, 2023, 2024]
      phmsagas:
        disabled: false
        years: [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
          2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
  datastore:
    config:
      gcs_cache_path: ''
      use_local_cache: true
  pudl_io_manager:
    config:
      read_from_parquet: true
      write_to_parquet: true

e-belfer • Jul 15 '24 22:07

Hearing no objections, I'm closing this issue. We can re-open it if it becomes an issue again.

e-belfer • Jul 17 '24 18:07