pydsstools

Parallel I/O call failure

Open · dloney opened this issue 3 years ago · 12 comments

When I run multiple models in parallel that make simultaneous DSS calls through pydsstools, the models fail with I/O errors. It appears that the calls either do not block for the read/write, or time out and fail to complete. When I do the same set of runs in separate environments, each with its own pydsstools installation, I do not see the issue. Since the DSS libraries on my machine are the same in both cases, the only difference is how the runs use pydsstools, which points to the package as the source of the problem.

This happens regardless of whether I use the Python multiprocessing module or run multiple models simultaneously from the same environment.

dloney · Jan 18 '22 18:01

@dloney Can you please post or attach simplified code that causes the I/O failure?

gyanz · Jan 18 '22 20:01

This is the code that calls the multiprocessing operations:

o_pool = Pool(4)
l_output_data = o_pool.starmap(run_sequence, zip(repeat(s_hec5q_directory, len(ia_water_years)), repeat(s_template_directory, len(ia_water_years)), repeat(s_tdm_directory, len(ia_water_years)), ia_water_years, ia_types))
o_pool.close()
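A minimal sketch of the same `starmap` call with the argument tuples built up front, which makes them easy to inspect and lets identical inputs run serially for comparison. `build_task_args` and `run_all` are hypothetical helpers, not part of the original script or of pydsstools.

```python
from itertools import repeat
from multiprocessing import Pool


def build_task_args(hec5q_dir, template_dir, tdm_dir, water_years, year_types):
    """Return one (hec5q_dir, template_dir, tdm_dir, year, type) tuple per water year.

    Hypothetical helper, equivalent to the zip(repeat(...), ...) expression above;
    zip stops at the shortest input, so repeat() needs no explicit count.
    """
    if len(water_years) != len(year_types):
        raise ValueError("water_years and year_types must have the same length")
    return list(zip(repeat(hec5q_dir), repeat(template_dir), repeat(tdm_dir),
                    water_years, year_types))


def run_all(func, task_args, processes=None):
    """Run func over task_args serially (processes=None) or in a process pool."""
    if processes is None:
        # Serial path: same calls in the same order, no worker processes
        return [func(*args) for args in task_args]
    # The context manager terminates the pool on exit; starmap has already
    # returned every result by then.
    with Pool(processes) as pool:
        return pool.starmap(func, task_args)
```

Passing `processes=None` reproduces the serial path that is reported to work, and `processes=4` matches the `Pool(4)` call above.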

Each of the parallel function calls makes several DSS calls. These are some of the reads:

o_clear_creek_timeseries = o_dss_file.read_ts(s_dss_clear_creek_path, window=(s_startDate, s_endDate), trim_missing=True)
o_keswick_timeseries = o_dss_file.read_ts(s_dss_keswick_path, window=(s_startDate, s_endDate), trim_missing=True)
o_bend_bridge_timeseries = o_dss_file.read_ts(s_dss_bend_bridge_path, window=(s_startDate, s_endDate), trim_missing=True)
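The `window` strings in these reads follow the `DDMMMYYYY HH:MM:SS` pattern that the full script later in this thread builds from the water year. A small sketch that builds the window once and reads every pathname with it, assuming `dss_file` exposes `read_ts(pathname, window=..., trim_missing=...)` as the pydsstools object in these reads does; `dss_window` and `read_station_series` are hypothetical names.

```python
def dss_window(water_year):
    """Return (start, end) strings from 01 Dec of the prior year to 31 Oct.

    Mirrors s_startDate / s_endDate in the full script (hypothetical helper).
    """
    start = "01DEC" + str(water_year - 1) + " 01:00:00"
    end = "31OCT" + str(water_year) + " 23:00:00"
    return start, end


def read_station_series(dss_file, pathnames, water_year):
    """Read each pathname over the water-year window; returns {pathname: result}."""
    window = dss_window(water_year)
    return {p: dss_file.read_ts(p, window=window, trim_missing=True) for p in pathnames}
```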

Also a write operation:

def temperature_target_time_series(i_water_year, s_dss_path, da_target_temperature):
    """
    Generates temperature target time series for given water year

    Parameters
    ----------
    i_water_year: int
        Gives the current water year
    s_dss_path: str
        path to DSS file
    da_target_temperature: ndarray
        Daily target temperatures for the water year; its length sets the number of values written

    Returns
    -------
    None. Changes written to the DSS file.

    """

    # Create the path name within the DSS file
    pathname = "/CALSIM_STOR/SHASTA_PT/TARGET-F//1DAY/2020D09E-1/"

    # Create the time series container to hold the replacement data in the file
    tsc = TimeSeriesContainer()
    tsc.pathname = pathname
    tsc.startDateTime = "01DEC" + str(i_water_year - 1) + " 24:00:00"
    tsc.numberValues = da_target_temperature.shape[0]
    tsc.units = "DEGF"
    tsc.type = "PER-AVER"
    tsc.interval = 1
    tsc.values = da_target_temperature

    # Replace the data in the HEC5Q input file
    fid = HecDss.Open(s_dss_path)
    fid.deletePathname(tsc.pathname)
    fid.put_ts(tsc)
    ts = fid.read_ts(pathname)
    fid.close()
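This write sets `numberValues` from the length of `da_target_temperature`, and the full script later in this thread sizes that array with a hardcoded 335 or 336 days. A sketch, assuming the same daily span from 01 Dec of the prior year through 31 Oct, that derives the count from the calendar instead; `days_in_span` is a hypothetical name.

```python
import datetime


def days_in_span(water_year):
    """Daily values from 01 Dec (water_year - 1) through 31 Oct (water_year), inclusive.

    Gives 336 when February of water_year has 29 days, else 335.
    """
    start = datetime.date(water_year - 1, 12, 1)
    end = datetime.date(water_year, 10, 31)
    return (end - start).days + 1
```

In the script, `np.ones(days_in_span(i_water_year))` would then replace the `calendar.isleap` branch with the same result.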

Replacing the multiprocessing call with a serial loop eliminates the DSS failures completely, so the individual operations are fine and execute correctly on their own. There are also times when the script runs without an issue at a low core count, say two or three cores. Other times, with the exact same setup, the script won't run with any parallelization at all. The only consistent way I can get runs to parallelize is to use separate pydsstools installations in fully independent environments.
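One workaround to try, assuming the failures come from concurrent pydsstools calls (this thread does not confirm that), is to serialize every DSS read and write across worker processes with one shared `multiprocessing.Lock`, handed to each worker through the `Pool` initializer, while the model runs themselves stay parallel. `init_worker`, `dss_locked`, `make_pool`, and `_DSS_LOCK` are hypothetical names.

```python
import contextlib
import multiprocessing

_DSS_LOCK = None  # set in each worker process by init_worker


def init_worker(lock):
    """Pool initializer: store the shared lock in this worker's module globals."""
    global _DSS_LOCK
    _DSS_LOCK = lock


def dss_locked():
    """Context manager guarding a block of DSS I/O.

    Falls back to a no-op when no lock was installed (e.g. serial runs).
    """
    return _DSS_LOCK if _DSS_LOCK is not None else contextlib.nullcontext()


def make_pool(processes):
    """Create a pool whose workers all share one lock (hypothetical helper)."""
    lock = multiprocessing.Lock()
    return multiprocessing.Pool(processes, initializer=init_worker, initargs=(lock,))
```

Each `HecDss.Open(...)` through `fid.close()` sequence in `run_sequence` and `temperature_target_time_series` would be wrapped in `with dss_locked():`, and `Pool(4)` replaced with `make_pool(4)`. The lock only covers calls made from Python; any DSS access by the HEC-5Q executable launched through `subprocess.run` is not covered.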

dloney · Jan 18 '22 20:01

Thanks @dloney. Can you also post the python error message?

gyanz · Jan 18 '22 22:01

I don't receive a Python error message. The DSS I/O fails silently. The only way to distinguish between a proper and improper run is to look at the model output and determine if the output files have been generated.
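Because the failure is silent, a sketch that turns it into an explicit error: after each run, check that the expected output files exist, are non-empty, and were modified during the run, and raise if not. The file list is an assumption; for the script later in this thread it might include the `SR_WQ_Report.dss` path and the `standard_<year>.txt` TDM file. The timestamp check matters because `shutil.copytree` preserves modification times, so a stale output copied from the template would otherwise pass. `check_outputs` is a hypothetical helper.

```python
import os


def check_outputs(paths, not_before=None):
    """Raise RuntimeError if any expected output is missing, empty, or stale.

    not_before: optional epoch seconds (e.g. time.time() taken before the run);
    files last modified earlier are treated as not written by this run.
    """
    problems = []
    for p in paths:
        if not os.path.isfile(p) or os.path.getsize(p) == 0:
            problems.append(p + " (missing or empty)")
        elif not_before is not None and os.path.getmtime(p) < not_before:
            problems.append(p + " (not updated by this run)")
    if problems:
        raise RuntimeError("Run produced no usable output: " + "; ".join(problems))
```

An exception raised inside a worker is re-raised in the parent by `starmap`, so a failed year stops the batch with a message instead of passing silently. Filesystem timestamp resolution varies, so a small margin on `not_before` may be needed.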

dloney · Jan 18 '22 22:01

@dloney This problem is not easy to debug because it involves parallel programming. It would help me if you could provide simplified code that captures the gist of what your original code is doing.

gyanz · Jan 19 '22 13:01

This is my full script, sanitized of location names:

import datetime, pickle, time, calendar
import dateutil.relativedelta  # import the submodule explicitly; `import dateutil` alone may not load it
import os, sys, shutil, subprocess
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from multiprocessing import Pool
from itertools import repeat

# from https://github.com/gyanz/pydsstools
from pydsstools.heclib.dss import HecDss
from pydsstools.core import TimeSeriesContainer


def run_sequence(s_hec5q_directory, s_template_directory, s_tdm_directory, i_water_year, i_year_type):
    """
    Configures, runs, and post processes a HEC-5Q run against a constant temperature target

    Parameters
    ----------
    s_hec5q_directory: str
        Path to the Hec5q toolkit directory
    s_template_directory: str
        Path to the model template directory
    s_tdm_directory: str
        Path to the TDM output folder
    i_water_year: int
        Water year currently being run. This should be unique to avoid name collisions
    i_year_type: int
        Indicates the type of water year for the thresholds

    Returns
    -------
    da_times: ndarray
        Time steps from the output temperature series
    da_values: ndarray
        Values of the output temperature series
    i_temperature_target: int
        Target temperature for the water year
    d_degree_days: float
        Degree days above the temperature threshold

    """

    ### Create the working directory ###
    # Construct the folder name
    s_folder_working = os.getcwd()
    s_folder_working = os.path.join(s_folder_working, str(i_water_year))

    # Create the directory
    if not os.path.isdir(s_folder_working):
        os.makedirs(s_folder_working)

    ### Copy the HEC5Q toolkit to the working directory ###
    s_folder_hec5q_toolkit = os.path.join(os.getcwd(), s_hec5q_directory)

    ### Copy the HEC5Q template to the working directory ###
    # Split the filepath
    s_template_directory_base = os.path.basename(os.path.normpath(s_template_directory))

    # Copy the files
    s_folder_hec5q_template = os.path.join(os.getcwd(), s_template_directory)
    s_folder_working_hec5q_template = os.path.join(s_folder_working, s_template_directory_base)
    shutil.copytree(s_folder_hec5q_template, s_folder_working_hec5q_template)

    ### Update the HEC5Q command information ###
    s_date_1 = str(i_water_year - 1) + '1201'
    s_date_2 = str(i_water_year) + '1031'

    # Replace the start date
    s_command = '"' + os.path.join(s_folder_hec5q_toolkit, 'bin', 'far') + '" "' + os.path.join(s_folder_working_hec5q_template, 'Model', 'SR_Run_File.run') + '"' + \
                " 19220101" + " " + s_date_1
    subprocess.run(s_command)

    # Replace the end date
    s_command = '"' + os.path.join(s_folder_hec5q_toolkit, 'bin', 'far') + '" "' + os.path.join(s_folder_working_hec5q_template, 'Model', 'SR_Run_File.run') + '"' + \
                " 20030930" + " " + s_date_2
    subprocess.run(s_command)

    ### Run the model ###
    # Get the current directory for later
    s_current_directory = os.getcwd()

    # Change to the model directory
    os.chdir(s_folder_working_hec5q_template)

    # Formulate the path and date information
    s_dss_station_1_path = "/SACRAMENTO/STATION 1/TEMP/01JAN1922/1DAY/R2019/"
    s_dss_station_2_path = "/SACRAMENTO/STATION 2/TEMP/01JAN1922/1DAY/R2019/"
    s_dss_station_3_path = "/SACRAMENTO/STATION 3/TEMP/01JAN1922/1DAY/R2019/"

    # Determine if the year is a leap year
    if calendar.isleap(i_water_year):
        i_number_of_days = 335 + 1
    else:
        i_number_of_days = 335

    # Get the starting temperature TCD targets
    da_station_1_temps_previous = np.inf
    da_tcd_temps = set_temperature_target(i_year_type) * np.ones(i_number_of_days) - 3
    da_temp_adjustment_previous = np.zeros(da_tcd_temps.shape[0] - 2)
    i_temperature_target = set_temperature_target(i_year_type)


    ## Iterate until the tolerance is met ##
    d_tolerance = np.inf
    i_counter = 0
    while d_tolerance > 0.001:
        # Add temperature target time series to CALSIMII_HEC5Q.dss
        s_dss_path = os.path.join(os.getcwd(), "Pre_Processor", "CALSIMII_HEC5Q.dss")
        temperature_target_time_series(i_water_year, s_dss_path, da_tcd_temps)

        # Call the model
        subprocess.run("run_SR_temp_model.bat")

        # Change back to the original directory
        os.chdir(s_current_directory)

        ### Post process the output ###
        ## Plot the temperature target versus the actual temperature ##
        # Create the path to the result file
        s_folder_hec5q_results = os.path.join(s_folder_working_hec5q_template, 'Model', 'SR_WQ_Report.dss')

        # Open the file
        o_dss_file = HecDss.Open(s_folder_hec5q_results)

        s_startDate = "01DEC" + str(i_water_year - 1) + " 01:00:00"
        s_endDate = "31OCT" + str(i_water_year) + " 23:00:00"

        # Read files
        o_station_1_timeseries = o_dss_file.read_ts(s_dss_station_1_path, window=(s_startDate, s_endDate), trim_missing=True)
        o_station_2_timeseries = o_dss_file.read_ts(s_dss_station_2_path, window=(s_startDate, s_endDate), trim_missing=True)
        o_station_3_timeseries = o_dss_file.read_ts(s_dss_station_3_path, window=(s_startDate, s_endDate), trim_missing=True)

        # Split into the times and values
        da_station_1_times = np.array(o_station_1_timeseries.pytimes) - datetime.timedelta(days=1)
        da_station_1_values = np.copy(o_station_1_timeseries.values)

        da_station_2_times = np.array(o_station_2_timeseries.pytimes) - datetime.timedelta(days=1)
        da_station_2_values = np.copy(o_station_2_timeseries.values)

        da_station_3_times = np.array(o_station_3_timeseries.pytimes) - datetime.timedelta(days=1)
        da_station_3_values = np.copy(o_station_3_timeseries.values)

        # Close the dss file
        o_dss_file.close()

        # Calculate the tolerance
        d_tolerance = np.mean(np.abs(da_station_1_temps_previous - da_station_1_values) / np.abs(da_station_1_values))

        if d_tolerance > 0.001:
            # Get the previous three day moving average
            da_temp_adjustment = np.convolve(da_station_1_values - i_temperature_target, np.ones(3)/3, mode='valid')
            da_moving_average = da_temp_adjustment_previous - da_temp_adjustment
            da_temp_adjustment_previous = da_temp_adjustment

            # Update the TCD series
            da_tcd_temps[3:] += da_moving_average[:-1]

            # Update the previous year values
            da_station_1_temps_previous = da_station_1_values

            i_counter += 1

            os.chdir(s_folder_working_hec5q_template)

    ### Do everything after the model has converged ###
    # Plot the timeseries
    plot_gage(da_station_1_times, da_station_1_values, i_water_year, s_folder_working_hec5q_template, 'station_1_model', d_temperature_target=i_temperature_target)
    plot_gage(da_station_2_times, da_station_2_values, i_water_year, s_folder_working_hec5q_template, 'station_2_model')
    plot_gage(da_station_3_times, da_station_3_values, i_water_year, s_folder_working_hec5q_template, 'station_3_model')

    # Calculate the compliance with the target over just the modeled period
    da_threshold_difference = (da_station_1_values - i_temperature_target)
    da_threshold_difference = da_threshold_difference[da_threshold_difference > 0]
    d_degree_days_model = np.sum(da_threshold_difference)

    ### Create the TDM output file ###
    create_tdm_file(s_tdm_directory, i_water_year, da_station_1_times, da_station_1_values, da_station_2_values, da_station_3_values)

    # Return to the calling function
    return da_station_1_times, da_station_1_values, da_station_2_values, da_station_3_values, i_temperature_target, d_degree_days_model


def plot_gage(da_times, da_values, i_water_year, s_folder_working_hec5q_template, s_gage, d_temperature_target=None):
    """
    Plots the temperature time series for a specific gage location. If a target value is provided, it is also plotted

    Parameters
    ----------
    da_times: ndarray
        Dates from the gage DSS record
    da_values: ndarray
        Values from the gage DSS record
    i_water_year: int
        Current water year being plotted
    s_folder_working_hec5q_template: str
        Path to the working folder into which to save the file
    s_gage: str
        Name of the gage for the filename
    d_temperature_target: float or None
        Temperature target if included

    Returns
    -------
    None. File is written to the disk.

    """

    # Plot the series
    plt.plot(da_times, da_values, 'k-')

    # Plot the target and create the legend
    if d_temperature_target is not None:
        plt.plot(da_times, np.ones(da_values.shape[0]) * d_temperature_target, 'r--')
        plt.legend(['Simulated', 'Target'], loc='best', framealpha=1, edgecolor='k')
    else:
        plt.legend(['Simulated'], loc='best', framealpha=1, edgecolor='k')

    # Adjust the format of the plot
    plt.xlabel('Time')
    plt.ylabel('Temperature (F)')
    plt.grid(which='both', linestyle="--", alpha=0.5, color='k')
    plt.xlim([da_times[0] - dateutil.relativedelta.relativedelta(days=1), da_times[-1]])
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))

    # Save the figure
    plt.tight_layout()
    plt.savefig(os.path.join(s_folder_working_hec5q_template, str(i_water_year) + "_" + s_gage + '.png'), dpi=600)
    plt.close()


def set_temperature_target(i_year_type):
    """
    Calculates the target temperature for the current water year type

    Parameters
    ----------
    i_year_type: int
        Gives the current water year type

    Returns
    -------
    i_target_temperature: int
        Target temperature for the water year

    """

    # Defines the target temperature threshold
    i_target_temperature = np.nan

    # Determine the type of the water year
    if i_year_type == 5:
        i_target_temperature = 55
    elif i_year_type == 3 or i_year_type == 4:
        i_target_temperature = 54
    elif i_year_type == 1 or i_year_type == 2:
        i_target_temperature = 53
    else:
        raise NotImplementedError('Year type not understood to set the temperature target.')

    # Return to the calling function
    return i_target_temperature


def temperature_target_time_series(i_water_year, s_dss_path, da_target_temperature):
    """
    Generates temperature target time series for given water year

    Parameters
    ----------
    i_water_year: int
        Gives the current water year
    s_dss_path: str
        path to DSS file
    da_target_temperature: ndarray
        Daily target temperatures for the water year; its length sets the number of values written

    Returns
    -------
    None. Changes written to the DSS file.

    """

    # Create the path name within the DSS file
    pathname = "/CALSIM_STOR/SHASTA_PT/TARGET-F//1DAY/2020D09E-1/"

    # Create the time series container to hold the replacement data in the file
    tsc = TimeSeriesContainer()
    tsc.pathname = pathname
    tsc.startDateTime = "01DEC" + str(i_water_year - 1) + " 24:00:00"
    tsc.numberValues = da_target_temperature.shape[0]
    tsc.units = "DEGF"
    tsc.type = "PER-AVER"
    tsc.interval = 1
    tsc.values = da_target_temperature

    # Replace the data in the HEC5Q input file
    fid = HecDss.Open(s_dss_path)
    fid.deletePathname(tsc.pathname)
    fid.put_ts(tsc)
    ts = fid.read_ts(pathname)
    fid.close()


def create_tdm_file(s_target_path, i_year, l_times, da_station_1_temperature, da_station_2_temperature, da_station_3_temperature):
    """
    Creates the TDM file to upload to the SacPAS server

    Parameters
    ----------
    s_target_path: str
        Path to save the file
    i_year: int
        Water year being analyzed
    l_times: list
        List of times the model has been run for
    da_station_1_temperature: ndarray
        STATION 1 temperatures
    da_station_2_temperature: ndarray
        station_2 temperatures
    da_station_3_temperature: ndarray
        STATION 3 temperatures

    Returns
    -------
    None. File is written to the disk.

    """

    # Construct the filename
    s_filename = os.path.join(s_target_path, 'standard_' + str(i_year) + '.txt')

    # Structure the time series
    o_start_time = datetime.datetime(year=i_year, month=1, day=1)
    o_stop_time = datetime.datetime(year=i_year, month=10, day=31)

    ba_mask = np.array([True if o_start_time <= x <= o_stop_time else False for x in l_times]).astype(bool)
    da_station_1_temperature = da_station_1_temperature[ba_mask]
    da_station_2_temperature = da_station_2_temperature[ba_mask]
    da_station_3_temperature = da_station_3_temperature[ba_mask]

    ## Save to a file ##
    # Open the file
    o_file = open(s_filename, 'w+')

    # Write the header information
    o_file.write('Day,RKM485,RKM464,RKM418\n')

    # Write the January through October data
    for i_entry in range(0, da_station_1_temperature.shape[0], 1):
        # Create the line
        s_line = str(i_entry + 1) + ',' + str(np.around(da_station_2_temperature[i_entry], decimals=6)) + ',' + str(np.around(da_station_1_temperature[i_entry], decimals=6)) + ',' + \
                 str(np.around(da_station_3_temperature[i_entry], decimals=6)) + '\n'

        # Write the line
        o_file.write(s_line)

    # Write the footer information
    o_file.write(str(len(da_station_3_temperature)+1) + ":730,33,33,33\n")

    # Close the file
    o_file.close()


if __name__=="__main__":

    ### Setup the run information ###
    # Calsim information
    ia_water_years = np.arange(1922, 2003, 1)
    df_year_types = pd.read_csv('cases/inputs/historical/water_year_types.txt', header=0, index_col=0, sep=r'\s+')

    # HEC-5Q information
    s_hec5q_directory = 'HEC5Q_Toolkit'
    s_template_directory = 'cases/scenario_1'
    s_tdm_directory = 'tdm_standard'

    ### Start the model runs ###
    # Create the TDM output directory
    if not os.path.isdir(s_tdm_directory):
        os.makedirs(s_tdm_directory)

    # Run serial ##
    # l_output_data = []
    # for i_entry_model in range(0, ia_water_years.shape[0], 1):
    #     l_output_data.append(run_sequence(s_hec5q_directory, s_template_directory, s_tdm_directory, ia_water_years[i_entry_model], df_year_types.loc[ia_water_years[i_entry_model]].values[0]))

    # Run parallel ##
    # Reindex the year types to be the correct order
    ia_types = np.array([df_year_types.loc[x].values[0] for x in ia_water_years])

    # Call the solver
    o_pool = Pool(4)
    l_output_data = o_pool.starmap(run_sequence, zip(repeat(s_hec5q_directory, len(ia_water_years)), repeat(s_template_directory, len(ia_water_years)),
                                                     repeat(s_tdm_directory, len(ia_water_years)), ia_water_years, ia_types))
    o_pool.close()
    o_pool.join()
    
    # starmap already returns a list; list() here is a harmless safeguard
    l_output_data = list(l_output_data)

    ## Flatten the data from the list into arrays ##
    dm_times = [x[0] for x in l_output_data]
    dm_station_1_temperature = [x[1] for x in l_output_data]
    dm_station_2_temperature = [x[2] for x in l_output_data]
    dm_station_3_temperature = [x[3] for x in l_output_data]
    ia_temperature_target = np.array([x[4] for x in l_output_data])
    da_degree_days_model = np.array([x[5] for x in l_output_data])

    o_file = open('pickled_standard.p', 'wb+')
    pickle.dump((dm_times, dm_station_1_temperature, dm_station_2_temperature, dm_station_3_temperature, ia_temperature_target, da_degree_days_model),
                o_file)
    o_file.close()

    # o_file = open('pickled_standard.p', 'rb')
    # dm_times, dm_station_1_temperature, dm_station_2_temperature, dm_station_3_temperature, ia_temperature_target, da_degree_days_model = \
    #       pickle.load(o_file)
    # o_file.close()

    ### Postprocessing ###
    # Open a pandas excel writer
    o_excel_writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')

    ## Process the degree day information ##
    # Write the degree days to excel
    pd.DataFrame(index=ia_water_years, data=da_degree_days_model).to_excel(o_excel_writer, sheet_name='degree_days_model')

    # Plot the modeled series
    plt.plot(ia_water_years, da_degree_days_model, 'ko')
    plt.xlabel('Water Years')
    plt.ylabel('Degree Days above Target (FDays)')
    plt.ylim([0, None])
    plt.xlim([1920, 2010])
    plt.grid(which='both', linestyle="--", alpha=0.5, color='k')

    plt.tight_layout()
    plt.savefig('degree_days_model.png', dpi=600)
    plt.close()

    # Plot the model values by water year type
    ia_unique = np.unique(df_year_types.values)
    for i_entry in range(0, ia_unique.shape[0], 1):
        # Calculate the set members
        ba_members = df_year_types == ia_unique[i_entry]

        # Get the years
        ia_member_years = df_year_types.index[ba_members.values.flatten()]

        # Reverse the mask
        ba_year_mask = np.isin(ia_water_years, ia_member_years)
        ia_year_indices = np.argwhere(ba_year_mask).flatten()

        # Plot the complete series
        if len(ia_year_indices) > 0:

            # Plot the modeled data
            for i_entry_year in range(0, ia_year_indices.shape[0], 1):
                plt.plot(ia_water_years[ia_year_indices[i_entry_year]], da_degree_days_model[ia_year_indices[i_entry_year]], 'ko')

            plt.xlabel('Water Years')
            plt.ylabel('Degree Days above Target (FDays)')
            plt.ylim([0, None])
            plt.xlim([1920, 2010])
            plt.grid(which='both', linestyle="--", alpha=0.5, color='k')

            plt.tight_layout()
            plt.savefig('degree_days_' + str(ia_unique[i_entry]) + '_model.png', dpi=600)
            plt.close()

    ## Process the temperature timeseries information ##
    # Write the timeseries information to excel
    pd.DataFrame(index=ia_water_years, data=dm_station_1_temperature).to_excel(o_excel_writer, sheet_name='temperature_station_1')
    pd.DataFrame(index=ia_water_years, data=dm_station_2_temperature).to_excel(o_excel_writer, sheet_name='temperature_station_2')
    pd.DataFrame(index=ia_water_years, data=dm_station_3_temperature).to_excel(o_excel_writer, sheet_name='temperature_station_3')

    # Plot the values by water year
    ia_unique = np.unique(df_year_types.values)
    for i_entry in range(0, ia_unique.shape[0], 1):
        # Calculate the set members
        ba_members = df_year_types == ia_unique[i_entry]

        # Get the years
        ia_member_years = df_year_types.index[ba_members.values.flatten()]

        # Reverse the mask
        ba_year_mask = np.isin(ia_water_years, ia_member_years)
        ia_year_indices = np.argwhere(ba_year_mask).flatten()

        # Plot the complete series
        if len(ia_year_indices) > 0:
            # Plot for STATION 1
            for i_entry_year in range(0, ia_year_indices.shape[0], 1):
                df_data = pd.DataFrame(index=dm_times[ia_year_indices[i_entry_year]], data=dm_station_1_temperature[ia_year_indices[i_entry_year]])
                df_data[df_data > 1000] = np.nan
                df_data[df_data < -1000] = np.nan

                i_year_offset = 2000 - np.min(df_data.index.year)
                adjusted_date = df_data.index.to_pydatetime() + dateutil.relativedelta.relativedelta(years=i_year_offset)

                plt.plot_date(adjusted_date, df_data.values, '-')

            plt.plot_date(plt.xlim(), np.ones(2) * set_temperature_target(ia_unique[i_entry]), 'k--')

            plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
            plt.xlabel('Time')
            plt.ylabel('Temperature (F)')
            plt.grid(which='both', linestyle="--", alpha=0.5, color='k')
            plt.xlim([adjusted_date[0], adjusted_date[-1]])

            plt.tight_layout()
            plt.savefig('temperature_station1_' + str(ia_unique[i_entry]) + '.png', dpi=600)
            plt.close()

            # Plot for station_2
            for i_entry_year in range(0, ia_year_indices.shape[0], 1):
                df_data = pd.DataFrame(index=dm_times[ia_year_indices[i_entry_year]], data=dm_station_2_temperature[ia_year_indices[i_entry_year]])
                df_data[df_data > 1000] = np.nan
                df_data[df_data < -1000] = np.nan

                i_year_offset = 2000 - np.min(df_data.index.year)
                adjusted_date = df_data.index.to_pydatetime() + dateutil.relativedelta.relativedelta(years=i_year_offset)

                plt.plot_date(adjusted_date, df_data.values, '-')

            plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
            plt.xlabel('Time')
            plt.ylabel('Temperature (F)')
            plt.grid(which='both', linestyle="--", alpha=0.5, color='k')
            plt.xlim([adjusted_date[0], adjusted_date[-1]])

            plt.tight_layout()
            plt.savefig('temperature_station2_' + str(ia_unique[i_entry]) + '.png', dpi=600)
            plt.close()

            # Plot for station 3
            for i_entry_year in range(0, ia_year_indices.shape[0], 1):
                df_data = pd.DataFrame(index=dm_times[ia_year_indices[i_entry_year]], data=dm_station_3_temperature[ia_year_indices[i_entry_year]])
                df_data[df_data > 1000] = np.nan
                df_data[df_data < -1000] = np.nan

                i_year_offset = 2000 - np.min(df_data.index.year)
                adjusted_date = df_data.index.to_pydatetime() + dateutil.relativedelta.relativedelta(years=i_year_offset)

                plt.plot_date(adjusted_date, df_data.values, '-')

            plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
            plt.xlabel('Time')
            plt.ylabel('Temperature (F)')
            plt.grid(which='both', linestyle="--", alpha=0.5, color='k')
            plt.xlim([adjusted_date[0], adjusted_date[-1]])

            plt.tight_layout()
            plt.savefig('temperature_station3_' + str(ia_unique[i_entry]) + '.png', dpi=600)
            plt.close()

    # Close the writer and save
    o_excel_writer.close()

dloney · Jan 19 '22 14:01

@gyanz Any luck tracking down the issue? If we move offline, I can share the full model stack with you and help debug.

dloney · Feb 01 '22 04:02

I think this library is not thread safe, since the extension module that wraps the C/C++ libraries uses global variables and Python objects. The C/C++ functions and classes themselves may not be safe in a parallel environment without synchronization locks. Solving this issue would require reorganizing the extension module, which would take a large amount of effort and research. Currently, this is not a top priority for me, as I am focusing on improving essential aspects of the library.

gyanz · Feb 01 '22 14:02

I'm working on adding the thread locks, but I'm getting a build error on Linux when linking against heclib.a. What compiler version was used to build that library?

dloney · Feb 03 '22 03:02

I use GCC 9.3. You can find the dependencies in README.

gyanz · Feb 03 '22 14:02

@dloney I have attached the batch file I use to build on Windows 10. It loads the Intel oneAPI, Visual Studio, and conda environments before executing the build command: build_wheels.zip

On Linux, GCC, gfortran, and a few other dependencies are needed. Unlike on Windows 10, I recommend using a Python virtual environment instead of a conda environment for the build. The virtual environment is needed for libpython, Python.h, and the Cython and NumPy headers.
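Before building on Linux, a quick sketch to confirm that the active interpreter can see the pieces listed above: `Python.h`, the library directory, Cython, and the NumPy include directory. It also reports the compiler that built the interpreter, for comparison with the GCC 9.3 mentioned earlier. It only inspects the running Python environment; it does not check gfortran or heclib.a. `build_env_report` is a hypothetical name.

```python
import importlib.util
import os
import platform
import sysconfig


def build_env_report():
    """Collect build-relevant facts about the running interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    report = {
        "python_compiler": platform.python_compiler(),  # e.g. the GCC version string
        "python_h": os.path.isfile(os.path.join(include_dir, "Python.h")),
        "libdir": sysconfig.get_config_var("LIBDIR"),  # may be None on Windows
        "cython": importlib.util.find_spec("Cython") is not None,
        "numpy_include": None,
    }
    if importlib.util.find_spec("numpy") is not None:
        import numpy
        report["numpy_include"] = numpy.get_include()
    return report
```

Running `build_env_report()` inside the virtual environment used for the build shows which of these are missing before the compile step fails.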

gyanz · Feb 04 '22 15:02

@dloney The DSS-7 C library uses a static hec_zdssLastError struct to report the error status of each API call. This means that DSS-7 operations can't be truly parallelized. However, I see some benefit in being able to make API calls from multiple threads by using a mutex. For example, each thread could read from the DSS file in series and then spend significant CPU time processing the data.
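A sketch of the mutex pattern described above, assuming one process-wide `threading.Lock` guards every DSS call: threads take turns reading and then process their data outside the lock. `read_record` stands in for a pydsstools read such as `fid.read_ts(...)`; it, `process`, `load_and_process`, and `run_threaded` are hypothetical names. In CPython the processing step only overlaps usefully when it releases the GIL, as many NumPy routines do for large arrays.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock for the whole process: DSS calls happen one at a time.
DSS_LOCK = threading.Lock()


def load_and_process(pathname, read_record, process):
    """Read one record under the lock, then process it without holding the lock."""
    with DSS_LOCK:
        data = read_record(pathname)  # serialized DSS access
    return process(data)  # may overlap with other threads' reads


def run_threaded(pathnames, read_record, process, max_workers=4):
    """Apply load_and_process to every pathname; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: load_and_process(p, read_record, process), pathnames))
```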

gyanz · Feb 10 '22 15:02