
Add optional parameter frequency_dimension across wave.resource

Open · akeeste opened this pull request 1 year ago · 2 comments

This PR makes the xarray handling in wave.resource more consistent by adding the optional parameter frequency_dimension to functions like significant_wave_height, which then pass the parameter through to frequency_moment.

This is an immediate speed fix for #331: it allows xarray formats to be passed to all relevant wave.resource functions.
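
For illustration, a minimal sketch of the call pattern this PR enables, assuming the post-merge signature; the spectral shape and the dimension name "Frequency" here are made up:

import numpy as np
import xarray as xr
from mhkit.wave import resource

# Hypothetical omnidirectional spectrum as an xarray DataArray whose
# frequency dimension has a custom name
f = np.linspace(0.01, 1.0, 100)
S = xr.DataArray(
    np.exp(-((f - 0.1) ** 2) / 0.001),  # made-up spectral density [m^2/Hz]
    coords={"Frequency": f},
    dims="Frequency",
)

# The new keyword names the frequency dimension explicitly; internally it
# is forwarded to frequency_moment instead of assuming a default
Hm0 = resource.significant_wave_height(S, frequency_dimension="Frequency")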

akeeste avatar Jun 27 '24 15:06 akeeste

@akeeste in case you are catching up: we need to get #335 merged before anything will pass tests, due to the numpy 2.0.0 upgrade. (Waiting on the wind hindcast right now, which is everyone's favorite 5-hour test.)

ssolson avatar Jun 27 '24 16:06 ssolson

@akeeste I wrote a script to see if this was improving the testing time, and it does not appear to be making much of a difference. Is the purpose of this PR to address #331, or just to make the code more consistent?

Job Name                                            run_10406645270 (min)  run_10373389775 (min)  Time Difference (min)
(ADCP_Delft3D_TRTS_example.ipynb, 1200)                         18.850000              18.550000              -0.300000
(Delft3D_example.ipynb, 180)                                     2.216667               2.116667              -0.100000
(PacWave_resource_characterization_example.ipynb)               12.783333              12.233333              -0.550000
(SWAN_example.ipynb, 180)                                        2.150000               2.266667               0.116667
(WPTO_hindcast_example.ipynb, 180)                               1.950000               1.866667              -0.083333
(adcp_example.ipynb, 240)                                        2.083333               3.083333               1.000000
(adv_example.ipynb, 180)                                         2.083333               1.933333              -0.150000
(cdip_example.ipynb, 180)                                        2.416667               3.000000               0.583333
(directional_waves.ipynb, 180)                                   1.983333               2.316667               0.333333
(environmental_contours_example.ipynb, 360)                     10.616667              10.733333               0.116667
(extreme_response_MLER_example.ipynb, 360)                       2.166667               2.350000               0.183333
(extreme_response_contour_example.ipynb, 360)                    4.833333               4.500000              -0.333333
(extreme_response_full_sea_state_example.ipynb)                  4.933333               5.550000               0.616667
(loads_example.ipynb, 180)                                       1.900000               2.000000               0.100000
(metocean_example.ipynb, 180)                                    2.000000               2.050000               0.050000
(mooring_example.ipynb, 240)                                     2.600000               2.683333               0.083333
(power_example.ipynb, 180)                                       2.116667               2.333333               0.216667
(qc_example.ipynb, 180)                                          1.883333               2.116667               0.233333
(river_example.ipynb, 180)                                       1.883333               1.850000              -0.033333
(short_term_extremes_example.ipynb, 180)                         1.916667               1.916667               0.000000
(tidal_example.ipynb, 180)                                       2.000000               1.883333              -0.116667
(tidal_performance_example.ipynb, 180)                           1.883333               2.150000               0.266667
(upcrossing_example.ipynb, 180)                                  1.900000               1.916667               0.016667
(wave_example.ipynb, 180)                                        1.900000               1.900000               0.000000
(wecsim_example.ipynb, 180)                                      2.166667               1.966667              -0.200000
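
For reference, the script that generated the comparison above: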
import requests
import pandas as pd


# GitHub API token (set to None for unauthenticated requests)
token = "<YOUR_GITHUB_PAT>"

# Headers for authorization
headers = {"Authorization": f"token {token}"} if token else {}

urls = [
    "https://api.github.com/repos/MHKiT-Software/MHKiT-Python/actions/runs/10406645270/jobs?per_page=50",
    "https://api.github.com/repos/MHKiT-Software/MHKiT-Python/actions/runs/10373389775/jobs?per_page=50",
]

# Function to fetch jobs details from a workflow run
def fetch_jobs_data(url):
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # Raise an error for bad status codes
    data = response.json()
    return data['jobs']

# Function to calculate the runtime of a job
def calculate_runtime(job):
    start_time = pd.to_datetime(job['started_at'])
    end_time = pd.to_datetime(job['completed_at'])
    runtime = end_time - start_time
    return runtime.total_seconds() / 60  # Convert to minutes

# Initialize an empty list to hold the job data
all_jobs_data = []

# Fetch and process job data for each workflow run
for url in urls:
    jobs = fetch_jobs_data(url)
    
    # Process job data and append it to the list
    for job in jobs:
        all_jobs_data.append({
            "workflow_run_id": job['run_id'],
            "job_name": job['name'],
            "status": job['status'],
            "conclusion": job['conclusion'],
            "runtime_minutes": calculate_runtime(job)
        })

# Convert the list of job data to a DataFrame
df = pd.DataFrame(all_jobs_data)

# Inspect the DataFrame
print(df.head())

# Example of additional filtering or processing:
# Keep only the jobs whose names start with "test-notebooks"
# (.copy() avoids a SettingWithCopyWarning on the assignment below)
df_filtered = df[df['job_name'].str.startswith("test-notebooks")].copy()

# Optionally, clean up job names
df_filtered['job_name'] = df_filtered['job_name'].str.replace("test-notebooks (examples/", "(", regex=False)


# Pivot the data to compare runtimes side-by-side
pivot_df = df_filtered.pivot(index="job_name", columns="workflow_run_id", values="runtime_minutes").reset_index()

# Rename the run-id columns explicitly; positional renaming would silently
# mislabel them, since pivot sorts the run ids
pivot_df = pivot_df.rename(columns={10406645270: 'run_10406645270', 10373389775: 'run_10373389775'})

# Calculate the difference between the two runtimes
pivot_df['time_difference (minutes)'] = pivot_df['run_10373389775'] - pivot_df['run_10406645270']

# Display the data to the user
print(pivot_df)

ssolson avatar Aug 16 '24 16:08 ssolson

@ssolson This PR doesn't yet address the underlying xarray slowdown, but it enables a workaround: the wave functions can currently be sped up by passing xarray formats instead of pandas. However, the parameter frequency_dimension had not been added to all wave functions, which limited xarray input to those functions and to their internal calls (typically calls to frequency_moment).

Most of the examples still use pandas, so this workaround is not exercised there. Upcoming PRs (#348 and later) will address the underlying slowdown. A sketch of the workaround follows below.
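
A minimal sketch of the workaround, assuming a spectrum already held in a pandas DataFrame; the variable names and spectral shape are hypothetical:

import numpy as np
import pandas as pd
from mhkit.wave import resource

# Hypothetical spectrum with a frequency index, as the examples use today
f = np.linspace(0.01, 1.0, 100)
S_df = pd.DataFrame(
    {"S1": np.exp(-((f - 0.1) ** 2) / 0.001)},  # made-up spectral density
    index=pd.Index(f, name="frequency"),
)

# Baseline: pandas input (the slower path)
Hm0_pandas = resource.significant_wave_height(S_df)

# Workaround: convert to xarray and name the frequency dimension explicitly,
# which is forwarded to the internal frequency_moment calls
S_xr = S_df["S1"].to_xarray()  # DataArray with dim "frequency"
Hm0_xarray = resource.significant_wave_height(S_xr, frequency_dimension="frequency")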

akeeste avatar Aug 19 '24 14:08 akeeste

From our discussion: this PR addresses the slowness by allowing a workaround but does not solve it outright. Additional PRs will follow for the full fix.

ssolson avatar Aug 19 '24 16:08 ssolson