MHKiT-Python
Add optional parameter frequency_dimension across wave.resource
This PR makes the xarray handling in wave.resource more consistent by adding the optional parameter frequency_dimension to functions like significant_wave_height, which then pass it through to frequency_moment.
This is an immediate speed fix for #331: it allows xarray formats to be passed to all relevant wave.resource functions.
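For context, the change is a simple parameter pass-through. A minimal sketch of the pattern, using simplified stand-ins rather than the actual MHKiT implementations:

```python
# Minimal sketch of the pass-through pattern this PR applies; these are
# simplified stand-ins, not the actual MHKiT implementations.
import numpy as np
import xarray as xr

def frequency_moment(S, N, frequency_dimension=""):
    """N-th frequency moment of a spectrum S (xarray DataArray)."""
    # Fall back to the first dimension when no name is supplied
    dim = frequency_dimension or list(S.dims)[0]
    f = S[dim]
    return (S * f**N).integrate(coord=dim)

def significant_wave_height(S, frequency_dimension=""):
    """Hm0 = 4 * sqrt(m0); frequency_dimension is simply forwarded."""
    m0 = frequency_moment(S, 0, frequency_dimension=frequency_dimension)
    return 4 * np.sqrt(m0)

# Works with any dimension name, e.g. "freq" instead of a default
f = np.linspace(0.01, 1.0, 500)
S = xr.DataArray(0.5 * np.exp(-((f - 0.1) ** 2) / 0.001),
                 dims="freq", coords={"freq": f})
print(float(significant_wave_height(S, frequency_dimension="freq")))
```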
@akeeste in case you are catching up: we need to get #335 merged before anything will pass tests, due to the numpy 2.0.0 upgrade. (Currently waiting on the wind hindcast, which is everyone's favorite 5-hour test.)
@akeeste I wrote a script to check whether this PR improves the testing time, and it does not appear to make much of a difference. Is the purpose of this to address #331, or just to make the code more consistent?
| Job Name | run_10406645270 (minutes) | run_10373389775 (minutes) | Time Difference (minutes) |
|---|---|---|---|
| (ADCP_Delft3D_TRTS_example.ipynb, 1200) | 18.850000 | 18.550000 | -0.300000 |
| (Delft3D_example.ipynb, 180) | 2.216667 | 2.116667 | -0.100000 |
| (PacWave_resource_characterization_example.ipynb) | 12.783333 | 12.233333 | -0.550000 |
| (SWAN_example.ipynb, 180) | 2.150000 | 2.266667 | 0.116667 |
| (WPTO_hindcast_example.ipynb, 180) | 1.950000 | 1.866667 | -0.083333 |
| (adcp_example.ipynb, 240) | 2.083333 | 3.083333 | 1.000000 |
| (adv_example.ipynb, 180) | 2.083333 | 1.933333 | -0.150000 |
| (cdip_example.ipynb, 180) | 2.416667 | 3.000000 | 0.583333 |
| (directional_waves.ipynb, 180) | 1.983333 | 2.316667 | 0.333333 |
| (environmental_contours_example.ipynb, 360) | 10.616667 | 10.733333 | 0.116667 |
| (extreme_response_MLER_example.ipynb, 360) | 2.166667 | 2.350000 | 0.183333 |
| (extreme_response_contour_example.ipynb, 360) | 4.833333 | 4.500000 | -0.333333 |
| (extreme_response_full_sea_state_example.ipynb) | 4.933333 | 5.550000 | 0.616667 |
| (loads_example.ipynb, 180) | 1.900000 | 2.000000 | 0.100000 |
| (metocean_example.ipynb, 180) | 2.000000 | 2.050000 | 0.050000 |
| (mooring_example.ipynb, 240) | 2.600000 | 2.683333 | 0.083333 |
| (power_example.ipynb, 180) | 2.116667 | 2.333333 | 0.216667 |
| (qc_example.ipynb, 180) | 1.883333 | 2.116667 | 0.233333 |
| (river_example.ipynb, 180) | 1.883333 | 1.850000 | -0.033333 |
| (short_term_extremes_example.ipynb, 180) | 1.916667 | 1.916667 | 0.000000 |
| (tidal_example.ipynb, 180) | 2.000000 | 1.883333 | -0.116667 |
| (tidal_performance_example.ipynb, 180) | 1.883333 | 2.150000 | 0.266667 |
| (upcrossing_example.ipynb, 180) | 1.900000 | 1.916667 | 0.016667 |
| (wave_example.ipynb, 180) | 1.900000 | 1.900000 | 0.000000 |
| (wecsim_example.ipynb, 180) | 2.166667 | 1.966667 | -0.200000 |
The script used to generate the comparison above:

```python
import requests
import pandas as pd

# GitHub API token (set to your personal access token, or leave empty for
# unauthenticated requests)
token = "<YOUR_GITHUB_PAT>"

# Headers for authorization
headers = {"Authorization": f"token {token}"} if token else {}

urls = [
    "https://api.github.com/repos/MHKiT-Software/MHKiT-Python/actions/runs/10406645270/jobs?per_page=50",
    "https://api.github.com/repos/MHKiT-Software/MHKiT-Python/actions/runs/10373389775/jobs?per_page=50",
]


# Fetch job details from a workflow run
def fetch_jobs_data(url):
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # Raise an error for bad status codes
    return response.json()["jobs"]


# Calculate the runtime of a job in minutes
def calculate_runtime(job):
    start_time = pd.to_datetime(job["started_at"])
    end_time = pd.to_datetime(job["completed_at"])
    return (end_time - start_time).total_seconds() / 60


# Fetch and process job data for each workflow run
all_jobs_data = []
for url in urls:
    for job in fetch_jobs_data(url):
        all_jobs_data.append({
            "workflow_run_id": job["run_id"],
            "job_name": job["name"],
            "status": job["status"],
            "conclusion": job["conclusion"],
            "runtime_minutes": calculate_runtime(job),
        })

# Convert the list of job data to a DataFrame and inspect it
df = pd.DataFrame(all_jobs_data)
print(df.head())

# Keep only the notebook test jobs; copy to avoid a SettingWithCopyWarning
df_filtered = df[df["job_name"].str.startswith("test-notebooks")].copy()

# Clean up job names, e.g. "test-notebooks (examples/wave_example.ipynb, 180)"
# -> "(wave_example.ipynb, 180)"
df_filtered["job_name"] = df_filtered["job_name"].str.replace(
    "test-notebooks (examples/", "(", regex=False
)

# Pivot the data to compare runtimes side by side
pivot_df = df_filtered.pivot(
    index="job_name", columns="workflow_run_id", values="runtime_minutes"
).reset_index()

# Rename the run columns by ID rather than by position, since pivot sorts
# the column labels in ascending order
pivot_df = pivot_df.rename(
    columns={10406645270: "run_10406645270", 10373389775: "run_10373389775"}
)

# Runtime difference between the two runs
pivot_df["time_difference (minutes)"] = (
    pivot_df["run_10373389775"] - pivot_df["run_10406645270"]
)

# Display the comparison
print(pivot_df)
```
@ssolson This PR doesn't yet address the underlying xarray slowdown, but it enables a workaround: the wave functions can currently be sped up by passing xarray input instead of pandas. However, the parameter frequency_dimension had not been added to all wave functions, which limited the use of xarray input with those functions and their internal calls (typically calls to frequency_moment).
Most of the examples still use pandas, so this workaround is not exercised by the notebook tests. Upcoming PRs (#348 and after) will address the underlying slowdown. A sketch of the workaround follows.
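An illustrative sketch of the workaround, assuming wave.resource accepts xarray input as described above (pierson_moskowitz_spectrum is used only to build a sample spectrum):

```python
# Illustrative sketch of the workaround: convert a pandas spectrum to
# xarray before calling wave.resource functions. Assumes xarray input is
# accepted as described in this PR.
import numpy as np
from mhkit import wave

f = np.linspace(0.01, 1.0, 500)
Tp, Hs = 8.0, 2.5

# MHKiT returns the spectrum as a pandas DataFrame
S_df = wave.resource.pierson_moskowitz_spectrum(f, Tp, Hs)

# Slower path: pandas input
Hm0_pandas = wave.resource.significant_wave_height(S_df)

# Faster path: convert to an xarray DataArray first and name the
# frequency dimension explicitly via the new parameter
S_xr = S_df.squeeze().to_xarray()
Hm0_xarray = wave.resource.significant_wave_height(
    S_xr, frequency_dimension=list(S_xr.dims)[0]
)
```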
From our discussion: this PR addresses the slowness by enabling a workaround but does not solve the slowness itself. Additional PRs to follow for the full issue.