Output files sometimes missing in us-east-2
sample application
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Sep 1 17:55:26 2022
@author: eafpres
"""
#
#%% libraries
#
import pandas as pd
import matplotlib.pyplot as plt
import sys
import os
#
#%% configure
#
my_os = sys.platform
print('found OS ', my_os)
print('user is: ', os.environ.get('USER'))
print('working directory is: ', os.getcwd())
#
#%% data
#
data = pd.read_csv('data/parabolic_data.csv')
#
#%% stats
#
print('data summary')
print(data.describe())
#
# save summary
#
data.describe().to_csv('output/data_summary.csv')
#
#%% visualize
#
fig, ax = plt.subplots(figsize = (9, 9))
ax.scatter(data['x'], data['y'])
plt.savefig('output/parabolic.jpg')
#
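One way to narrow this down would be to have the script confirm what it wrote before it exits, so that a missing artifact can be attributed either to the script or to the upload step. A minimal sketch, assuming the same `output` folder and file names as the script above; `report_outputs` is a hypothetical helper and not part of dstack:

```python
import os


def report_outputs(folder, expected):
    """Return {filename: size in bytes, or None if the file is missing}."""
    report = {}
    for name in expected:
        path = os.path.join(folder, name)
        report[name] = os.path.getsize(path) if os.path.isfile(path) else None
    return report


if __name__ == '__main__':
    # Same folder and file names the sample application writes to.
    sizes = report_outputs('output', ['data_summary.csv', 'parabolic.jpg'])
    for name, size in sizes.items():
        status = 'missing' if size is None else f'{size} bytes'
        print('output file', name, status)
```

If these lines were appended to the end of the sample application, the run log would show whether both files existed inside the container at the moment the script finished.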
configure in us-east-1
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack config --aws-profile default
Configure AWS backend:
Region name (us-east-2): us-east-1
S3 bucket name (eaf-test-dstack-20221012): eaf-test-dstack
The bucket 'eaf-test-dstack' doesn't exist. Create it? [y/n]: y
OK
add a tag for the data folder
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack tags add test-dstack-data -a data
Uploading artifact 'data': 100%|████████████████████████████████████████████████████████████████████████| 640/640 [00:01<00:00, 364B/s]
OK
run the workflow
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack run read_data
RUN          WORKFLOW   STATUS     APPS  ARTIFACTS  SUBMITTED   TAG
green-eel-1  read_data  Submitted        output     47 sec ago
Provisioning... It may take up to a minute. ✓
To interrupt, press Ctrl+C.
Collecting pandas==1.3.4
Downloading pandas-1.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)
Collecting matplotlib==3.5.0
Downloading matplotlib-3.5.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
Collecting pytz>=2017.3
Downloading pytz-2022.4-py2.py3-none-any.whl (500 kB)
Collecting python-dateutil>=2.7.3
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting numpy>=1.17.3
Downloading numpy-1.23.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Collecting setuptools-scm>=4
Downloading setuptools_scm-7.0.5-py3-none-any.whl (42 kB)
Collecting packaging>=20.0
Downloading packaging-21.3-py3-none-any.whl (40 kB)
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
Collecting pyparsing>=2.2.1
Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting pillow>=6.2.0
Downloading Pillow-9.2.0-cp38-cp38-manylinux_2_28_x86_64.whl (3.2 MB)
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting fonttools>=4.22.0
Downloading fonttools-4.37.4-py3-none-any.whl (960 kB)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas==1.3.4->-r requirements.txt (line 1)) (1.16.0)
Collecting typing-extensions
Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting tomli>=1.0.0
Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.8/site-packages (from setuptools-scm>=4->matplotlib==3.5.0->-r requirements.txt (line 2)) (61.2.0)
Installing collected packages: pyparsing, typing-extensions, tomli, packaging, setuptools-scm, pytz, python-dateutil, pillow, numpy, kiwisolver, fonttools, cycler, pandas, matplotlib
Successfully installed cycler-0.11.0 fonttools-4.37.4 kiwisolver-1.4.4 matplotlib-3.5.0 numpy-1.23.4 packaging-21.3 pandas-1.3.4 pillow-9.2.0 pyparsing-3.0.9 python-dateutil-2.8.2 pytz-2022.4 setuptools-scm-7.0.5 tomli-2.0.1 typing-extensions-4.4.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
found OS linux
user is: None
working directory is: /workflow
data summary
               x            y
count  76.000000    76.000000
mean   38.500000  1974.026316
std    22.083176  1754.755717
min     1.000000    11.000000
25%    19.750000   401.000000
50%    38.500000  1493.000000
75%    57.250000  3288.750000
max    76.000000  5787.000000
check the result
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack artifacts list green-eel-1
ARTIFACT  FILE              SIZE
output    data_summary.csv  170.0B
          parabolic.jpg     29.3KiB
configure for us-east-2
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack config --aws-profile eaf
Configure AWS backend:
Region name (us-east-1): us-east-2
S3 bucket name (eaf-test-dstack): eaf-test-dstack-2
The bucket 'eaf-test-dstack-2' doesn't exist. Create it? [y/n]: y
OK
add the tag
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack tags add test-dstack-data -a data
Uploading artifact 'data': 100%|████████████████████████████████████████████████████████████████████████| 640/640 [00:01<00:00, 357B/s]
OK
run the workflow
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack run read_data
RUN          WORKFLOW   STATUS     APPS  ARTIFACTS  SUBMITTED   TAG
bad-moose-1  read_data  Submitted        output     50 sec ago
Provisioning... It may take up to a minute. ✓
To interrupt, press Ctrl+C.
missing output
check for results
(python38) PS C:\eaf llc\aa-Analytics and BI\dstack> dstack artifacts list bad-moose-1
ARTIFACT FILE SIZE
The pip output showing that requirements.txt was processed is not missing on every run; sometimes it shows up.
The script's own output is not missing on every run either; sometimes it shows up.
But in us-east-2 the output folder is empty every time, whereas in us-east-1 the files are there.
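To see whether the files ever reach S3, the two buckets could be listed directly and compared. A minimal sketch, assuming boto3 is installed and the named profiles can read the buckets; `list_keys` and `print_bucket_listing` are hypothetical helpers, and nothing is assumed here about how dstack lays out keys inside a bucket:

```python
def list_keys(s3_client, bucket, prefix=''):
    """Return every object key in `bucket` under `prefix`, following pagination."""
    keys = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty bucket or prefix carry no 'Contents' entry.
        for obj in page.get('Contents', []):
            keys.append(obj['Key'])
    return keys


def print_bucket_listing(profile, region, bucket):
    """Print all keys in one bucket using the given AWS profile and region."""
    import boto3  # imported here so list_keys can be used without AWS access

    session = boto3.Session(profile_name=profile, region_name=region)
    for key in list_keys(session.client('s3'), bucket):
        print(region, bucket, key)


# Example calls, using the profiles, regions and buckets configured above
# (they need valid AWS credentials, so they are left commented out):
# print_bucket_listing('default', 'us-east-1', 'eaf-test-dstack')
# print_bucket_listing('eaf', 'us-east-2', 'eaf-test-dstack-2')
```

Note that the two configurations above also use different AWS profiles (`default` and `eaf`), so comparing the listings per profile may help separate a region effect from a permissions effect.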
Posting it here from Slack for history:
What I did:
- Configured us-east-2
- Ran download from dstack-examples
- Checked output artifacts
All worked well.
If possible, to help me reproduce this issue, please create a small public Git repo so I can reproduce it exactly as you do it.
Also, in case it's possible, please attach the runner logs from CloudWatch for the corresponding runs.
CloudWatch Group: /dstack/runners<bucket-name>
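For attaching the runner logs, the events could be pulled from the log group with boto3 and saved to a text file. A minimal sketch, assuming boto3 is installed and the profile can read CloudWatch Logs; `fetch_log_events` and `dump_runner_logs` are hypothetical helpers, the log group name must be supplied by the caller, and how individual runs map to log streams is not assumed:

```python
def fetch_log_events(logs_client, log_group, start_ms=None):
    """Return (timestamp_ms, stream_name, message) tuples from a log group.

    `start_ms` optionally limits results to events at or after that time,
    given in milliseconds since the Unix epoch.
    """
    kwargs = {'logGroupName': log_group}
    if start_ms is not None:
        kwargs['startTime'] = start_ms
    events = []
    paginator = logs_client.get_paginator('filter_log_events')
    for page in paginator.paginate(**kwargs):
        for event in page.get('events', []):
            events.append(
                (event['timestamp'], event['logStreamName'], event['message']))
    return events


def dump_runner_logs(profile, region, log_group, out_path, start_ms=None):
    """Write the events of one log group to a text file, one event per line."""
    import boto3  # imported here so fetch_log_events can be used without AWS access

    session = boto3.Session(profile_name=profile, region_name=region)
    events = fetch_log_events(session.client('logs'), log_group, start_ms)
    with open(out_path, 'w', encoding='utf-8') as handle:
        for timestamp, stream, message in events:
            handle.write(f'{timestamp}\t{stream}\t{message.rstrip()}\n')
    return len(events)
```

`dump_runner_logs` returns the number of events written, so a result of 0 for the failing run would itself be worth reporting.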
I will organize a repo and update here when ready
I believe this is resolved in the current update, so I am closing.