Metadata request (/flows/<flowname>) failed (code 502)
I'm getting the following error:
Metaflow 2.2.7 executing AnalysisFlow for user:ssm-user
Validating your flow...
The graph looks good!
Creating local datastore in current directory
Bootstrapping conda environment...(this could take a few minutes)
Metaflow service error:
Metadata request (/flows/AnalysisFlow) failed (code 502): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>The page is temporarily unavailable</title>
....
Would this indicate that something is wrong with METAFLOW_SERVICE_URL, or what else could be returning the 502 error? I am getting the same error when I hit my APIs that use the Metaflow API client to grab artifacts.
Yep - this is the metadata service returning a 502. How is the service deployed right now?
Within an EC2 instance. We have pinpointed the issue: the Docker image is failing to run because it is unable to connect to the DB. We are seeing an exception raised:
"Exception: unable to get db version via goose: ....
We recently upgraded the database from Postgres 11 to Postgres 15. Is there any new parameter we need to pass to the docker run start command (we currently pass the DB username, password, port, host, and DB name), or something else that would have changed due to the upgrade? We are on Metaflow version 2.2.7.
Thanks in advance!
It should work as is. Is there anything more in the stack trace?
I'm seeing an error here:
version = await ApiUtils.get_goose_version()
...
File "/root/services/migration_service/api/utils.py", line 54, in get_latest_compatible_version
and more
{"log":"Exception: unable to get db version via goose: goose run: failed to connect to `host=<host> user=<username> database=<dbname>`: server error (FATAL: no pg_hba.conf entry for host , user, database , no encryption (SQLSTATE 28000))
...}
We also recently updated our cert, but I'm not sure where that would be configured or passed here... I don't see anything in our startup or setup files having to do with certs.
Yes - this seems like a connectivity issue between your service and the DB. Are you able to verify that you can connect to the database?
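One quick way to check this from the service's environment (a minimal sketch, independent of Metaflow; the host and port below are placeholders for your own DB endpoint) is a plain TCP reachability test, which separates network or security-group problems from SSL/auth problems:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False

# Example (placeholder endpoint):
# can_reach("mydb.abc123.us-east-1.rds.amazonaws.com", 5432)
```

Note that this only proves the socket is reachable; an error like the SQLSTATE 28000 above can still occur afterwards, because that failure happens at the authentication/SSL-policy layer, not the network layer.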
I am able to connect to my database both locally and within the app via the Django server and APIs, but the APIs that rely on the Metaflow service break. If I'm understanding correctly, that service is managed by the Docker image under the hood.
How would the docker run setup/connection present a cert? Is there an additional param?
Do you think it could be a versioning issue? With goose, maybe? I had to update the pg8000 package (ended up switching to psycopg2-binary==2.9.9) on the Django server side after updating the Postgres version, and I saw a very similar error before switching packages...
Is it Metaflow that uses goose? If I update Metaflow, do you think it would cause more versioning issues with the current project, or help resolve this goose error?
We found the source! Postgres 15's default parameter group has force_ssl set to true, whereas our Postgres 11 default parameter group did not. We updated Metaflow to version 2.4.12 so that we can pass a cert to the Docker startup / DB connection.
Looking at the metadata service environment variables:
ssl_mode = os.environ.get("MF_METADATA_DB_SSL_MODE")
ssl_cert_path = os.environ.get("MF_METADATA_DB_SSL_CERT_PATH")
ssl_key_path = os.environ.get("MF_METADATA_DB_SSL_KEY_PATH")
ssl_root_cert_path = os.environ.get("MF_METADATA_DB_SSL_ROOT_CERT")
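For intuition, these settings correspond to libpq-style connection parameters (sslmode, sslrootcert). A minimal sketch of how the SSL env vars could map onto a DSN (build_dsn is a hypothetical helper for illustration, not the service's actual code):

```python
import os

def build_dsn(host, port, user, dbname):
    """Hypothetical helper: assemble a libpq-style DSN, appending
    SSL options only when the corresponding env vars are set."""
    parts = [f"host={host}", f"port={port}", f"user={user}", f"dbname={dbname}"]
    ssl_mode = os.environ.get("MF_METADATA_DB_SSL_MODE")        # e.g. "verify-full"
    root_cert = os.environ.get("MF_METADATA_DB_SSL_ROOT_CERT")  # path to a CA bundle
    if ssl_mode:
        parts.append(f"sslmode={ssl_mode}")
    if root_cert:
        parts.append(f"sslrootcert={root_cert}")
    return " ".join(parts)

# Example (placeholder values):
os.environ["MF_METADATA_DB_SSL_MODE"] = "verify-full"
os.environ["MF_METADATA_DB_SSL_ROOT_CERT"] = "/certs/rds-ca.pem"
print(build_dsn("db.example.com", 5432, "mf_user", "metaflow"))
```

The key point: whatever path MF_METADATA_DB_SSL_ROOT_CERT holds is resolved by the process reading it, i.e. inside the container.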
Do we include MF_METADATA_DB_SSL_ROOT_CERT in our docker run command?
And if so, where is this running from? What would the path look like? We run from a repo that exists inside our EC2 instance, yet when I passed ssl_root_cert_path with a value of /home/ec2-user/<repo and path where file exists>, it can't find the path...
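Since the service reads MF_METADATA_DB_SSL_ROOT_CERT inside the container, the path must exist in the container's filesystem, not on the EC2 host, which would explain the "can't find the path" error. One way to handle this (a hedged sketch, assuming a bind mount; the host path, image name, and placeholder values are illustrative, not your actual setup) is to mount the cert directory and point the env var at the container-side path:

```shell
# Sketch only: host path, image, and <...> values are placeholders.
docker run -d \
  -e MF_METADATA_DB_HOST=<host> \
  -e MF_METADATA_DB_PORT=5432 \
  -e MF_METADATA_DB_USER=<username> \
  -e MF_METADATA_DB_PSWD=<password> \
  -e MF_METADATA_DB_NAME=<dbname> \
  -e MF_METADATA_DB_SSL_MODE=verify-full \
  -e MF_METADATA_DB_SSL_ROOT_CERT=/certs/rds-ca-bundle.pem \
  -v /home/ec2-user/certs:/certs:ro \
  <metadata-service-image>
```

Here /home/ec2-user/certs is the host directory holding the CA bundle, and /certs/rds-ca-bundle.pem is the path the service sees inside the container.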