Metadata request (/flows/<flowname>) failed (code 502)

Open michellewehr opened this issue 1 year ago • 3 comments

I'm getting the following error:

 Metaflow 2.2.7 executing AnalysisFlow for user:ssm-user
 Validating your flow...
     The graph looks good!
 Creating local datastore in current directory 
 Bootstrapping conda environment...(this could take a few minutes)
     Metaflow service error:
     Metadata request (/flows/AnalysisFlow) failed (code 502): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> 
     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
         <head>
             <title>The page is temporarily unavailable</title>
             ....

Does this indicate that something is wrong with METAFLOW_SERVICE_URL, or what is returning the 502 error? I get the same error when I hit my APIs that use the metaflow API client to grab artifacts.
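One way to narrow this down is to request the same path directly and look at the status code, outside of Metaflow. The sketch below is only an illustration: it assumes METAFLOW_SERVICE_URL is set in the environment and reuses the /flows/AnalysisFlow path from the error above; the helper names `flow_url` and `probe` are made up for this example.

```python
import os
import urllib.error
import urllib.request


def flow_url(base: str, flow_name: str) -> str:
    """Join the service base URL and the /flows/<flowname> path from the error."""
    return base.rstrip("/") + "/flows/" + flow_name


def probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET on url, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the code is still what we want.
        return err.code


if __name__ == "__main__":
    base = os.environ.get("METAFLOW_SERVICE_URL")
    if base is None:
        print("METAFLOW_SERVICE_URL is not set")
    else:
        url = flow_url(base, "AnalysisFlow")
        print(url, "->", probe(url))
```

A 502 here as well would suggest the problem sits with the service (or whatever proxies it) rather than with the flow code.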

michellewehr avatar Aug 26 '24 13:08 michellewehr

Yep - this is the metadata service returning a 502. How is the service deployed right now?

savingoyal avatar Aug 28 '24 14:08 savingoyal

Within an EC2 instance. We have pinpointed the issue: the docker image is failing to run because it is unable to connect to the db. We are seeing this exception raised:

"Exception: unable to get db version via goose: ....

We recently upgraded the database from postgres 11 to postgres 15. Is there any new param we need to pass to the docker run start command (we currently pass db username, password, port, host, db name), or something else that would have changed due to the update? We are on metaflow version 2.2.7.

Thanks in advance!

michellewehr avatar Aug 28 '24 17:08 michellewehr

It should work as is. Is there anything more in the stack trace?

savingoyal avatar Aug 28 '24 17:08 savingoyal

I'm seeing an error here:

version = await ApiUtils.get_goose_version() ...
File "/root/services/migration_service/api/utils.py", line 54, in get_latest_compatible_version

and more

{"log":"Exception: unable to get db version via goose: goose run: failed to connect to `host=<host> user=<username> database=<dbname>`: server error (FATAL: no pg_hba.conf entry for host , user, database , no encryption (SQLSTATE 28000))
...}

We also recently updated our cert, but I'm not sure where that would be configured or passed here. I don't see anything in our startapp or setup files having to do with certs.

michellewehr avatar Aug 28 '24 17:08 michellewehr

Yes, this seems like a connectivity issue between your service and db. Are you able to verify that you can connect to the database?
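A rough way to check this from the machine that runs the service is to try the connection once without SSL and once with it, since the "no encryption" part of the error hints that the server rejected an unencrypted connection. This is only a sketch: it assumes psycopg2 is installed and that the connection details are supplied through the standard libpq variables PGHOST, PGPORT, PGDATABASE, PGUSER and PGPASSWORD, which may not match how your service is configured. The helper names are made up for this example.

```python
import os


def connection_params(env, sslmode):
    """Build keyword arguments for psycopg2.connect from libpq-style env vars."""
    return {
        "host": env["PGHOST"],
        "port": env.get("PGPORT", "5432"),
        "dbname": env["PGDATABASE"],
        "user": env["PGUSER"],
        "password": env.get("PGPASSWORD", ""),
        "sslmode": sslmode,  # libpq values: disable, allow, prefer, require, verify-ca, verify-full
        "connect_timeout": 5,
    }


def try_connect(params):
    """Attempt one connection and return 'ok' or the server's error text."""
    import psycopg2  # imported lazily so the helpers above work without it

    try:
        conn = psycopg2.connect(**params)
    except psycopg2.OperationalError as exc:
        return "failed: " + str(exc).strip()
    conn.close()
    return "ok"


if __name__ == "__main__" and os.environ.get("PGHOST"):
    for mode in ("disable", "require"):
        print(mode, "->", try_connect(connection_params(os.environ, mode)))
```

If `disable` fails with the pg_hba.conf error while `require` succeeds, the database is most likely refusing non-SSL clients.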

savingoyal avatar Aug 28 '24 18:08 savingoyal

I am able to connect to my database both locally and within the app via the django server and APIs, but the APIs that rely on the metaflow service break. That service would be managed (if I'm understanding correctly) by that docker image under the hood.

How would the docker run setup/connection present a cert? Is there an additional param?

michellewehr avatar Aug 28 '24 18:08 michellewehr

Do you think it could be a versioning issue? With goose maybe? I had to replace the pg8000 package (ended up using psycopg2-binary==2.9.9) on the django server side after updating the postgres version, and saw a very similar error prior to switching packages...

Is it metaflow that uses goose? If I update metaflow do you think it would cause more versioning issues with current project or help resolve this goose error?

michellewehr avatar Aug 29 '24 12:08 michellewehr

We found the source! The default parameter group for postgres 15 has force_ssl set to true, whereas our postgres 11 default parameter group did not. We updated our metaflow to version 2.4.12 so that we can pass a cert to the docker start up / db connection.

Looking at the metadata service environment variables:

    ssl_mode = os.environ.get("MF_METADATA_DB_SSL_MODE")
    ssl_cert_path = os.environ.get("MF_METADATA_DB_SSL_CERT_PATH")
    ssl_key_path = os.environ.get("MF_METADATA_DB_SSL_KEY_PATH")
    ssl_root_cert_path = os.environ.get("MF_METADATA_DB_SSL_ROOT_CERT")

Do we include ssl_root_cert_path in our docker run command?

And if so, where is this running from? What would the path look like? We run from a repo that exists inside our ec2 instance, yet when I passed an ssl_root_cert_path with a value of /home/ec2-user/repo and path where file exists/ it can't find the path...
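One thing worth checking: a process inside a docker container resolves paths against the container's filesystem, not the EC2 host's, so a host path is only visible if the file was copied or mounted into the container. The sketch below is meant to be run inside the container and assumes the service reads the four variables exactly as quoted above; the helper names `ssl_settings` and `missing_files` are made up for this example.

```python
import os

# Setting name -> environment variable, copied from the snippet above.
SSL_ENV_VARS = {
    "ssl_mode": "MF_METADATA_DB_SSL_MODE",
    "ssl_cert_path": "MF_METADATA_DB_SSL_CERT_PATH",
    "ssl_key_path": "MF_METADATA_DB_SSL_KEY_PATH",
    "ssl_root_cert_path": "MF_METADATA_DB_SSL_ROOT_CERT",
}


def ssl_settings(env):
    """Read the SSL-related settings from an environment mapping."""
    return {name: env.get(var) for name, var in SSL_ENV_VARS.items()}


def missing_files(settings):
    """Return the path settings that are set but do not point to a file here."""
    return [
        name
        for name, value in settings.items()
        if name.endswith("_path") and value and not os.path.isfile(value)
    ]


if __name__ == "__main__":
    settings = ssl_settings(os.environ)
    for name, value in settings.items():
        print(f"{name} = {value!r}")
    for name in missing_files(settings):
        print(f"{name}: no such file in this filesystem")
```

If the root cert shows up as missing inside the container, one option is a bind mount (`docker run -v <host path>:<container path> ...`) with the variable set to the container-side path. How that fits into your start command depends on your setup.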

michellewehr avatar Aug 30 '24 18:08 michellewehr