
[BUG] mlflow R package only returns `Error in wait_for`

Open acircleda opened this issue 10 months ago • 9 comments

Issues Policy acknowledgement

  • [X] I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

  • Client: version 2.11.3

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 11 Pro
  • Python version: 3.11.3

Describe the problem

Nearly all R commands return the following error:

Error in wait_for(function() mlflow_rest("experiments", "search", client = client,  : 
  Operation failed after waiting for 10 seconds

This occurs on a fresh install of MLflow using `pip install mlflow` and a fresh install of the R package using `install.packages("mlflow")`.

The R mlflow package version is 2.11.1.

Tracking information

MLflow version: 2.11.3
Tracking URI: file:///C:/Users/XXX/Git/model-development/mlruns
Artifact URI: mlflow-artifacts:/0/63092125fe6b4133af90ef19d22f2a0f/artifacts
System information: Windows 10.0.22631
Python version: 3.11.3
MLflow version: 2.11.3
MLflow module location: C:\PYTHON~1\Lib\site-packages\mlflow\__init__.py
Tracking URI: file:///C:/Users/XXX/Git/model-development/mlruns
Registry URI: file:///C:/Users/XXX/Git/model-development/mlruns
Active experiment ID: 0
Active run ID: 63092125fe6b4133af90ef19d22f2a0f
Active run artifact URI: mlflow-artifacts:/0/63092125fe6b4133af90ef19d22f2a0f/artifacts
MLflow dependencies: 
  Flask: 3.0.0
  Jinja2: 3.1.2
  aiohttp: 3.8.6
  alembic: 1.13.1
  boto3: 1.26.148
  botocore: 1.29.148
  click: 8.1.3
  cloudpickle: 3.0.0
  docker: 7.0.0
  entrypoints: 0.4
  fastapi: 0.104.1
  gitpython: 3.1.40
  graphene: 3.3
  importlib-metadata: 6.8.0
  markdown: 3.6
  matplotlib: 3.8.2
  numpy: 1.24.3
  packaging: 23.1
  pandas: 2.0.1
  protobuf: 4.25.0
  pyarrow: 14.0.0
  pydantic: 1.10.13
  pytz: 2023.3
  pyyaml: 5.4.1
  querystring-parser: 1.2.4
  requests: 2.31.0
  scikit-learn: 1.4.2
  scipy: 1.10.1
  sqlalchemy: 2.0.23
  sqlparse: 0.4.4
  tiktoken: 0.4.0
  uvicorn: 0.24.0.post1
  virtualenv: 20.23.0
  waitress: 3.0.0
  watchfiles: 0.21.0

Code to reproduce issue

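library(mlflow)
# Launch a local tracking server from R (it reports http://127.0.0.1:5000, see below)
server <- mlflow::mlflow_server()
# Open the UI; this is where the wait_for error is raised
mlflow::mlflow_ui()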

Stack trace

6: stop("Operation failed after waiting for ", wait, " seconds")
5: wait_for(function() mlflow_rest("experiments", "search", client = client, 
       verb = "POST", data = list(max_results = 1)), getOption("mlflow.connect.wait", 
       10), getOption("mlflow.connect.sleep", 1))
4: mlflow_validate_server(client)
3: mlflow_client()
2: mlflow_ui.NULL()
1: mlflow::mlflow_ui()
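
Reading the trace, `mlflow_client()` calls `mlflow_validate_server()`, which uses `wait_for()` to retry a POST to the `experiments/search` endpoint with `max_results = 1` for up to `getOption("mlflow.connect.wait", 10)` seconds, sleeping `getOption("mlflow.connect.sleep", 1)` seconds between attempts. The Python sketch below mimics that loop so the same check can be run outside R. It is an approximation inferred from the trace, not the R package's source; the helper names `wait_for` and `probe_server`, the REST path `/api/2.0/mlflow/experiments/search`, and the default URL `http://127.0.0.1:5000` (taken from the `mlflow_server()` output later in this thread) are assumptions.

```python
import json
import time
import urllib.request


def wait_for(f, wait=10.0, sleep=1.0):
    """Hypothetical helper: call f() until it succeeds or `wait` seconds elapse."""
    deadline = time.monotonic() + wait
    while True:
        try:
            return f()
        except Exception:
            # Give up once the deadline has passed, echoing the R error message.
            if time.monotonic() >= deadline:
                raise RuntimeError(f"Operation failed after waiting for {wait} seconds")
            time.sleep(sleep)


def probe_server(server_url="http://127.0.0.1:5000", timeout=2.0):
    """Hypothetical helper: POST a minimal experiments/search request.

    The endpoint path is assumed; raises if nothing answers at server_url.
    """
    req = urllib.request.Request(
        server_url.rstrip("/") + "/api/2.0/mlflow/experiments/search",
        data=json.dumps({"max_results": 1}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `wait_for(probe_server, wait=10, sleep=1)` run while the R-launched server is supposedly up should either return a JSON dict or raise the same "Operation failed" message, which helps separate "nothing is answering on that URL" from a problem in the R client itself.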

Other info / logs

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] mlflow_2.11.1

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       swagger_3.33.1    httr_1.4.7        cli_3.6.2         rlang_1.1.3      
 [6] zeallot_0.1.0     processx_3.8.3    png_0.1-8         purrr_1.0.2       renv_1.0.5       
[11] promises_1.2.1    jsonlite_1.8.8    glue_1.7.0        openssl_2.1.1     forge_0.2.0      
[16] askpass_1.2.0     httpuv_1.6.14     ps_1.7.6          fansi_1.0.6       grid_4.3.2       
[21] ini_0.3.1         base64enc_0.1-3   yaml_2.3.8        lifecycle_1.0.4   compiler_4.3.2   
[26] fs_1.6.3          Rcpp_1.0.12       rstudioapi_0.15.0 later_1.3.2       lattice_0.21-9   
[31] R6_2.5.1          utf8_1.2.4        reticulate_1.35.0 pillar_1.9.0      curl_5.2.1       
[36] magrittr_2.0.3    Matrix_1.6-5      tools_4.3.2       withr_3.0.0   

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [X] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

acircleda, Apr 10 '24 18:04

@acircleda I can't reproduce your problem on MLflow 2.11.3. Could you double-check that you installed it correctly? Are the environment variables (MLFLOW_BIN, MLFLOW_PYTHON_BIN) set correctly? I can open the MLflow UI correctly with the command below: [image]

serena-ruan, Apr 11 '24 11:04

I suspect (based on some previous issues) that this may be a Windows-specific problem.

I believe the environment variables are set correctly:

> Sys.which('python')
                    python 
"C:\\PYTHON~1\\python.exe" 
> Sys.which('mlflow')
                             mlflow 
"C:\\PYTHON~1\\Scripts\\mlflow.exe" 

I checked these paths and can confirm that both executables exist.

Some additional info:

If I run `server <- mlflow::mlflow_server()`, the following is returned:

$server_url
[1] "http://127.0.0.1:5000"

$handle
PROCESS 'mlflow.exe', running, pid 22388.

$file_store
[1] "file://C:/Users/XXX/Documents/mlruns"

attr(,"class")
[1] "mlflow_server"

However, no other commands work, and http://127.0.0.1:5000 is not accessible. If I switch to the terminal and run `mlflow server --host 127.0.0.1 --port 8080`, the server starts and the UI is accessible.
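
Since `mlflow_server()` reports the `mlflow.exe` process as running while http://127.0.0.1:5000 is unreachable, a quick first check is whether anything accepts TCP connections on that port at all. The sketch below uses only the Python standard library; the function name `port_is_open` is hypothetical, and ports 5000 and 8080 are simply the two addresses mentioned in this thread.

```python
import socket


def port_is_open(host="127.0.0.1", port=5000, timeout=1.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Compare the R-launched server (5000) with the terminal-launched one (8080).
    for port in (5000, 8080):
        print(f"127.0.0.1:{port} open: {port_is_open(port=port)}")
```

If port 5000 reports `False` while the R session still holds a running `mlflow.exe` handle, that would suggest the process started from R is alive but not serving HTTP on that port, which would be consistent with the timeout in `mlflow_validate_server()`.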

acircleda, Apr 11 '24 13:04

Just as a follow-up, I ran the following code:

Sys.setenv(MLFLOW_BIN=Sys.which("mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=Sys.which("python"))

and confirmed with Sys.getenv() that both variables were set in the environment. Running the following commands still resulted in the same error:

library(mlflow)
mlflow_client(tracking_uri = NULL)
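
The same check can be repeated from Python in the environment that R is launched from, to confirm that `MLFLOW_BIN` and `MLFLOW_PYTHON_BIN` point at files that actually exist. This is a minimal sketch: the function `check_mlflow_env` is hypothetical, and it only tests that each variable is set and names an existing file, not that these are the binaries the R package ends up invoking.

```python
import os
import shutil


def check_mlflow_env(names=("MLFLOW_BIN", "MLFLOW_PYTHON_BIN")):
    """Hypothetical helper: map each variable to (value, path_is_existing_file)."""
    report = {}
    for name in names:
        value = os.environ.get(name, "")
        report[name] = (value, bool(value) and os.path.isfile(value))
    return report


if __name__ == "__main__":
    for name, (value, exists) in check_mlflow_env().items():
        print(f"{name}={value!r} exists={exists}")
    # For comparison: what a PATH lookup resolves to in this shell.
    print("which mlflow:", shutil.which("mlflow"))
    print("which python:", shutil.which("python"))
```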

acircleda, Apr 11 '24 13:04

@acircleda Could you check whether the Python code below works? I want to see whether your tracking server is broken or whether it's a problem with the R installation.

import mlflow

with mlflow.start_run():
    mlflow.log_param("test", "test")

serena-ruan, Apr 12 '24 04:04

This seems to work. I did a fresh install into a venv-managed environment on Python 3.11.3:

>>> import mlflow
>>> with mlflow.start_run():
...     mlflow.log_param("test", "test")
...
'test'

I also ran the same Python code in RStudio (via reticulate) and got the same result.

I tried the equivalent command in R and got the same error reported in this issue:

library(mlflow)
mlflow::mlflow_log_param("test", "test")

acircleda, Apr 12 '24 11:04

What if you run R in a terminal within the same Python venv and try those R commands?

serena-ruan, Apr 15 '24 12:04

Running those commands with `R -e '...'` in the terminal, I get the same error as above.

acircleda, Apr 15 '24 18:04

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

github-actions[bot], Apr 18 '24 00:04

Let me know if there is any other information you would like me to provide.

acircleda, Apr 18 '24 21:04