ray icon indicating copy to clipboard operation
ray copied to clipboard

[tune] `progress_reporter.py` is messy and should be cleaned up

Open krfricke opened this issue 2 years ago • 6 comments

The implementation of the progress reporter, particularly the table generation, suffers from cluttered legacy code. The functions are long, messy, and pass around long argument lists. It is hard to unit test. We should clean this module up do make it easier to extend and modify.

krfricke avatar May 09 '22 14:05 krfricke

I'll take a stab at cleaning this up!

julietnasar avatar May 11 '22 22:05 julietnasar

Thanks @julietnasar that would be great!

krfricke avatar May 12 '22 11:05 krfricke

will working on this issue. cc krfricke

kyle-chen-uber avatar Sep 19 '22 20:09 kyle-chen-uber

@kyle-chen-uber here is a good example for a single Ray Train run: https://docs.ray.io/en/latest/train/train.html#quick-start

import ray
from ray.train.xgboost import XGBoostTrainer
from ray.air.config import ScalingConfig

# Load data.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

# Split data into train and validation.
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)

trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(
        # Number of workers to use for data parallelism.
        num_workers=2,
        # Whether to use GPU acceleration.
        use_gpu=False,
    ),
    label_column="target",
    num_boost_round=20,
    params={
        # XGBoost specific params
        "objective": "binary:logistic",
        # "tree_method": "gpu_hist",  # uncomment this to use GPU for training
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
)
result = trainer.fit()
print(result.metrics)

krfricke avatar Sep 19 '22 20:09 krfricke

based on discussion with @krfricke some of the initial thoughts is we have a base_progress_reporter, and tune extend the base_progress_reporter, train extend base_progress_reporter, etc. General idea is to clean the abstractions and can easily extend the reporter.

todos:

  • qi to run the tune using reporter test, and see how current print out for tune looks like
  • @krfricke to provide a ray train using reporter code snippet, so we can see how current print out for train looks like

kyle-chen-uber avatar Sep 19 '22 20:09 kyle-chen-uber

Yes! To be clear, the ProgressReporter base class currently calls a lot of functions, passing lots of context, and it's not possible to inject custom logic there. The functions that build the tables are very convoluted and hard to read or extend.

We should find a nice way of abstractions and APIs that can be used in the future to extend these reporter classes.

krfricke avatar Sep 20 '22 08:09 krfricke

See also https://github.com/ray-project/ray/issues/27011 - it would be good to make this easily configurable or be able to add default replacements (e.g. in Ray Train we probably don't want to display train_loop_config at all)

krfricke avatar Sep 26 '22 09:09 krfricke

thx @krfricke i was able to set up local dev and reproduce the the single Ray Train run you provided. https://docs.ray.io/en/latest/train/train.html#quick-start

Since this requires some design on the interface. I will first send out a small pr this week to address the non-necessary print-out for single training flow. For the design, i will target for starting next week/later this week.

kyle-chen-uber avatar Sep 26 '22 20:09 kyle-chen-uber

Ray Train doesn't display train_loop_config now right? Since train_loop_config is only passed in to Tuner api. @krfricke let me know if i miss something?

kyle-chen-uber avatar Sep 28 '22 16:09 kyle-chen-uber

@krfricke i chatted with @kyle-chen-uber and based on bandwidth, he's targeting 11/18 to wrap up this PR, would that work with your timeline? thanks!

heyitsmui avatar Oct 24 '22 23:10 heyitsmui

this line seems not working on the given test code snippet. Before it works. @krfricke do you know how to fix it or is there any other toy data i can play with? Thx

Load data.

dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

kyle-chen-uber avatar Nov 14 '22 02:11 kyle-chen-uber