
Error while trying to train multiple models for TSP of size 20 and 50.

Open ujjwaldasari10 opened this issue 1 year ago • 3 comments

Describe the bug

I used the quickstart example to train a TSP model of size 20 in a conda environment on a GPU accessed via SSH in a VS Code Jupyter notebook. But when I try to train a new TSP model of size 50 from scratch in the same folder and then load and render the dataset, I still get a dataset of size 20 instead of 50. The model trains fine, but even the lines below seem to generate a dataset of size 20 instead of 50:

env = TSPEnv(num_loc=50)
new_dataset = env.dataset(10000)
dataloader = model._dataloader(new_dataset, batch_size=100)
print(new_dataset[0]["locs"].shape)

Output:

Unused keyword arguments: num_loc. Please check the documentation for the correct keyword arguments
torch.Size([20, 2])

I tried running a new Jupyter notebook with size 50 in a new folder, but the problem still persists. Can you please help me identify the issue?

ujjwaldasari10 avatar Aug 05 '24 02:08 ujjwaldasari10

The reason is that we changed the API: now we have a separate Generator class that handles the data generation. So you may call it like this:

env = TSPEnv(generator_params={'num_loc': 50})

(example here)
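To illustrate why the old call fails silently, here is a minimal sketch of the generator-params pattern described above. The class names mirror rl4co's API, but the bodies are simplified stand-ins, not the real library code:

```python
# Hypothetical sketch of the API change: data-generation options now live in
# a separate generator class, and the env forwards `generator_params` to it.
class TSPGenerator:
    def __init__(self, num_loc=20):
        self.num_loc = num_loc


class TSPEnv:
    def __init__(self, generator_params=None, **kwargs):
        # Unknown keyword arguments are ignored with a warning, which is why
        # TSPEnv(num_loc=50) silently falls back to the default of 20 locations.
        if kwargs:
            print(f"Unused keyword arguments: {', '.join(kwargs)}")
        self.generator = TSPGenerator(**(generator_params or {}))


env_old = TSPEnv(num_loc=50)                        # warns, keeps default 20
env_new = TSPEnv(generator_params={"num_loc": 50})  # correct: 50
print(env_old.generator.num_loc, env_new.generator.num_loc)  # 20 50
```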

fedebotu avatar Aug 05 '24 05:08 fedebotu

Thank you. I have one more, possibly naive, question: is it possible to train the models on multiple GPUs?

ujjwaldasari10 avatar Aug 05 '24 08:08 ujjwaldasari10

Yes. You can set up training with multiple GPUs by editing the trainer.devices parameter.

For example, if you want to try the example experiment in the README (AM on TSP) with multiple GPUs, you can launch the training with

python run.py +trainer.devices="[0, 1]"

This training will use cuda:0 and cuda:1. You can set it to more devices if you want.
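For reference, here are a few equivalent override variants, assuming the standard PyTorch Lightning `devices` semantics that the Hydra config forwards to the trainer (please verify the exact keys against your rl4co version):

```shell
# Use cuda:0 and cuda:1 explicitly
python run.py +trainer.devices="[0, 1]"

# Use the first 4 available GPUs
python run.py +trainer.devices=4

# Use all visible GPUs
python run.py +trainer.devices=-1
```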

Here are some useful materials you may want to refer to:

  1. The RL4CO tutorial on the trainer.
  2. The Hydra config tutorial.

cbhua avatar Aug 05 '24 08:08 cbhua

Do you guys plan on including MCTS in the list of decoding strategies in the near future?

ujjwaldasari10 avatar Aug 27 '24 06:08 ujjwaldasari10

@ujjwaldasari10 We are not planning to in the near future, since MCTS has not been applied much in the NCO literature (e.g., routing and scheduling). I remember some work that used it, but it was outperformed by methods without MCTS.

I think that is an interesting direction, and we gladly accept contributions! If you are interested, I invite you to give it a shot :) What kind of problem would you like to use it for?

fedebotu avatar Aug 27 '24 11:08 fedebotu

Note: closing this issue now. @ujjwaldasari10, regarding MCTS: if you are interested, feel free to open a new discussion here or contact us on Slack :rocket:

fedebotu avatar Sep 02 '24 13:09 fedebotu