
[Feature Request] Hypervolume computation with BoTorch is thousands of times slower than with moocore

Open MLopez-Ibanez opened this issue 2 years ago • 22 comments

🚀 Feature Request

Motivation

https://botorch.org/api/utils.html#botorch.utils.multi_objective.hypervolume.Hypervolume

says: TODO: write this in C++ for faster looping.

There is a C (C++ compatible) implementation available here: https://github.com/MLopez-Ibanez/eaf/blob/master/src/mo-tools/hv.c
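(For illustration, a minimal sketch of calling such a C function from Python via ctypes, assuming hv.c has been compiled into a shared library, e.g. cc -O3 -shared -fPIC hv.c -o libhv.so, and that the entry point is fpli_hv(data, d, n, ref) as in the linked source; check the accompanying header for the exact signature.)

import ctypes
import numpy as np

# Assumes ./libhv.so was built from hv.c and exports fpli_hv (see hv.h to confirm).
lib = ctypes.CDLL("./libhv.so")
lib.fpli_hv.restype = ctypes.c_double
lib.fpli_hv.argtypes = [
    np.ctypeslib.ndpointer(dtype=np.double, flags="C_CONTIGUOUS"),  # data: n points of d objectives
    ctypes.c_int,                                                    # d
    ctypes.c_int,                                                    # n
    np.ctypeslib.ndpointer(dtype=np.double, flags="C_CONTIGUOUS"),  # reference point
]

def hypervolume_c(points: np.ndarray, ref: np.ndarray) -> float:
    points = np.ascontiguousarray(points, dtype=np.double)
    ref = np.ascontiguousarray(ref, dtype=np.double)
    n, d = points.shape
    return lib.fpli_hv(points, d, n, ref)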

MLopez-Ibanez avatar Feb 16 '23 19:02 MLopez-Ibanez

Yeah, that would be nice. Using box decompositions (DominatedPartitioning) to compute hypervolumes seems to work more efficiently than the dimension-sweep algorithm, empirically. Both would likely benefit from C++

sdaulton avatar Feb 16 '23 20:02 sdaulton
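(For reference, a minimal sketch of the box-decomposition route mentioned above, assuming maximization, which is BoTorch's convention, and a reference point dominated by all points:)

import torch
from botorch.utils.multi_objective.box_decompositions.dominated import DominatedPartitioning

Y = torch.rand(128, 3, dtype=torch.double)      # objective values to be maximized
ref_point = torch.zeros(3, dtype=torch.double)  # must lie below all points
bd = DominatedPartitioning(ref_point=ref_point, Y=Y)
hv = bd.compute_hypervolume().item()
print(hv)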

We'd also need this to be auto-differentiable in pytorch though. So the idea was to (maybe) write this using the pytorch C++ frontend as an extension.

Balandat avatar Feb 16 '23 22:02 Balandat
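(A sketch of the Python side of such an extension, JIT-compiled with torch.utils.cpp_extension; hv_ext.cpp and the hypervolume operator it would expose are hypothetical here.)

# The hypothetical hv_ext.cpp would implement and register an autograd-aware
# hypervolume operator using the PyTorch C++ frontend; only the loading side is shown.
from torch.utils.cpp_extension import load

hv_ext = load(name="hv_ext", sources=["hv_ext.cpp"], verbose=True)
# hv = hv_ext.hypervolume(Y, ref_point)  # hypothetical operator exposed by the extension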

The current implementation is not differentiable and is simply used for evaluation purposes.

sdaulton avatar Feb 16 '23 22:02 sdaulton

Oh I see, different implementation, my bad.

Balandat avatar Feb 16 '23 22:02 Balandat

Yeah, that would be nice. Using box decompositions (DominatedPartitioning) to compute hypervolumes seems to work more efficiently than the dimension-sweep algorithm, empirically. Both would likely benefit from C++

Maybe. I believe the state of the art is well summarised here: https://dl.acm.org/doi/fullHtml/10.1145/3453474#tab1. It recommends the dimension-sweep algorithms for 2D and 3D; the box decomposition is only recommended for 5D and 6D.

In any case, I would be surprised if the Python implementation of either algorithm can get anywhere close to the C version of dimension-sweep for 2D and 3D.

It would be nice to have a highly-optimized C/C++ library of fundamental algorithms (not only hypervolume, but dominance filtering, nondominated sorting, other metrics like IGD+ and epsilon, etc) for multi-objective optimization that can be used in R and Python.

MLopez-Ibanez avatar Feb 20 '23 08:02 MLopez-Ibanez
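(For intuition, a minimal NumPy sketch of the 2D dimension-sweep idea, assuming minimization, a mutually nondominated input, and a reference point that is componentwise worse than every point; the real hv.c also handles filtering, ties and higher dimensions.)

import numpy as np

def hv2d(points: np.ndarray, ref: np.ndarray) -> float:
    # Sort by the first objective; for a nondominated set the second objective
    # is then decreasing, so each point adds one rectangular slab reaching up
    # to the previous point's f2 level.
    pts = points[np.argsort(points[:, 0])]
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

# Two points with ref = (1, 1): the union of the dominated boxes has area 0.75.
print(hv2d(np.array([[0.0, 0.5], [0.5, 0.0]]), np.array([1.0, 1.0])))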

There is a C (C++ compatible) implementation available here: https://github.com/MLopez-Ibanez/eaf/blob/master/src/mo-tools/hv.c

By the way, I'm happy to modify the C code above to make it easier to integrate into BoTorch.

MLopez-Ibanez avatar Feb 20 '23 08:02 MLopez-Ibanez

It would be nice to have a highly-optimized C/C++ library of fundamental algorithms (not only hypervolume, but dominance filtering, nondominated sorting, other metrics like IGD+ and epsilon, etc) for multi-objective optimization that can be used in R and Python.

Indeed, that would be nice.

By the way, I'm happy to modify the C code above to make it easier to integrate into BoTorch.

BoTorch currently doesn't contain any compiled code, which simplifies building and packaging a lot, so I'd only want to add this to BoTorch if it brings substantial benefits. That said, if this lived in a separate library that is well implemented and maintained, I'd be happy to add it as a (potentially optional) dependency. This might make more sense especially if the scope of this work is broader and potentially includes more general Python and R bindings.

Balandat avatar Feb 20 '23 09:02 Balandat
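(A sketch of what such an optional dependency could look like, assuming moocore.hypervolume accepts the same ref/maximise keywords as the moocore.Hypervolume class used in the benchmark below; the helper name is made up for illustration.)

import torch

try:
    import moocore
    _HAS_MOOCORE = True
except ImportError:
    _HAS_MOOCORE = False

def fast_hypervolume(Y: torch.Tensor, ref_point: torch.Tensor) -> float:
    # Use the C-backed implementation when available, otherwise fall back to
    # the existing pure-PyTorch Hypervolume (maximization in both cases).
    if _HAS_MOOCORE:
        return float(moocore.hypervolume(Y.cpu().numpy(),
                                         ref=ref_point.cpu().numpy(), maximise=True))
    from botorch.utils.multi_objective.hypervolume import Hypervolume
    return float(Hypervolume(ref_point=ref_point).compute(Y))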

I did some quick experiments comparing the BoTorch implementation (botorch-0.12.0) of the hypervolume to the one provided by moocore (https://multi-objective.github.io/moocore/python/), which is implemented in C.

[Figure_1: ratio of BoTorch to moocore runtime for hypervolume computation on DTLZLinearShape.3d]

It seems BoTorch is several thousand times slower. Maybe I am doing something wrong? I used this code.

import numpy as np
import moocore

from botorch.utils.multi_objective.hypervolume import Hypervolume as botorch_HV
import torch

import timeit

def read_data(name):
    files = {"DTLZLinearShape.3d": "~/work/perfassess/hyperv/uc/test/inp/DTLZLinearShape.3d.front.1000pts.10",
             "DTLZLinearShape.4d" : "~/work/perfassess/hyperv/uc/test/inp/DTLZLinearShape.4d.front.1000pts.10"}
    x = moocore.read_datasets(files[name])[:, :-1]
    x = moocore.filter_dominated(x, maximise=True)
    return x


name = "DTLZLinearShape.3d"
x = read_data(name)

ref = np.zeros(x.shape[1])
botorch_hv = botorch_HV(ref_point=torch.from_numpy(ref))
moocore_hv = moocore.Hypervolume(ref = ref, maximise = True)

n = np.arange(500, 3000 + 1, 500)

botorch_values = []
moocore_values =  []
botorch_times = []
moocore_times =  []
for maxrow in n:
    z = x[:maxrow, :]
    z_torch = torch.from_numpy(z)
    botorch_values += [botorch_hv.compute(z_torch)]
    moocore_values += [moocore_hv(z)]
    duration = timeit.timeit('botorch_hv.compute(z_torch)', globals=globals(), number = 2)  
    botorch_times += [duration]
    duration = timeit.timeit('moocore_hv(z)', globals=globals(), number = 2)
    moocore_times += [duration]
    
botorch_values = np.array(botorch_values)
moocore_values = np.array(moocore_values)
assert np.allclose(botorch_values, moocore_values)

botorch_times = np.array(botorch_times)
moocore_times = np.array(moocore_times)

import pandas as pd
import matplotlib.pyplot as plt
cpu = "Intel i5-6200U 2.30GHz"
df = pd.DataFrame(dict(n = n, Ratio = botorch_times/moocore_times)).set_index('n')
df.plot(grid=True, ylabel='BoTorch/moocore time ratio', title = f'HV computation for {name} ({cpu})')
plt.show()

MLopez-Ibanez avatar Nov 28 '24 17:11 MLopez-Ibanez

I'll let somebody else weigh in on whether this is the right way to benchmark, but for reference, which algorithm are you using for computing HV in moocore?

eytan avatar Nov 28 '24 17:11 eytan

Documented here: https://multi-objective.github.io/moocore/python/reference/generated/moocore.hypervolume.html#moocore.hypervolume

MLopez-Ibanez avatar Nov 28 '24 19:11 MLopez-Ibanez

I haven't looked into the details, but I wouldn't be surprised if the implementation of this particular algorithm were indeed many orders of magnitude slower. The algorithm is very loopy, so it is going to be slow in Python to begin with. On top of that it is implemented in PyTorch, so there is probably a lot of overhead from building the (very complex) compute graph in the background. This algorithm in BoTorch really shouldn't be used (and it is not used in our multi-objective acquisition functions). We should probably make this clearer in the docs and potentially just remove it.

Balandat avatar Nov 28 '24 19:11 Balandat

It would be nice to have a highly-optimized C/C++ library of fundamental algorithms (not only hypervolume, but dominance filtering, nondominated sorting, other metrics like IGD+ and epsilon, etc) for multi-objective optimization that can be used in R and Python.

Indeed, that would be nice.

Well, that is what moocore is meant to be.

MLopez-Ibanez avatar Dec 03 '24 12:12 MLopez-Ibanez

https://botorch.org/api/utils.html#botorch.utils.multi_objective.pareto.is_non_dominated

versus

https://multi-objective.github.io/moocore/python/reference/generated/moocore.is_nondominated.html

[Figure_1: is_non_dominated() runtime on CPFs.txt, BoTorch vs moocore (log scale)]

import numpy as np
import moocore
import torch
import timeit

from botorch.utils.multi_objective.pareto import is_non_dominated as botorch_is_non_dominated

name = "CPFs.txt"
x = moocore.get_dataset(name)[:, :-1]
botorch_times = []
moocore_times =  []
n = np.arange(100, len(x) + 1, 250)
botorch_times = []
moocore_times =  []
for maxrow in n:
    z = x[:maxrow, :]
    z_torch = torch.from_numpy(z)
    botorch_values = botorch_is_non_dominated(z_torch)
    duration = timeit.timeit('botorch_is_non_dominated(z_torch)', globals=globals(), number = 5)
    botorch_times += [duration]
    moocore_values = moocore.is_nondominated(z, maximise=True)
    duration = timeit.timeit('moocore.is_nondominated(z, maximise=True)', globals=globals(), number = 5)
    moocore_times += [duration]
    assert np.allclose(botorch_values,moocore_values)

botorch_times = np.array(botorch_times)
moocore_times = np.array(moocore_times)

import pandas as pd
import matplotlib.pyplot as plt
cpu = "Intel i5-6200U 2.30GHz"
df = pd.DataFrame(dict(n = n, BoTorch = botorch_times, moocore = moocore_times)).set_index('n')
df.plot(grid=True, ylabel='CPU time (seconds)', title = f'is_non_dominated() for {name} ({cpu})')
plt.yscale("log")
plt.show()

MLopez-Ibanez avatar Dec 03 '24 12:12 MLopez-Ibanez

I'm curious about what difference in timing it makes if you set deduplicate=False in is_non_dominated.

Balandat avatar Dec 03 '24 14:12 Balandat

I'm curious about what difference in timing it makes if you set deduplicate=False in is_non_dominated.

Both seem to be slightly slower.

[Figure_2: is_non_dominated(deduplicate=False) runtime on CPFs.txt, BoTorch vs moocore (log scale)]

import numpy as np
import moocore
import torch
import timeit
from botorch.utils.multi_objective.pareto import is_non_dominated as botorch_is_non_dominated

name = "CPFs.txt"
x = moocore.get_dataset(name)[:, :-1]
botorch_times = []
moocore_times =  []
n = np.arange(100, len(x) + 1, 250)
botorch_times = []
moocore_times =  []
for maxrow in n:
    z = x[:maxrow, :]
    z_torch = torch.from_numpy(z)
    botorch_values = botorch_is_non_dominated(z_torch,deduplicate=False)
    duration = timeit.timeit('botorch_is_non_dominated(z_torch,deduplicate=False)', globals=globals(), number = 10)
    botorch_times += [duration]
    moocore_values = moocore.is_nondominated(z, maximise=True, keep_weakly = True)
    duration = timeit.timeit('moocore.is_nondominated(z, maximise=True, keep_weakly = True)', globals=globals(), number = 10)
    moocore_times += [duration]
    assert np.allclose(botorch_values,moocore_values)

botorch_times = np.array(botorch_times)
moocore_times = np.array(moocore_times)

import pandas as pd
import matplotlib.pyplot as plt
cpu = "Intel i5-6200U 2.30GHz"
df = pd.DataFrame(dict(n = n, BoTorch = botorch_times, moocore = moocore_times)).set_index('n')
df.plot(grid=True, ylabel='CPU time (seconds)', title = f'is_non_dominated(deduplicate=False) for {name} ({cpu})')
plt.yscale("log")
plt.show()

MLopez-Ibanez avatar Dec 03 '24 16:12 MLopez-Ibanez

I added this rule to is_non_dominated (https://github.com/pytorch/botorch/blob/5012fe8a39b434e1b0f3d3a968eb17b3dd0c9e27/botorch/utils/multi_objective/pareto.py#L47-L48) to avoid memory issues and very long run-times on CPU. It looks like that rule slows things down quite a bit on a GPU. I ran your benchmark on an A100, and the BoTorch is_non_dominated is quite fast on a GPU once we remove that rule (orange).

import timeit
import torch.utils.benchmark as benchmark
import numpy as np
import torch
from botorch.utils.multi_objective.pareto import _is_non_dominated_loop, MAX_BYTES
from torch import Tensor


def is_non_dominated(
    Y: Tensor,
    maximize: bool = True,
    deduplicate: bool = True,
    force_no_loop: bool = True,
) -> Tensor:
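    # force_no_loop is added here for benchmarking only (not part of the released
    # API): it bypasses the large-n / memory rule below so that the fully
    # vectorized path is always used.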
    n = Y.shape[-2]
    if n == 0:
        return torch.zeros(Y.shape[:-1], dtype=torch.bool, device=Y.device)
    el_size = 64 if Y.dtype == torch.double else 32
    if (not force_no_loop) and (
        n > 1000 or n**2 * Y.shape[:-2].numel() * el_size / 8 > MAX_BYTES
    ):
        return _is_non_dominated_loop(Y, maximize=maximize, deduplicate=deduplicate)

    Y1 = Y.unsqueeze(-3)
    Y2 = Y.unsqueeze(-2)
    if maximize:
        dominates = (Y1 >= Y2).all(dim=-1) & (Y1 > Y2).any(dim=-1)
    else:
        dominates = (Y1 <= Y2).all(dim=-1) & (Y1 < Y2).any(dim=-1)
    nd_mask = ~(dominates.any(dim=-1))
    if deduplicate:
        # remove duplicates
        # find index of first occurrence  of each unique element
        indices = (Y1 == Y2).all(dim=-1).long().argmax(dim=-1)
        keep = torch.zeros_like(nd_mask)
        keep.scatter_(dim=-1, index=indices, value=1.0)
        return nd_mask & keep
    return nd_mask


def get_duration(z, cuda, force_no_loop):
    z_torch = torch.tensor(z, device=torch.device("cuda" if cuda else "cpu"))
    is_non_dominated(z_torch, deduplicate=False, force_no_loop=force_no_loop)
    t0 = benchmark.Timer(
        stmt='is_non_dominated(z_torch, deduplicate=False, force_no_loop=force_no_loop)',
        globals={'z_torch': z_torch, 'force_no_loop':force_no_loop, "is_non_dominated": is_non_dominated},
    )
    return t0.timeit(10).mean



name = "|"
n = np.arange(100, len(x) + 1, 250)
botorch_cpu_no_loop_times = []
botorch_cpu_times = []
botorch_gpu_no_loop_times = []
botorch_gpu_times = []


for maxrow in n:
    z = x[:maxrow, :]
    botorch_cpu_times.append(get_duration(z, cuda=False, force_no_loop=False))
    botorch_cpu_no_loop_times.append(get_duration(z, cuda=False, force_no_loop=True))
    botorch_gpu_times.append(get_duration(z, cuda=True, force_no_loop=False))
    botorch_gpu_no_loop_times.append(get_duration(z, cuda=True, force_no_loop=True))
botorch_cpu_times = np.array(botorch_cpu_times)
botorch_cpu_no_loop_times = np.array(botorch_cpu_no_loop_times)
botorch_gpu_times = np.array(botorch_gpu_times)
botorch_gpu_no_loop_times = np.array(botorch_gpu_no_loop_times)

import matplotlib.pyplot as plt
import pandas as pd

cpu = "GPU"
df = pd.DataFrame(
    dict(
        n=n,
        botorch_cpu=botorch_cpu_times,
        botorch_cpu_no_loop=botorch_cpu_no_loop_times,
        botorch_gpu=botorch_gpu_times,
        botorch_gpu_no_loop=botorch_gpu_no_loop_times,
    )
).set_index("n")
df.plot(
    grid=True,
    ylabel="Time (seconds)",
    title=f"is_non_dominated(deduplicate=False)",
)
plt.yscale("log")
plt.show()

[Figure: is_non_dominated(deduplicate=False) runtime on CPU and A100 GPU, with and without the loop rule (log scale)]

sdaulton avatar Dec 05 '24 00:12 sdaulton

I added this rule to is_non_dominated to avoid memory issues and very long run-times on CPU. It looks like that rule slows things down quite a bit on a GPU. I ran your benchmark on an A100, and the BoTorch is_non_dominated is quite fast on a GPU once we remove that rule (orange).

How does it compare with moocore.is_nondominated? Note that the dataset that I used comes with the moocore package as well (it is documented here: https://multi-objective.github.io/moocore/r/reference/CPFs.html, although I am not sure how to document datasets in Python), so you can use that one if you wish (I don't see how x is defined in your code).

MLopez-Ibanez avatar Dec 05 '24 14:12 MLopez-Ibanez

Hi Manuel,

Thanks for raising this performance issue, but it would be helpful if you could clarify your intent here. This method is used internally for our (differentiable) HV computations. It’s written in pure PyTorch because we do not wish to have any C/C++ code or external dependencies since that limits our user base and is frankly quite cumbersome to deal with internally.

If you have an algorithm or way of implementing things that you believe is more performant and would like to contribute to our project, we’d really appreciate any PRs. The performance of moocore is quite impressive and we’d surely benefit from your expertise here.

Kind regards, Eytan

eytan avatar Dec 05 '24 14:12 eytan

Hi Eytan,

For now my intent is to be sure that what I am observing is a real difference in performance and not an error on my side. I was not expecting such a huge difference!

Ultimately my intent is to provide a C/C++ library, wrapped by high-quality Python and R (and possibly other languages) packages, of high-performing fundamental algorithms (not only hypervolume computation, but dominance filtering, nondominated sorting, other metrics like IGD+ and epsilon, etc.). I am frustrated by the lack of good implementations of such fundamental algorithms in Python and R, which leads to wasted time and CPU cycles and to myths like "hypervolume can only be used with very few points".

It is not my intent to convince you to use moocore, as it is clearly not designed with GPUs in mind (at least not yet; maybe in the future, if I get some help or funding...). Nevertheless, knowing when functions in botorch are faster or slower than the equivalent ones in moocore is useful if I have to advise colleagues or students on which option to use. I may create an example for the moocore documentation illustrating the differences.

The hypervolume function in botorch is especially bad, since it is clearly too slow (and you have a TODO note there suggesting to use C++). You may want to remove the function and/or point to better alternatives like moocore or pymoo (although pymoo is still hundreds of times slower than moocore, and it is my intention to convince them to switch to moocore).

The hypervolume algorithms for 2D and 3D in moocore are optimal, and I don't think the box-decomposition is worth it with fewer than 5 objectives (happy to be proven otherwise if someone wants to try), so one can only tweak the implementation to save microseconds here and there (or use vectorization/GPUs, but that is currently not in scope for moocore as mentioned above).

I hope the above clarifies my intent!

Kind regards,

Manuel.

MLopez-Ibanez avatar Dec 05 '24 15:12 MLopez-Ibanez

Note that the dataset that I used comes with the moocore package as well (it is documented here, I am not sure how to document datasets in Python), so you can use that one if you wish to (I don't see how x is defined in your code).

I used the same dataset. I just copied it, since I didn't have moocore installed as a dependency.

sdaulton avatar Dec 05 '24 16:12 sdaulton
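(For reproducibility, x in the snippet above could be loaded the same way as in the earlier benchmarks, assuming moocore is installed:)

import moocore

# Same data as before: the CPFs.txt example dataset shipped with moocore,
# dropping the set-index column.
x = moocore.get_dataset("CPFs.txt")[:, :-1]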

How does it compare with moocore.is_nondominated ?

I didn't explicitly test moocore, since I didn't have it installed, but I would imagine they are at least competitive on a GPU. It looks like there are also some differences in the methods (e.g. it appears moocore is only looking for weakly pareto optimal points, but I would have to take a closer look). It could be that we should be using a faster algorithm on CPU as well.

sdaulton avatar Dec 05 '24 16:12 sdaulton

How does it compare with moocore.is_nondominated ?

I didn't explicitly test moocore, since I didn't have it installed, but I would imagine they are at least competitive on a GPU. It looks like there are also some differences in the methods (e.g. it appears moocore is only looking for weakly pareto optimal points, but I would have to take a closer look). It could be that we should be using a faster algorithm on CPU as well.

The default is to minimise all objectives and to remove weakly dominated points, but you can control that with the arguments maximise=True and keep_weakly=True. In my tests above, I made sure that the output of the two functions is identical using assert np.allclose(botorch_values, moocore_values).

MLopez-Ibanez avatar Dec 05 '24 18:12 MLopez-Ibanez
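(A tiny example of the pairing used in the benchmarks above, where BoTorch's deduplicate=False corresponds to moocore's keep_weakly=True and both keep duplicated nondominated points; maximization in both calls.)

import numpy as np
import torch
import moocore
from botorch.utils.multi_objective.pareto import is_non_dominated

# Duplicated nondominated point plus one dominated point.
z = np.array([[1.0, 2.0], [1.0, 2.0], [2.0, 1.0], [0.5, 0.5]])
bt = is_non_dominated(torch.from_numpy(z), deduplicate=False).numpy()
mc = moocore.is_nondominated(z, maximise=True, keep_weakly=True)
print(bt, mc)  # both: [ True  True  True False]
assert np.array_equal(bt, mc)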