openfe Request: AMD HIP Platform Support


Hi all,

I was wondering whether it would be possible to add support for the AMD HIP platform. OpenMM can already run on HIP platforms when installed with a specific Conda workaround:

mamba install jaimergp/label/unsupported-cudatoolkit-shim::cudatoolkit=11.2.2 && mamba install streamhpc::openmm-hip=8.0.0

The problem is that your code, specifically openmmtools/utils/utils.py, has an assert that only allows ["OpenCL", "CUDA"]. The fix is probably easy since, as noted, OpenMM already sees HIP as a platform and reports the "right" speed for it.

The fix would have a huge impact, since the LUMI HPC (the most powerful in Europe) only supports the HIP platform.

Apr 15 '24 09:04 HiteSit

@HiteSit Thank you for raising this issue!

"We" (not the OpenFE team but other orgs I am a part of) are working to get ROCm/HIP onto conda-forge so that no hacks will be needed to install openmm.

Can you link the line of code that has this assert? Also do you know how OpenMM reports the HIP platform string-wise? I would be happy to get this working.

Apr 17 '24 19:04 mikemhenry

So, the utils file is under: /mambaforge/envs/cheminf_3_11/lib/python3.11/site-packages/openmmtools/utils/utils.py

def platform_supports_precision(platform, precision):
    """Determine whether the specified OpenMM Platform supports the specified minimum precision.

    Parameters
    ----------
    platform : str or openmm.Platform
        The platform or platform name to check
    precision : str
        One of ['single', 'mixed', 'double']

    Returns
    -------
    is_supported : bool
        True if the platform supports the specified precision; False otherwise
    """
    SUPPORTED_PRECISIONS = ['single', 'mixed', 'double']
    assert precision in SUPPORTED_PRECISIONS, f"Precision {precision} must be one of {SUPPORTED_PRECISIONS}"

    if isinstance(platform, str):
        # Get the actual Platform object if the platform_name was specified
        platform = openmm.Platform.getPlatformByName(platform)

    if platform.getName() == 'Reference':
        # Reference is double precision
        return (precision == 'double')

    if platform.getName() == 'CPU':
        return precision in ['mixed']

    if platform.getName() in ['CUDA', 'OpenCL']:
        properties = { 'Precision' : precision }
        system = openmm.System()
        system.addParticle(1.0) # Cannot create Context on a system with no particles
        integrator = openmm.VerletIntegrator(0.001)
        try:
            context = openmm.Context(system, integrator, platform, properties)
            del context, integrator
            return True
        except Exception as e:
            return False

    raise Exception(f"Platform {platform.getName()} unknown")

def get_available_platforms(minimum_precision='mixed'):
    """Return a list of the available OpenMM Platforms that can satisfy the requested minimum precision.

    Parameters
    ----------
    minimum_precision : str, optional, default='mixed'
        One of [None, 'single', 'mixed', 'double']
        If None, all available platforms will be returned.

    Returns
    -------
    platforms : list of openmm.Platform
        Platforms that support specified minimumprecision
    """
    platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]

    if minimum_precision is not None:
        # Filter based on precision support
        platforms = [ platform for platform in platforms if platform_supports_precision(platform, minimum_precision) ]

    return platforms

def get_fastest_platform(minimum_precision='mixed'):
    """Return the fastest available platform.

    This relies on the hardcoded speed values in Platform.getSpeed().

    Parameters
    ----------
    minimum_precision : str, optional, default='mixed'
        One of ['single', 'mixed', 'double']

    Returns
    -------
    platform : openmm.Platform
       The fastest available platform.

    """
    platforms = get_available_platforms(minimum_precision=minimum_precision)
    fastest_platform = max(platforms, key=lambda x: x.getSpeed())
    return fastest_platform

If I run:

platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]
for platform in platforms:
     name = platform.getName()
     print(name)

>>> Something like "CPU", "OpenCL", "HIP"

Bad timing: LUMI is down for maintenance this week, so I will edit this message with the output from the print once it is back. I'm quite sure the only problem is that assert. I would like to edit the code myself and report whether simply adding "HIP" to the assert works, but LUMI installs Conda environments (OpenFF, OpenMM, OpenFreeEnergy and so on) as read-only Singularity environments, and I have not yet figured out how to use them with --sandbox. If I figure it out, I will edit this post.
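
As a rough illustration of the change being discussed, and not the actual openmmtools patch, the platform-name branch in platform_supports_precision could treat HIP like the other GPU platforms. The sketch below is dependency-free: GPU_PLATFORM_NAMES and precision_check_strategy are hypothetical names, and the function only returns a label for the branch that would be taken instead of creating an OpenMM Context. It assumes the HIP platform accepts the same 'Precision' property as CUDA and OpenCL, which would need to be confirmed on real hardware.

```python
# Hypothetical, dependency-free sketch; not the actual openmmtools code.
GPU_PLATFORM_NAMES = ("CUDA", "OpenCL", "HIP")  # "HIP" is the proposed addition


def precision_check_strategy(platform_name):
    """Return a label describing how precision support would be decided."""
    if platform_name == "Reference":
        # Reference is double precision only
        return "double-only"
    if platform_name == "CPU":
        return "mixed-only"
    if platform_name in GPU_PLATFORM_NAMES:
        # In openmmtools this branch tries to create a Context with the
        # requested 'Precision' property and reports whether that succeeds.
        return "probe-context"
    raise ValueError(f"Platform {platform_name} unknown")
```

With "HIP" in the tuple, the name no longer falls through to the "unknown platform" error and would go through the same Context probe as CUDA and OpenCL.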

Apr 18 '24 12:04 HiteSit

Support for the HIP platform would be nice, however I would warn that it definitely needs validation prior to use. Untested platforms tend to be prone to odd behaviour in the alchemical world. Validation would require at least an HFE validation test & a couple of RBFE test cases.

Is this something you'd be willing to take on @HiteSit ?

Apr 21 '24 11:04 IAlibay

@IAlibay I would certainly like to contribute. First I have to resolve the problem with the Singularity environment. Besides that, I can code quite well, but I do not have enough experience with alchemical transformations, so I will need some guidance.

Apr 22 '24 11:04 HiteSit

@HiteSit - we need to discuss this internally first, but the requirement here would be mostly to run a suitably large set of alchemical simulations to verify that the results are reasonable. This would mostly require access to suitable AMD HIP compute resources to do such a validation, which unfortunately we do not have :(

Apr 24 '24 19:04 IAlibay

@IAlibay Yes, I understand that you need access to the platform. I will try to get some free computational time (node-hours). In any case, if you do not need an astonishing amount (a rough range of node-hours would help), I'm willing to share my own computational time. It's my pleasure to contribute.

You can contact me on [email protected]

Apr 25 '24 11:04 HiteSit

@HiteSit - I'm re-opening this issue if that's ok; there definitely needs to be some kind of update to our compute platform selection to allow for HIP. My question was more of a "once this is done, someone will need to check it works".

Apr 25 '24 11:04 IAlibay

@IAlibay I can check. The only constraint is computational time, since I have a limited amount, but as a rule of thumb, if the testing is around, let's say, 10 proteins with 30 ligands each, it should not be a problem. If it's more, I can find a way to get more computational time.

Apr 25 '24 14:04 HiteSit

Thanks for pointing to the code file! I will raise this as a separate issue on the openmmtools side of things since there isn't really any reason why we couldn't add support for HIP there, but as @IAlibay said, when it comes to using it in an openfe workflow, we will need it validated.

Apr 25 '24 22:04 mikemhenry

@HiteSit I've made a branch on openmmtools with the changes that I think are needed. To test it, you will need to run:

(you can also use mamba or conda to do this)

# Create an env that has openmm with the hip platform
$ micromamba create -n openmm82beta-openfe -c conda-forge/label/openmm_rc -c conda-forge "openmm-hip==8.2.0beta" "openfe==1.1.0"
# Activate env
$ micromamba activate openmm82beta-openfe
# Install openmmtools branch 
$ pip install git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform

Can you let me know if that works for you? I don't have an AMD card handy but I can spin up some cloud resources to test if needed.

Oct 04 '24 14:10 mikemhenry

@mikemhenry Sure, give me a couple of days. I will edit this message with the answer.

Oct 07 '24 08:10 HiteSit

Sounds good, my only request is that you make a new message with the answer :) GitHub won't notify me if you edit your message but will if you post a new one, thanks!

Oct 07 '24 20:10 mikemhenry

@mikemhenry

I installed the package as follows:

name: openfe
channels:
  - conda-forge
  - defaults
dependencies:
  - conda-forge/label/openmm_rc::openmm-hip=8.2.0beta
  - conda-forge::openfe=1.1.0
  - python=3.10
  - pandas
  - numpy
  - seaborn
  - pip
  - pip:
      - git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform

conda-containerize new --prefix <install_dir> env.yml

Checking if OpenMM was able to recognise the HIP platform

platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]
for platform in platforms:
    print(platform.getName())

> Reference
> CPU
> HIP
> OpenCL

The sample run failed and killed the kernel:

import openmm as mm
from openmm import unit

# Create a system for testing
system = mm.System()
# Add a dummy particle to avoid an empty-system error
system.addParticle(1.0 * unit.dalton)

# Select the HIP platform
platform = mm.Platform.getPlatformByName('HIP')

# Create an integrator
integrator = mm.LangevinIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.002*unit.picoseconds)

# Create a context to check properties
context = mm.Context(system, integrator, platform)

# Print platform properties
properties = context.getPlatform().getPropertyNames()
for prop in properties:
    value = context.getPlatform().getPropertyValue(context, prop)
    print(f'{prop}: {value}')

> Kernel died
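
Since the crash takes the whole notebook kernel down, one option is to run the probe in a child Python process so that only the child aborts and its exit code and stderr can still be inspected. A minimal sketch using only the standard library follows. run_isolated and HIP_PROBE are hypothetical names, and the probe string simply repeats the OpenMM calls from the snippet above; it is defined here but not executed.

```python
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run a Python snippet in a child interpreter and capture its outcome.

    Returns (returncode, stdout, stderr). A crash such as an abort in a
    native library shows up as a non-zero return code instead of killing
    the calling process.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


# Probe to run on a machine with OpenMM and the HIP platform installed.
HIP_PROBE = """
import openmm as mm
system = mm.System()
system.addParticle(1.0)
integrator = mm.VerletIntegrator(0.001)
platform = mm.Platform.getPlatformByName('HIP')
context = mm.Context(system, integrator, platform)
print('HIP context created')
"""
```

Calling run_isolated(HIP_PROBE) on the HIP machine would then return the child's exit code and any error text, which is easier to attach to a bug report than a dead kernel.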

Setting up OpenFE quickrun

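# Imports assumed by this script (they were not shown in the original post)
import os

import openfe
from kartograf import KartografAtomMapper
from openfe import ChemicalSystem, ProteinComponent, SmallMoleculeComponent, SolventComponent
from openfe.protocols.openmm_rfe import RelativeHybridTopologyProtocol
from openfe.setup.ligand_network_planning import (
    generate_lomap_network,
    generate_minimal_spanning_network,
)
from openff.units import unit
from rdkit import Chem

def build_kar(ligand_mols, mode):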
    # Create an MST network
    mst_network = generate_minimal_spanning_network(
        ligands=ligand_mols,
        scorer=openfe.lomap_scorers.default_lomap_score,
        mappers=[KartografAtomMapper(),])

    mst_edges = [edge for edge in mst_network.edges]

    # Create a lomap network
    mappers = [
        openfe.setup.LomapAtomMapper(
            time=20,
            threed=True,
            max3d=1.0,
            element_change=True,
            seed='[#7]-[#6](=O)-[#6]-[#7]-1-[#6]-[#6]-[#7](-[#6]-[#6]-1)-[#6]=O',
            shift=True,
        ),
    ]

    lomap_network = generate_lomap_network(
        molecules=ligand_mols,
        scorer=openfe.lomap_scorers.default_lomap_score,
        mappers=mappers)

    lomap_edges = [edge for edge in lomap_network.edges]

    if mode == "mst":
        return mst_network, mst_edges

    elif mode == "lomap":
        return lomap_network, lomap_edges

def define_transformation(pdbfile, edge):

    protein = ProteinComponent.from_pdb_file(pdbfile)
    solvent = SolventComponent(positive_ion='Na', negative_ion='Cl',
                               neutralize=True, ion_concentration=0.15*unit.molar)

    one_complex = ChemicalSystem({"ligand": edge.componentA, "solvent": solvent, "protein": protein}, name=edge.componentA.name)
    one_solvent = ChemicalSystem({"ligand": edge.componentA, "solvent": solvent}, name=edge.componentA.name)

    two_complex = ChemicalSystem({"ligand": edge.componentB, "solvent": solvent, "protein": protein}, name=edge.componentB.name)
    two_solvent = ChemicalSystem({"ligand": edge.componentB, "solvent": solvent}, name=edge.componentB.name)

    rbfe_settings = RelativeHybridTopologyProtocol.default_settings()
    rbfe_settings.simulation_settings.equilibration_length = 10 * unit.picosecond
    rbfe_settings.simulation_settings.production_length = 50 * unit.picosecond
    rbfe_settings.engine_settings.compute_platform = "HIP"

    rbfe_protocol = RelativeHybridTopologyProtocol(
        settings=rbfe_settings
    )

    transformation_complex = openfe.Transformation(
                stateA=one_complex,
                stateB=two_complex,
                mapping=edge,
                protocol=rbfe_protocol,
                name=f"{one_complex.name}_{two_complex.name}_complex"
            )
    transformation_solvent = openfe.Transformation(
                stateA=one_solvent,
                stateB=two_solvent,
                mapping=edge,
                protocol=rbfe_protocol,
                name=f"{one_solvent.name}_{two_solvent.name}_solvent"
            )

    trans_list = [transformation_complex, transformation_solvent]

    return trans_list

def set_trans(pdbfile, mst_edge, results_dir):
    trans_lst = define_transformation(pdbfile, mst_edge)

    RUNS = []
    for trans in trans_lst:
        # Basename
        run_basename = trans.name

        # Set the Run Dirs
        run_dir = os.path.join(results_dir, run_basename)
        os.makedirs(run_dir, exist_ok=True)

        # Set the input json_path
        i_json_file = os.path.join(results_dir, f"{run_basename}.json")
        trans.dump(i_json_file)

        # Set the output json_path
        o_json_file = os.path.join(results_dir, f"{run_basename}_RES.json")

        # Create the command string
        RUN = f"openfe quickrun {i_json_file} -o {o_json_file} -d {run_dir}"
        RUNS.append(RUN)

    return RUNS

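ligands_sdf = Chem.SDMolSupplier('Aligned_Rdkit_FIX.sdf', removeHs=False)
pdbfile = "./LAC3.pdb"
# Output directory; "Results" is inferred from the generated commands shown further down
results_dir = "Results"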

# Now pass these to form a list of Molecules
ligand_mols = [SmallMoleculeComponent(sdf) for sdf in ligands_sdf]
network, edges = build_kar(ligand_mols, mode="lomap")

all_runs = []
for edge in edges:
    trans = set_trans(pdbfile, edge, results_dir)
    all_runs.extend(trans)

bash_commands = "# Array of commands to execute\ncommands=(\n  "
bash_commands += "\n  ".join(f"'{cmd}'" for cmd in all_runs)
bash_commands += "\n)"
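
The manual single-quoting above breaks if a path ever contains a single quote. A hedged alternative is to let shlex.quote from the standard library do the quoting. render_bash_array is a hypothetical helper name, not part of openfe, and it produces the same layout as the string built above.

```python
import shlex


def render_bash_array(commands, name="commands"):
    """Render a list of shell command strings as a bash array definition."""
    # shlex.quote wraps each command so bash treats it as one array element
    quoted = "\n  ".join(shlex.quote(cmd) for cmd in commands)
    return f"# Array of commands to execute\n{name}=(\n  {quoted}\n)"
```

For example, render_bash_array(all_runs) could replace the three bash_commands lines.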

Running OpenFE Quickrun

commands=(
  'openfe quickrun Results/Mol_1_Mol_2_complex.json -o Results/Mol_1_Mol_2_complex_RES.json -d Results/Mol_1_Mol_2_complex'
  'openfe quickrun Results/Mol_1_Mol_2_solvent.json -o Results/Mol_1_Mol_2_solvent_RES.json -d Results/Mol_1_Mol_2_solvent'
  'openfe quickrun Results/Mol_3_Mol_6_complex.json -o Results/Mol_3_Mol_6_complex_RES.json -d Results/Mol_3_Mol_6_complex'
  'openfe quickrun Results/Mol_3_Mol_6_solvent.json -o Results/Mol_3_Mol_6_solvent_RES.json -d Results/Mol_3_Mol_6_solvent'
  'openfe quickrun Results/Mol_5_Mol_7_complex.json -o Results/Mol_5_Mol_7_complex_RES.json -d Results/Mol_5_Mol_7_complex'
  'openfe quickrun Results/Mol_5_Mol_7_solvent.json -o Results/Mol_5_Mol_7_solvent_RES.json -d Results/Mol_5_Mol_7_solvent'
  'openfe quickrun Results/Mol_4_Mol_6_complex.json -o Results/Mol_4_Mol_6_complex_RES.json -d Results/Mol_4_Mol_6_complex'
  'openfe quickrun Results/Mol_4_Mol_6_solvent.json -o Results/Mol_4_Mol_6_solvent_RES.json -d Results/Mol_4_Mol_6_solvent'
  'openfe quickrun Results/Mol_1_Mol_3_complex.json -o Results/Mol_1_Mol_3_complex_RES.json -d Results/Mol_1_Mol_3_complex'
  'openfe quickrun Results/Mol_1_Mol_3_solvent.json -o Results/Mol_1_Mol_3_solvent_RES.json -d Results/Mol_1_Mol_3_solvent'
  'openfe quickrun Results/Mol_0_Mol_5_complex.json -o Results/Mol_0_Mol_5_complex_RES.json -d Results/Mol_0_Mol_5_complex'
  'openfe quickrun Results/Mol_0_Mol_5_solvent.json -o Results/Mol_0_Mol_5_solvent_RES.json -d Results/Mol_0_Mol_5_solvent'
  'openfe quickrun Results/Mol_0_Mol_6_complex.json -o Results/Mol_0_Mol_6_complex_RES.json -d Results/Mol_0_Mol_6_complex'
  'openfe quickrun Results/Mol_0_Mol_6_solvent.json -o Results/Mol_0_Mol_6_solvent_RES.json -d Results/Mol_0_Mol_6_solvent'
  'openfe quickrun Results/Mol_4_Mol_5_complex.json -o Results/Mol_4_Mol_5_complex_RES.json -d Results/Mol_4_Mol_5_complex'
  'openfe quickrun Results/Mol_4_Mol_5_solvent.json -o Results/Mol_4_Mol_5_solvent_RES.json -d Results/Mol_4_Mol_5_solvent'
  'openfe quickrun Results/Mol_2_Mol_3_complex.json -o Results/Mol_2_Mol_3_complex_RES.json -d Results/Mol_2_Mol_3_complex'
  'openfe quickrun Results/Mol_2_Mol_3_solvent.json -o Results/Mol_2_Mol_3_solvent_RES.json -d Results/Mol_2_Mol_3_solvent'
  'openfe quickrun Results/Mol_3_Mol_5_complex.json -o Results/Mol_3_Mol_5_complex_RES.json -d Results/Mol_3_Mol_5_complex'
  'openfe quickrun Results/Mol_3_Mol_5_solvent.json -o Results/Mol_3_Mol_5_solvent_RES.json -d Results/Mol_3_Mol_5_solvent'
)

Got the following error:

Loading file...
Planning simulations for this edge...
Starting the simulations for this edge...
SYSTEM CONFIG DETAILS:
INFO:openfe.utils.system_probe.log:SYSTEM CONFIG DETAILS:
hostname: 'nid005122'
INFO:openfe.utils.system_probe.log.hostname:hostname: 'nid005122'
CUDA-based GPU not found
INFO:openfe.utils.system_probe.log.gpu:CUDA-based GPU not found
Memory used: 26.0G (7.7%)
INFO:openfe.utils.system_probe.log:Memory used: 26.0G (7.7%)
Results/Mol_1_Mol_2_solvent/scratch_RelativeHybridTopologyProtocolUnit-ffc7cefe2c434b1e866ff7487c6bd0c7_attempt_0: 0% full (49.7T free)
INFO:openfe.utils.system_probe.log:Results/Mol_1_Mol_2_solvent/scratch_RelativeHybridTopologyProtocolUnit-ffc7cefe2c434b1e866ff7487c6bd0c7_attempt_0: 0% full (49.7T free)
Preparing the hybrid topology simulation
INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Preparing the hybrid topology simulation
Parameterizing molecules
INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Parameterizing molecules
WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/topologyhelpers.py:705: UserWarning: mapping 47 : 3258 deviates by more than 1.0
  warnings.warn(wmsg)

WARNING:root:mapping 47 : 3258 deviates by more than 1.0
WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/topologyhelpers.py:705: UserWarning: mapping 54 : 3266 deviates by more than 1.0
  warnings.warn(wmsg)

WARNING:root:mapping 54 : 3266 deviates by more than 1.0
Creating hybrid system
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Creating hybrid system
Setting force field terms
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Setting force field terms
Adding forces
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Adding forces
Hybrid system created
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Hybrid system created
WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/mdtraj/core/topology.py:84: UserWarning: atom_indices are not monotonically increasing
  warnings.warn("atom_indices are not monotonically increasing")

WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/compute.py:56: UserWarning: Non-GPU platform selected: HIP, this may significantly impact simulation performance
  warnings.warn(wmsg)

WARNING:root:Non-GPU platform selected: HIP, this may significantly impact simulation performance
Creating and setting up the sampler
INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Creating and setting up the sampler
:0:/home/conda/feedstock_root/build_artifacts/hip_1718643748184/work/clr/hipamd/src/hiprtc/hiprtcInternal.hpp:105 : 672298286298 us: [pid:76800 tid:0x14eff92b4740] Unable to add internal header
/scratch/project_465000973/Singularity_Envs/openfe_rocm/bin/openfe: line 29: 76278 Aborted                 /usr/bin/singularity --silent exec $DIR/../$CONTAINER_IMAGE bash -c "eval \"\$(/LUMI_TYKKY_1tOKsNy/miniconda/bin/conda shell.bash hook )\"  && conda activate env1 &>/dev/null &&  exec -a $_O_SOURCE $DIR/openfe $( test $# -eq 0 || printf " %q" "$@" )"

Oct 08 '24 10:10 HiteSit

UPDATE

With this environment it works:

name: openfe
channels:
  - conda-forge
  - defaults
dependencies:
  - jaimergp/label/unsupported-cudatoolkit-shim::cudatoolkit=11.2.2
  - streamhpc::openmm-hip=8.0.0
  - conda-forge::openfe=1.1.0
  - python=3.11
  - pandas
  - numpy
  - seaborn
  - pip
  - pip:
      - git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform

Oct 08 '24 15:10 HiteSit

WARNING:root:Non-GPU platform selected: HIP, this may significantly impact simulation performance

We will need to fix this warning, since HIP is a GPU platform.
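
One way the warning could be fixed is to add HIP to the set of platform names treated as GPU platforms. The sketch below is an assumption about the shape of such a check, not the actual code in openfe's compute.py. GPU_PLATFORMS and warn_if_not_gpu are hypothetical names.

```python
import warnings

# Assumed set of GPU platform names; "hip" is the addition discussed here.
GPU_PLATFORMS = {"cuda", "opencl", "hip"}


def warn_if_not_gpu(platform_name):
    """Warn when the selected platform is not a known GPU platform.

    Returns True if a warning was emitted, False otherwise.
    """
    if platform_name.lower() in GPU_PLATFORMS:
        return False
    warnings.warn(
        f"Non-GPU platform selected: {platform_name}, "
        "this may significantly impact simulation performance"
    )
    return True
```

With this set, selecting "HIP" stays silent while "CPU" or "Reference" still triggers the performance warning.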

I am guessing the difference is here:

  - streamhpc::openmm-hip=8.0.0

and

  - conda-forge/label/openmm_rc::openmm-hip=8.2.0beta

for why it works or doesn't work

There is an issue here on openmm:

https://github.com/openmm/openmm/issues/4675

where people are discussing some problems with the openmm-hip package from conda-forge

Oct 08 '24 17:10 mikemhenry
