Request: AMD HIP Platform Support

Status: Open · HiteSit opened this issue 1 year ago • 15 comments

Hi all,

I was wondering if it is possible to add support for the AMD HIP platform. OpenMM is already able to run on HIP platforms if you install it with a specific workaround via Conda:

mamba install jaimergp/label/unsupported-cudatoolkit-shim::cudatoolkit=11.2.2 && mamba install streamhpc::openmm-hip=8.0.0

But the problem is that in the code, specifically in openmmtools/utils/utils.py, there is a check that only allows ["OpenCL", "CUDA"]. The fix is probably easy since, as mentioned, OpenMM is able to see HIP as a platform and gives it the "right" speed.
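A minimal sketch of the kind of change being proposed, assuming the precision check only needs to treat "HIP" exactly like the other GPU platforms and that the HIP platform accepts the same `Precision` property as CUDA and OpenCL. `GPU_PLATFORMS`, `precision_check_kind` and `hip_aware_supports_precision` are hypothetical names, not openmmtools' API; the GPU branch mirrors the one-particle Context probe that openmmtools already uses for CUDA and OpenCL. OpenMM is imported lazily so the dispatch logic can be exercised on a machine without a GPU.

```python
"""Hypothetical sketch: a precision check that also accepts the HIP platform."""

SUPPORTED_PRECISIONS = ("single", "mixed", "double")
# Assumption: HIP should be probed exactly like CUDA and OpenCL.
GPU_PLATFORMS = ("CUDA", "OpenCL", "HIP")


def precision_check_kind(platform_name: str) -> str:
    """Classify how a platform's precision support is determined."""
    if platform_name == "Reference":
        return "double-only"
    if platform_name == "CPU":
        return "mixed-only"
    if platform_name in GPU_PLATFORMS:
        return "probe"
    raise ValueError(f"Platform {platform_name} unknown")


def hip_aware_supports_precision(platform_name: str, precision: str) -> bool:
    """Return True if the named OpenMM platform supports `precision`."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"Precision {precision} must be one of {SUPPORTED_PRECISIONS}")
    kind = precision_check_kind(platform_name)
    if kind == "double-only":
        return precision == "double"
    if kind == "mixed-only":
        return precision == "mixed"

    # GPU platforms: try to build a one-particle Context with the requested precision.
    import openmm  # imported lazily so the logic above runs without OpenMM installed

    platform = openmm.Platform.getPlatformByName(platform_name)
    system = openmm.System()
    system.addParticle(1.0)  # a Context cannot be created for an empty System
    integrator = openmm.VerletIntegrator(0.001)
    try:
        context = openmm.Context(system, integrator, platform, {"Precision": precision})
    except Exception:
        return False
    del context, integrator
    return True
```

With this dispatch, a HIP-enabled OpenMM no longer hits the "unknown platform" error; whether HIP actually honours a given precision is still decided by the Context probe at run time.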

The fix would have a huge impact, since LUMI (the most powerful HPC system in Europe) only supports the HIP platform.

HiteSit, Apr 15 '24

@HiteSit Thank you for raising this issue!

"We" (not the OpenFE team, but other orgs I am a part of) are working to get ROCm/HIP onto conda-forge so that no hacks will be needed to install openmm.

Can you link the line of code that has this assert? Also do you know how OpenMM reports the HIP platform string-wise? I would be happy to get this working.

mikemhenry, Apr 17 '24

So, the utils file is under: /mambaforge/envs/cheminf_3_11/lib/python3.11/site-packages/openmmtools/utils/utils.py

import openmm

def platform_supports_precision(platform, precision):
    """Determine whether the specified OpenMM Platform supports the specified minimum precision.

    Parameters
    ----------
    platform : str or openmm.Platform
        The platform or platform name to check
    precision : str
        One of ['single', 'mixed', 'double']

    Returns
    -------
    is_supported : bool
        True if the platform supports the specified precision; False otherwise
    """
    SUPPORTED_PRECISIONS = ['single', 'mixed', 'double']
    assert precision in SUPPORTED_PRECISIONS, f"Precision {precision} must be one of {SUPPORTED_PRECISIONS}"

    if isinstance(platform, str):
        # Get the actual Platform object if the platform_name was specified
        platform = openmm.Platform.getPlatformByName(platform)

    if platform.getName() == 'Reference':
        # Reference is double precision
        return (precision == 'double')

    if platform.getName() == 'CPU':
        return precision in ['mixed']

    if platform.getName() in ['CUDA', 'OpenCL']:
        properties = { 'Precision' : precision }
        system = openmm.System()
        system.addParticle(1.0) # Cannot create Context on a system with no particles
        integrator = openmm.VerletIntegrator(0.001)
        try:
            context = openmm.Context(system, integrator, platform, properties)
            del context, integrator
            return True
        except Exception as e:
            return False

    raise Exception(f"Platform {platform.getName()} unknown")

def get_available_platforms(minimum_precision='mixed'):
    """Return a list of the available OpenMM Platforms that can satisfy the requested minimum precision.

    Parameters
    ----------
    minimum_precision : str, optional, default='mixed'
        One of [None, 'single', 'mixed', 'double']
        If None, all available platforms will be returned.

    Returns
    -------
    platforms : list of openmm.Platform
        Platforms that support specified minimum precision
    """
    platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]

    if minimum_precision is not None:
        # Filter based on precision support
        platforms = [ platform for platform in platforms if platform_supports_precision(platform, minimum_precision) ]

    return platforms

def get_fastest_platform(minimum_precision='mixed'):
    """Return the fastest available platform.

    This relies on the hardcoded speed values in Platform.getSpeed().

    Parameters
    ----------
    minimum_precision : str, optional, default='mixed'
        One of ['single', 'mixed', 'double']

    Returns
    -------
    platform : openmm.Platform
       The fastest available platform.

    """
    platforms = get_available_platforms(minimum_precision=minimum_precision)
    fastest_platform = max(platforms, key=lambda x: x.getSpeed())
    return fastest_platform
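This also explains why a HIP-enabled OpenMM breaks the code above even when HIP is never requested: `get_available_platforms` calls `platform_supports_precision` on every registered platform, so the final `raise` for an unknown name aborts the whole search. The sketch below reproduces that control flow with stand-in platform objects; `FakePlatform`, `supports_mixed` and `fastest` are hypothetical, and the speed values are made up for illustration rather than taken from OpenMM's `getSpeed()`.

```python
from dataclasses import dataclass


@dataclass
class FakePlatform:
    """Stand-in for openmm.Platform exposing only getName() and getSpeed()."""
    name: str
    speed: float

    def getName(self) -> str:
        return self.name

    def getSpeed(self) -> float:
        return self.speed


def supports_mixed(platform, known_gpu=("CUDA", "OpenCL")) -> bool:
    """Simplified mirror of platform_supports_precision(platform, 'mixed')."""
    name = platform.getName()
    if name == "Reference":
        return False  # Reference is double precision only
    if name == "CPU" or name in known_gpu:
        return True  # assume the GPU Context probe succeeds
    raise Exception(f"Platform {name} unknown")


def fastest(platforms, known_gpu=("CUDA", "OpenCL")):
    """Mirror of get_fastest_platform: filter by precision, then take the max speed."""
    usable = [p for p in platforms if supports_mixed(p, known_gpu)]
    return max(usable, key=lambda p: p.getSpeed())
```

With `Reference`, `CPU`, `HIP` and `OpenCL` stand-ins registered, `fastest(platforms)` raises `Platform HIP unknown`, while `fastest(platforms, known_gpu=("CUDA", "OpenCL", "HIP"))` returns the HIP stand-in because it has the highest made-up speed.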

If I run:

import openmm

platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]
for platform in platforms:
    name = platform.getName()
    print(name)

# Expected output (not yet checked on LUMI): something like "CPU", "OpenCL", "HIP"

Bad timing: LUMI is down for maintenance this week, so I will edit this message with the output of the print once it is back. But I'm quite sure that the only problem is that platform check. I would like to edit the code myself and report whether simply adding "HIP" to the accepted list works, but LUMI installs Conda environments (OpenFF, OpenMM, OpenFreeEnergy and so on) inside read-only Singularity containers, and I have not yet figured out how to use them with --sandbox. If I figure it out, I will edit this post.

HiteSit, Apr 18 '24

Support for the HIP platform would be nice; however, I would warn that it definitely needs validation prior to use. Untested platforms tend to be prone to odd behaviour in the alchemical world. Validation would require at least a hydration free energy (HFE) validation test and a couple of relative binding free energy (RBFE) test cases.
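One way such a validation could be scored, as a sketch rather than OpenFE's benchmarking procedure: run the same edges on HIP and on an already-trusted platform such as CUDA, then compare the per-edge free energies with a mean unsigned error (MUE) and a root-mean-square error (RMSE). `compare_platforms` is a hypothetical helper, and the edge names in the usage note are placeholders.

```python
import math


def compare_platforms(reference: dict, candidate: dict) -> dict:
    """Compare per-edge free energies (kcal/mol) from two compute platforms.

    `reference` and `candidate` map an edge name to its computed free energy.
    Only edges present in both mappings are compared.
    """
    shared = sorted(reference.keys() & candidate.keys())
    if not shared:
        raise ValueError("no edges in common")
    diffs = [candidate[e] - reference[e] for e in shared]
    mue = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Edge with the largest disagreement, a good first place to look for odd behaviour.
    worst = max(shared, key=lambda e: abs(candidate[e] - reference[e]))
    return {"n": len(shared), "mue": mue, "rmse": rmse, "worst_edge": worst}
```

For example, `compare_platforms({"lig1_lig2": -1.2}, {"lig1_lig2": -1.0})` compares a single edge. In practice one would also want to judge each difference against the statistical uncertainty reported for that edge, not only the raw numbers.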

Is this something you'd be willing to take on @HiteSit ?

IAlibay, Apr 21 '24

@IAlibay Sure, I would like to contribute. First I have to resolve the problem with the Singularity environment. Besides that, I can code quite well, but I do not have enough experience with alchemical transformations, so I will need some guidance.

HiteSit, Apr 22 '24

@HiteSit - we need to discuss this internally first, but the requirement here would mostly be to run a suitably large set of alchemical simulations to verify that the results are reasonable. Such a validation requires access to suitable AMD HIP compute resources, which unfortunately we do not have :(

IAlibay, Apr 24 '24

@IAlibay Yes, I understand you need access to the platform. I will try to get some computational time (node-hours) for free, but in any case, if you do not need an astonishing amount of computational time (could you give me a rough range of node-hours?), I'm happy to share my own allocation. It's my pleasure to contribute.

You can contact me at [email protected]

HiteSit, Apr 25 '24

@HiteSit - I'm re-opening this issue, if that's ok; there definitely needs to be some kind of update to our compute platform selection to allow for HIP. My question was more of a "once this is done, someone will need to check it works".

IAlibay, Apr 25 '24

@IAlibay I can check. The only constraint is computational time, since my allocation is limited, but as a rule of thumb, testing around, let's say, 10 proteins with 30 ligands each should not be a problem. If it's more, I can figure out a way to get more computational time.
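A back-of-the-envelope way to turn "10 proteins with 30 ligands each" into a job count, as a sketch under stated assumptions: each ligand series is connected by a minimal spanning network (n - 1 edges for n ligands; a LOMAP network usually has more), every edge is run as a complex leg and a solvent leg (as in the quickrun commands later in this thread), and each leg is one `openfe quickrun` job. `quickrun_jobs` and `gpu_hours` are hypothetical helpers, and the hours-per-job value has to come from a measured run.

```python
def quickrun_jobs(n_targets: int, ligands_per_target: int, legs_per_edge: int = 2) -> int:
    """Count `openfe quickrun` jobs for a campaign planned with spanning-tree networks."""
    # A spanning tree over n ligands has exactly n - 1 edges.
    edges_per_target = ligands_per_target - 1
    # Each edge is run as a complex leg and a solvent leg.
    return n_targets * edges_per_target * legs_per_edge


def gpu_hours(n_jobs: int, hours_per_job: float) -> float:
    """Total GPU hours, given a measured wall-clock time per job."""
    return n_jobs * hours_per_job


if __name__ == "__main__":
    jobs = quickrun_jobs(n_targets=10, ligands_per_target=30)
    print(jobs)  # 580 (10 targets x 29 edges x 2 legs)
```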

HiteSit, Apr 25 '24

Thanks for pointing to the code file! I will raise this as a separate issue on the openmmtools side of things since there isn't really any reason why we couldn't add support for HIP there, but as @IAlibay said, when it comes to using it in an openfe workflow, we will need it validated.

mikemhenry, Apr 25 '24

@HiteSit I've made a branch on openmmtools with the changes that I think are needed. To test it, you will need to run:

(you can also use mamba or conda to do this)

# Create an env that has openmm with the hip platform
$ micromamba create -n openmm82beta-openfe -c conda-forge/label/openmm_rc -c conda-forge "openmm-hip==8.2.0beta" "openfe==1.1.0"
# Activate env
$ micromamba activate openmm82beta-openfe
# Install openmmtools branch 
$ pip install git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform

Can you let me know if that works for you? I don't have an AMD card handy but I can spin up some cloud resources to test if needed.
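Once the environment is activated, a quick way to confirm which versions were actually installed (the 8.2.0 beta build of OpenMM, OpenFE 1.1.0 and the openmmtools branch) is to import each package and read its version string. This is a standard-library-only sketch that assumes each package exposes a `__version__` attribute; `installed_version` is a hypothetical helper.

```python
import importlib


def installed_version(module_name: str):
    """Return module.__version__, or None if the module is missing or has no version."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    for name in ("openmm", "openmmtools", "openfe"):
        print(f"{name}: {installed_version(name)}")
```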

mikemhenry, Oct 04 '24

@mikemhenry Sure, give me a couple of days. I will edit this message with the answer.

HiteSit, Oct 07 '24

Sounds good, my only request is that you make a new message with the answer :) GitHub won't notify me if you edit your message but will if you post a new one, thanks!

mikemhenry, Oct 07 '24

@mikemhenry

  • I installed the package as follows:

    name: openfe
    channels:
      - conda-forge
      - defaults
    dependencies:
      - conda-forge/label/openmm_rc::openmm-hip=8.2.0beta
      - conda-forge::openfe=1.1.0
      - python=3.10
      - pandas
      - numpy
      - seaborn
      - pip
      - pip:
          - git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform
    

    conda-containerize new --prefix <install_dir> env.yml

  • Checking if OpenMM was able to recognise the HIP platform

    import openmm
    
    platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]
    for platform in platforms:
        print(platform.getName())
    
    > Reference
    > CPU
    > HIP
    > OpenCL
    
  • The sample run failed and the kernel died

    import openmm as mm
    from openmm import unit
    
    # Create a system for testing
    system = mm.System()
    # Add a dummy particle, since a Context cannot be created for an empty System
    system.addParticle(1.0 * unit.dalton)
    
    # Select the HIP platform
    platform = mm.Platform.getPlatformByName('HIP')
    
    # Create an integrator
    integrator = mm.LangevinIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.002*unit.picoseconds)
    
    # Create a context to check properties
    context = mm.Context(system, integrator, platform)
    
    # Print platform properties
    properties = context.getPlatform().getPropertyNames()
    for prop in properties:
        value = context.getPlatform().getPropertyValue(context, prop)
        print(f'{prop}: {value}')
    
    > Kernel died
    
  • Setting up OpenFE quickrun

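    import os
    
    from rdkit import Chem
    from kartograf import KartografAtomMapper
    from openff.units import unit
    
    import openfe
    from openfe import ChemicalSystem, ProteinComponent, SmallMoleculeComponent, SolventComponent
    from openfe.protocols.openmm_rfe import RelativeHybridTopologyProtocol
    from openfe.setup.ligand_network_planning import (
        generate_lomap_network,
        generate_minimal_spanning_network,
    )
    
    def build_kar(ligand_mols, mode):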
        # Create an MST network
        mst_network = generate_minimal_spanning_network(
            ligands=ligand_mols,
            scorer=openfe.lomap_scorers.default_lomap_score,
            mappers=[KartografAtomMapper(),])
    
        mst_edges = [edge for edge in mst_network.edges]
    
        # Create a lomap network
        mappers = [
            openfe.setup.LomapAtomMapper(
                time=20,
                threed=True,
                max3d=1.0,
                element_change=True,
                seed='[#7]-[#6](=O)-[#6]-[#7]-1-[#6]-[#6]-[#7](-[#6]-[#6]-1)-[#6]=O',
                shift=True,
            ),
        ]
    
        lomap_network = generate_lomap_network(
            molecules=ligand_mols,
            scorer=openfe.lomap_scorers.default_lomap_score,
            mappers=mappers)
    
        lomap_edges = [edge for edge in lomap_network.edges]
    
        if mode == "mst":
            return mst_network, mst_edges
    
        elif mode == "lomap":
            return lomap_network, lomap_edges
    
    def define_transformation(pdbfile, edge):
    
        protein = ProteinComponent.from_pdb_file(pdbfile)
        solvent = SolventComponent(positive_ion='Na', negative_ion='Cl',
                                   neutralize=True, ion_concentration=0.15*unit.molar)
    
        one_complex = ChemicalSystem({"ligand": edge.componentA, "solvent": solvent, "protein": protein}, name=edge.componentA.name)
        one_solvent = ChemicalSystem({"ligand": edge.componentA, "solvent": solvent}, name=edge.componentA.name)
    
        two_complex = ChemicalSystem({"ligand": edge.componentB, "solvent": solvent, "protein": protein}, name=edge.componentB.name)
        two_solvent = ChemicalSystem({"ligand": edge.componentB, "solvent": solvent}, name=edge.componentB.name)
    
        rbfe_settings = RelativeHybridTopologyProtocol.default_settings()
        rbfe_settings.simulation_settings.equilibration_length = 10 * unit.picosecond
        rbfe_settings.simulation_settings.production_length = 50 * unit.picosecond
        rbfe_settings.engine_settings.compute_platform = "HIP"
    
        rbfe_protocol = RelativeHybridTopologyProtocol(
            settings=rbfe_settings
        )
    
        transformation_complex = openfe.Transformation(
                    stateA=one_complex,
                    stateB=two_complex,
                    mapping=edge,
                    protocol=rbfe_protocol,
                    name=f"{one_complex.name}_{two_complex.name}_complex"
                )
        transformation_solvent = openfe.Transformation(
                    stateA=one_solvent,
                    stateB=two_solvent,
                    mapping=edge,
                    protocol=rbfe_protocol,
                    name=f"{one_solvent.name}_{two_solvent.name}_solvent"
                )
    
        trans_list = [transformation_complex, transformation_solvent]
    
        return trans_list
    
    def set_trans(pdbfile, mst_edge, results_dir):
        trans_lst = define_transformation(pdbfile, mst_edge)
    
        RUNS = []
        for trans in trans_lst:
            # Basename
            run_basename = trans.name
    
            # Set the Run Dirs
            run_dir = os.path.join(results_dir, run_basename)
            os.makedirs(run_dir, exist_ok=True)
    
            # Set the input json_path
            i_json_file = os.path.join(results_dir, f"{run_basename}.json")
            trans.dump(i_json_file)
    
            # Set the output json_path
            o_json_file = os.path.join(results_dir, f"{run_basename}_RES.json")
    
            # Create the command string
            RUN = f"openfe quickrun {i_json_file} -o {o_json_file} -d {run_dir}"
            RUNS.append(RUN)
    
        return RUNS
    
    ligands_sdf = Chem.SDMolSupplier('Aligned_Rdkit_FIX.sdf', removeHs=False)
    pdbfile = "./LAC3.pdb"
    results_dir = "Results"  # directory for the transformation JSONs and quickrun outputs
    
    # Now pass these to form a list of Molecules
    ligand_mols = [SmallMoleculeComponent(sdf) for sdf in ligands_sdf]
    network, edges = build_kar(ligand_mols, mode="lomap")
    
    all_runs = []
    for edge in edges:
        trans = set_trans(pdbfile, edge, results_dir)
        all_runs.extend(trans)
    
    bash_commands = "# Array of commands to execute\ncommands=(\n  "
    bash_commands += "\n  ".join(f"'{cmd}'" for cmd in all_runs)
    bash_commands += "\n)"
    print(bash_commands)
    
  • Running OpenFE Quickrun

    commands=(
      'openfe quickrun Results/Mol_1_Mol_2_complex.json -o Results/Mol_1_Mol_2_complex_RES.json -d Results/Mol_1_Mol_2_complex'
      'openfe quickrun Results/Mol_1_Mol_2_solvent.json -o Results/Mol_1_Mol_2_solvent_RES.json -d Results/Mol_1_Mol_2_solvent'
      'openfe quickrun Results/Mol_3_Mol_6_complex.json -o Results/Mol_3_Mol_6_complex_RES.json -d Results/Mol_3_Mol_6_complex'
      'openfe quickrun Results/Mol_3_Mol_6_solvent.json -o Results/Mol_3_Mol_6_solvent_RES.json -d Results/Mol_3_Mol_6_solvent'
      'openfe quickrun Results/Mol_5_Mol_7_complex.json -o Results/Mol_5_Mol_7_complex_RES.json -d Results/Mol_5_Mol_7_complex'
      'openfe quickrun Results/Mol_5_Mol_7_solvent.json -o Results/Mol_5_Mol_7_solvent_RES.json -d Results/Mol_5_Mol_7_solvent'
      'openfe quickrun Results/Mol_4_Mol_6_complex.json -o Results/Mol_4_Mol_6_complex_RES.json -d Results/Mol_4_Mol_6_complex'
      'openfe quickrun Results/Mol_4_Mol_6_solvent.json -o Results/Mol_4_Mol_6_solvent_RES.json -d Results/Mol_4_Mol_6_solvent'
      'openfe quickrun Results/Mol_1_Mol_3_complex.json -o Results/Mol_1_Mol_3_complex_RES.json -d Results/Mol_1_Mol_3_complex'
      'openfe quickrun Results/Mol_1_Mol_3_solvent.json -o Results/Mol_1_Mol_3_solvent_RES.json -d Results/Mol_1_Mol_3_solvent'
      'openfe quickrun Results/Mol_0_Mol_5_complex.json -o Results/Mol_0_Mol_5_complex_RES.json -d Results/Mol_0_Mol_5_complex'
      'openfe quickrun Results/Mol_0_Mol_5_solvent.json -o Results/Mol_0_Mol_5_solvent_RES.json -d Results/Mol_0_Mol_5_solvent'
      'openfe quickrun Results/Mol_0_Mol_6_complex.json -o Results/Mol_0_Mol_6_complex_RES.json -d Results/Mol_0_Mol_6_complex'
      'openfe quickrun Results/Mol_0_Mol_6_solvent.json -o Results/Mol_0_Mol_6_solvent_RES.json -d Results/Mol_0_Mol_6_solvent'
      'openfe quickrun Results/Mol_4_Mol_5_complex.json -o Results/Mol_4_Mol_5_complex_RES.json -d Results/Mol_4_Mol_5_complex'
      'openfe quickrun Results/Mol_4_Mol_5_solvent.json -o Results/Mol_4_Mol_5_solvent_RES.json -d Results/Mol_4_Mol_5_solvent'
      'openfe quickrun Results/Mol_2_Mol_3_complex.json -o Results/Mol_2_Mol_3_complex_RES.json -d Results/Mol_2_Mol_3_complex'
      'openfe quickrun Results/Mol_2_Mol_3_solvent.json -o Results/Mol_2_Mol_3_solvent_RES.json -d Results/Mol_2_Mol_3_solvent'
      'openfe quickrun Results/Mol_3_Mol_5_complex.json -o Results/Mol_3_Mol_5_complex_RES.json -d Results/Mol_3_Mol_5_complex'
      'openfe quickrun Results/Mol_3_Mol_5_solvent.json -o Results/Mol_3_Mol_5_solvent_RES.json -d Results/Mol_3_Mol_5_solvent'
    )
    
  • Got the following error:

    Loading file...
    Planning simulations for this edge...
    Starting the simulations for this edge...
    SYSTEM CONFIG DETAILS:
    INFO:openfe.utils.system_probe.log:SYSTEM CONFIG DETAILS:
    hostname: 'nid005122'
    INFO:openfe.utils.system_probe.log.hostname:hostname: 'nid005122'
    CUDA-based GPU not found
    INFO:openfe.utils.system_probe.log.gpu:CUDA-based GPU not found
    Memory used: 26.0G (7.7%)
    INFO:openfe.utils.system_probe.log:Memory used: 26.0G (7.7%)
    Results/Mol_1_Mol_2_solvent/scratch_RelativeHybridTopologyProtocolUnit-ffc7cefe2c434b1e866ff7487c6bd0c7_attempt_0: 0% full (49.7T free)
    INFO:openfe.utils.system_probe.log:Results/Mol_1_Mol_2_solvent/scratch_RelativeHybridTopologyProtocolUnit-ffc7cefe2c434b1e866ff7487c6bd0c7_attempt_0: 0% full (49.7T free)
    Preparing the hybrid topology simulation
    INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Preparing the hybrid topology simulation
    Parameterizing molecules
    INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Parameterizing molecules
    WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/topologyhelpers.py:705: UserWarning: mapping 47 : 3258 deviates by more than 1.0
      warnings.warn(wmsg)
    
    WARNING:root:mapping 47 : 3258 deviates by more than 1.0
    WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/topologyhelpers.py:705: UserWarning: mapping 54 : 3266 deviates by more than 1.0
      warnings.warn(wmsg)
    
    WARNING:root:mapping 54 : 3266 deviates by more than 1.0
    Creating hybrid system
    INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Creating hybrid system
    Setting force field terms
    INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Setting force field terms
    Adding forces
    INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Adding forces
    Hybrid system created
    INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Hybrid system created
    WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/mdtraj/core/topology.py:84: UserWarning: atom_indices are not monotonically increasing
      warnings.warn("atom_indices are not monotonically increasing")
    
    WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/compute.py:56: UserWarning: Non-GPU platform selected: HIP, this may significantly impact simulation performance
      warnings.warn(wmsg)
    
    WARNING:root:Non-GPU platform selected: HIP, this may significantly impact simulation performance
    Creating and setting up the sampler
    INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Creating and setting up the sampler
    :0:/home/conda/feedstock_root/build_artifacts/hip_1718643748184/work/clr/hipamd/src/hiprtc/hiprtcInternal.hpp:105 : 672298286298 us: [pid:76800 tid:0x14eff92b4740] Unable to add internal header
    /scratch/project_465000973/Singularity_Envs/openfe_rocm/bin/openfe: line 29: 76278 Aborted                 /usr/bin/singularity --silent exec $DIR/../$CONTAINER_IMAGE bash -c "eval \"\$(/LUMI_TYKKY_1tOKsNy/miniconda/bin/conda shell.bash hook )\"  && conda activate env1 &>/dev/null &&  exec -a $_O_SOURCE $DIR/openfe $( test $# -eq 0 || printf " %q" "$@" )"
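Both failures above end in a hard abort inside native code (the kernel dies in the first test, and `openfe quickrun` is `Aborted` right after the hiprtc error), which Python cannot catch with `try`/`except`. One way to probe a platform without losing the interactive session is to run the check in a child interpreter and inspect its exit status. This is a sketch: `HIP_PROBE` reuses the one-particle Context test from the sample run above, and `probe_in_subprocess` is a hypothetical helper built on `subprocess.run`.

```python
import subprocess
import sys

# Snippet run in a fresh interpreter; it mirrors the one-particle Context test above.
HIP_PROBE = """
import openmm
system = openmm.System()
system.addParticle(1.0)
integrator = openmm.VerletIntegrator(0.001)
platform = openmm.Platform.getPlatformByName("HIP")
context = openmm.Context(system, integrator, platform)
print(platform.getName(), "OK")
"""


def probe_in_subprocess(code: str, timeout: float = 300.0):
    """Run `code` in a child Python process and return (returncode, stdout, stderr).

    On POSIX a negative return code means the child was killed by a signal,
    e.g. -6 for SIGABRT, which is how an 'Aborted' crash shows up.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    rc, out, err = probe_in_subprocess(HIP_PROBE)
    print("exit status:", rc)
    print(out or err)
```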
    

HiteSit, Oct 08 '24

UPDATE

Like this, it works:

name: openfe
channels:
  - conda-forge
  - defaults
dependencies:
  - jaimergp/label/unsupported-cudatoolkit-shim::cudatoolkit=11.2.2
  - streamhpc::openmm-hip=8.0.0
  - conda-forge::openfe=1.1.0
  - python=3.11
  - pandas
  - numpy
  - seaborn
  - pip
  - pip:
      - git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform


HiteSit, Oct 08 '24

    WARNING:root:Non-GPU platform selected: HIP, this may significantly impact simulation performance

We will need to fix this warning, since HIP is a GPU platform.
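A minimal sketch of what the fix could look like, assuming the warning is triggered by comparing the selected platform name against a fixed set of GPU platform names. This is not OpenFE's actual `compute.py` code; `GPU_PLATFORM_NAMES` and `warn_if_not_gpu` are hypothetical. The point is simply that "HIP" has to be in that set.

```python
import warnings

# Assumption: HIP belongs alongside the other GPU-backed OpenMM platforms.
GPU_PLATFORM_NAMES = frozenset({"CUDA", "OpenCL", "HIP"})


def warn_if_not_gpu(platform_name: str) -> bool:
    """Warn if a non-GPU OpenMM platform is selected; return True for GPU platforms."""
    if platform_name in GPU_PLATFORM_NAMES:
        return True
    warnings.warn(
        f"Non-GPU platform selected: {platform_name}, "
        "this may significantly impact simulation performance"
    )
    return False
```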

I am guessing the difference between the working and the failing environment comes down to this line:

  - streamhpc::openmm-hip=8.0.0

versus

  - conda-forge/label/openmm_rc::openmm-hip=8.2.0beta

There is an issue here on openmm:

https://github.com/openmm/openmm/issues/4675

where people are discussing some problems with the openmm-hip package from conda-forge

mikemhenry, Oct 08 '24