Request: AMD HIP Platform Support
Hi all,
I was wondering if it is possible to add support for the AMD HIP platform. Specifically, OpenMM is already able to work on HIP platforms by installing a specific hack with Conda:
mamba install jaimergp/label/unsupported-cudatoolkit-shim::cudatoolkit=11.2.2 && mamba install streamhpc::openmm-hip=8.0.0
But the problem is that in your code, specifically under openmmtools/utils/utils.py, there is an assert that allows only ["OpenCL", "CUDA"]. The fix is probably easy, since, as said, OpenMM is already able to see HIP as a platform and to give it the "right" speed.
The fix would have a huge impact, since the LUMI HPC (the most powerful in Europe) only supports the HIP platform.
@HiteSit Thank you for raising this issue!
"We" (not the OpenFE team but other orgs I am apart of) are working to get ROCm/HIP onto conda-forge so that no hacks will be needed to install openmm.
Can you link the line of code that has this assert? Also do you know how OpenMM reports the HIP platform string-wise? I would be happy to get this working.
So, the utils file is under:
/mambaforge/envs/cheminf_3_11/lib/python3.11/site-packages/openmmtools/utils/utils.py
def platform_supports_precision(platform, precision):
    """Determine whether the specified OpenMM Platform supports the specified minimum precision.

    Parameters
    ----------
    platform : str or openmm.Platform
        The platform or platform name to check
    precision : str
        One of ['single', 'mixed', 'double']

    Returns
    -------
    is_supported : bool
        True if the platform supports the specified precision; False otherwise
    """
    SUPPORTED_PRECISIONS = ['single', 'mixed', 'double']
    assert precision in SUPPORTED_PRECISIONS, f"Precision {precision} must be one of {SUPPORTED_PRECISIONS}"

    if isinstance(platform, str):
        # Get the actual Platform object if the platform_name was specified
        platform = openmm.Platform.getPlatformByName(platform)

    if platform.getName() == 'Reference':
        # Reference is double precision
        return (precision == 'double')

    if platform.getName() == 'CPU':
        return precision in ['mixed']

    if platform.getName() in ['CUDA', 'OpenCL']:
        properties = { 'Precision' : precision }
        system = openmm.System()
        system.addParticle(1.0)  # Cannot create Context on a system with no particles
        integrator = openmm.VerletIntegrator(0.001)
        try:
            context = openmm.Context(system, integrator, platform, properties)
            del context, integrator
            return True
        except Exception as e:
            return False

    raise Exception(f"Platform {platform.getName()} unknown")
def get_available_platforms(minimum_precision='mixed'):
    """Return a list of the available OpenMM Platforms that can satisfy the requested minimum precision.

    Parameters
    ----------
    minimum_precision : str, optional, default='mixed'
        One of [None, 'single', 'mixed', 'double']
        If None, all available platforms will be returned.

    Returns
    -------
    platforms : list of openmm.Platform
        Platforms that support the specified minimum precision
    """
    platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]
    if minimum_precision is not None:
        # Filter based on precision support
        platforms = [ platform for platform in platforms if platform_supports_precision(platform, minimum_precision) ]
    return platforms
def get_fastest_platform(minimum_precision='mixed'):
    """Return the fastest available platform.

    This relies on the hardcoded speed values in Platform.getSpeed().

    Parameters
    ----------
    minimum_precision : str, optional, default='mixed'
        One of ['single', 'mixed', 'double']

    Returns
    -------
    platform : openmm.Platform
        The fastest available platform.
    """
    platforms = get_available_platforms(minimum_precision=minimum_precision)
    fastest_platform = max(platforms, key=lambda x: x.getSpeed())
    return fastest_platform
If I run:
platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]
for platform in platforms:
    name = platform.getName()
    print(name)
>>> Something like "CPU", "OpenCL", "HIP"
Bad timing: LUMI is down for maintenance this week, so I will edit this message with the output of that print. But I'm quite sure the only problem is that assert. I would like to edit the code myself and report whether just adding "HIP" to the assert works, but LUMI installs Conda environments (OpenFF, OpenMM, OpenFreeEnergy and so on) through read-only Singularity containers, and I haven't figured out yet how to use them with --sandbox. If I figure it out, I will edit this post.
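In the meantime, this is roughly the change I have in mind, as an untested sketch only (it mirrors the CUDA/OpenCL branch of the function quoted above, and assumes the HIP plugin accepts the same 'Precision' context property as CUDA/OpenCL):

import openmm

# Untested sketch: a HIP-aware variant of the GPU branch of
# openmmtools' platform_supports_precision.
GPU_PLATFORMS = ['CUDA', 'OpenCL', 'HIP']

def gpu_platform_supports_precision(platform_name, precision):
    platform = openmm.Platform.getPlatformByName(platform_name)
    if platform.getName() not in GPU_PLATFORMS:
        raise ValueError(f"{platform.getName()} is not handled by this sketch")
    system = openmm.System()
    system.addParticle(1.0)  # a Context cannot be created on an empty system
    integrator = openmm.VerletIntegrator(0.001)
    try:
        context = openmm.Context(system, integrator, platform, {'Precision': precision})
        del context, integrator
        return True
    except Exception:
        return False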
Support for the HIP platform would be nice; however, I would warn that it definitely needs validation prior to use. Untested platforms tend to be prone to odd behaviour in the alchemical world. Validation would require at least an HFE validation test and a couple of RBFE test cases.
Is this something you'd be willing to take on @HiteSit ?
@IAlibay Sure, I would like to contribute. First I have to resolve the problem with the Singularity environment. Besides that, I can code quite well, but I do not have enough experience with alchemical transformations, so I would need to be guided.
@HiteSit - we need to discuss this internally first, but the requirement here would be mostly to run a suitably large set of alchemical simulations to verify that the results are reasonable. This would mostly require access to suitable AMD HIP compute resources to do such a validation, which unfortunately we do not have :(
@IAlibay Yes, I understand you need access to the platform. I will try to get some compute time (node-hours) for free, but if you do not need an astonishing amount (maybe give me a rough range of node-hours), I'm happy to share my own compute time without any problem. It's my pleasure to contribute.
You can contact me on [email protected]
@HiteSit - I'm re-opening this issue if that's ok; there definitely needs to be some kind of update to our compute platform selection to allow for HIP. My question was more of a "once this is done, someone will need to check that it works".
@IAlibay I can check; the only constraint is compute time. I have a limited amount, but as a rule of thumb, if the testing is around, let's say, 10 proteins with 30 ligands each, it should not be a problem. If it's more, I can figure out a way to get more compute time.
Thanks for pointing to the code file! I will raise this as a separate issue on the openmmtools side of things since there isn't really any reason why we couldn't add support for HIP there, but as @IAlibay said, when it comes to using it in an openfe workflow, we will need it validated.
@HiteSit I've made a branch on openmmtools with the changes that I think are needed. To play around with it and test it, you will need to run:
(you can also use mamba or conda to do this)
# Create an env that has openmm with the hip platform
$ micromamba create -n openmm82beta-openfe -c conda-forge/label/openmm_rc -c conda-forge "openmm-hip==8.2.0beta" "openfe==1.1.0"
# Activate env
$ micromamba activate openmm82beta-openfe
# Install openmmtools branch
$ pip install git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform
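Once that environment is built, a quick sanity check along these lines (my suggestion, not part of the branch; it just exercises the openmmtools helpers quoted earlier, assuming they are importable from openmmtools.utils) should show HIP being picked up:

import openmm
from openmmtools.utils import get_available_platforms, get_fastest_platform

# List every platform OpenMM can see, then check what the patched openmmtools selects.
names = [openmm.Platform.getPlatform(i).getName()
         for i in range(openmm.Platform.getNumPlatforms())]
print("All platforms:", names)

platforms = get_available_platforms(minimum_precision='mixed')
print("Mixed-precision platforms:", [p.getName() for p in platforms])
print("Fastest platform:", get_fastest_platform().getName())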
Can you let me know if that works for you? I don't have an AMD card handy but I can spin up some cloud resources to test if needed.
@mikemhenry Sure, give me a couple of days. I will edit this message with the answer.
Sounds good, my only request is that you make a new message with the answer :) GitHub won't notify me if you edit your message but will if you post a new one, thanks!
@mikemhenry
I installed the package as follows:
name: openfe
channels:
  - conda-forge
  - defaults
dependencies:
  - conda-forge/label/openmm_rc::openmm-hip=8.2.0beta
  - conda-forge::openfe=1.1.0
  - python=3.10
  - pandas
  - numpy
  - seaborn
  - pip
  - pip:
    - git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform

conda-containerize new --prefix <install_dir> env.yml
Checking if OpenMM was able to recognise the HIP platform
platforms = [openmm.Platform.getPlatform(i) for i in range(openmm.Platform.getNumPlatforms())]
for platform in platforms:
    print(platform.getName())

> Reference
> CPU
> HIP
> OpenCL
The sample run failed, causing the death of the kernel:
import openmm as mm
from openmm import unit

# Create a system for testing
system = mm.System()
# Adding a dummy particle to avoid an empty system error
system.addParticle(1.0 * unit.dalton)

# Select the HIP platform
platform = mm.Platform.getPlatformByName('HIP')

# Create an integrator
integrator = mm.LangevinIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.002*unit.picoseconds)

# Create a context to check properties
context = mm.Context(system, integrator, platform)

# Print platform properties
properties = context.getPlatform().getPropertyNames()
for prop in properties:
    value = context.getPlatform().getPropertyValue(context, prop)
    print(f'{prop}: {value}')

> Kernel died
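A variant that mirrors what openmmtools' platform_supports_precision does, requesting an explicit 'Precision' property, might help narrow down whether Context creation itself is what crashes (a sketch only; it assumes the HIP plugin exposes the same 'Precision' property as CUDA/OpenCL):

import openmm as mm
from openmm import unit

# Sketch: probe each precision level on the HIP platform, mirroring
# openmmtools' platform_supports_precision.
system = mm.System()
system.addParticle(1.0 * unit.dalton)  # a Context cannot be created on an empty system
platform = mm.Platform.getPlatformByName('HIP')

for precision in ('single', 'mixed', 'double'):
    integrator = mm.VerletIntegrator(0.001)  # fresh integrator for each Context
    try:
        context = mm.Context(system, integrator, platform, {'Precision': precision})
        print(f"{precision}: OK")
        del context
    except Exception as exc:
        print(f"{precision}: failed ({exc})")
    del integrator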
Setting up OpenFE quickrun
def build_kar(ligand_mols, mode):
    # Create an MST network
    mst_network = generate_minimal_spanning_network(
        ligands=ligand_mols,
        scorer=openfe.lomap_scorers.default_lomap_score,
        mappers=[KartografAtomMapper(),])
    mst_edges = [edge for edge in mst_network.edges]

    # Create a lomap network
    mappers = [
        openfe.setup.LomapAtomMapper(
            time=20,
            threed=True,
            max3d=1.0,
            element_change=True,
            seed='[#7]-[#6](=O)-[#6]-[#7]-1-[#6]-[#6]-[#7](-[#6]-[#6]-1)-[#6]=O',
            shift=True,
        ),
    ]
    lomap_network = generate_lomap_network(
        molecules=ligand_mols,
        scorer=openfe.lomap_scorers.default_lomap_score,
        mappers=mappers)
    lomap_edges = [edge for edge in lomap_network.edges]

    if mode == "mst":
        return mst_network, mst_edges
    elif mode == "lomap":
        return lomap_network, lomap_edges

def define_transformation(pdbfile, edge):
    protein = ProteinComponent.from_pdb_file(pdbfile)
    solvent = SolventComponent(positive_ion='Na', negative_ion='Cl',
                               neutralize=True, ion_concentration=0.15*unit.molar)

    one_complex = ChemicalSystem({"ligand": edge.componentA, "solvent": solvent, "protein": protein},
                                 name=edge.componentA.name)
    one_solvent = ChemicalSystem({"ligand": edge.componentA, "solvent": solvent},
                                 name=edge.componentA.name)
    two_complex = ChemicalSystem({"ligand": edge.componentB, "solvent": solvent, "protein": protein},
                                 name=edge.componentB.name)
    two_solvent = ChemicalSystem({"ligand": edge.componentB, "solvent": solvent},
                                 name=edge.componentB.name)

    rbfe_settings = RelativeHybridTopologyProtocol.default_settings()
    rbfe_settings.simulation_settings.equilibration_length = 10 * unit.picosecond
    rbfe_settings.simulation_settings.production_length = 50 * unit.picosecond
    rbfe_settings.engine_settings.compute_platform = "HIP"

    rbfe_protocol = RelativeHybridTopologyProtocol(
        settings=rbfe_settings
    )

    transformation_complex = openfe.Transformation(
        stateA=one_complex,
        stateB=two_complex,
        mapping=edge,
        protocol=rbfe_protocol,
        name=f"{one_complex.name}_{two_complex.name}_complex"
    )
    transformation_solvent = openfe.Transformation(
        stateA=one_solvent,
        stateB=two_solvent,
        mapping=edge,
        protocol=rbfe_protocol,
        name=f"{one_solvent.name}_{two_solvent.name}_solvent"
    )

    trans_list = [transformation_complex, transformation_solvent]
    return trans_list

def set_trans(pdbfile, mst_edge, results_dir):
    trans_lst = define_transformation(pdbfile, mst_edge)

    RUNS = []
    for trans in trans_lst:
        # Basename
        run_basename = trans.name
        # Set the Run Dirs
        run_dir = os.path.join(results_dir, run_basename)
        os.makedirs(run_dir, exist_ok=True)
        # Set the input json_path
        i_json_file = os.path.join(results_dir, f"{run_basename}.json")
        trans.dump(i_json_file)
        # Set the output json_path
        o_json_file = os.path.join(results_dir, f"{run_basename}_RES.json")
        # Create the command string
        RUN = f"openfe quickrun {i_json_file} -o {o_json_file} -d {run_dir}"
        RUNS.append(RUN)
    return RUNS

ligands_sdf = Chem.SDMolSupplier('Aligned_Rdkit_FIX.sdf', removeHs=False)
pdbfile = "./LAC3.pdb"

# Now pass these to form a list of Molecules
ligand_mols = [SmallMoleculeComponent(sdf) for sdf in ligands_sdf]

network, edges = build_kar(ligand_mols, mode="lomap")

all_runs = []
for edge in edges:
    trans = set_trans(pdbfile, edge, results_dir)
    all_runs.extend(trans)

bash_commands = "# Array of commands to execute\ncommands=(\n    "
bash_commands += "\n    ".join(f"'{cmd}'" for cmd in all_runs)
bash_commands += "\n)"
Running OpenFE Quickrun
commands=(
    'openfe quickrun Results/Mol_1_Mol_2_complex.json -o Results/Mol_1_Mol_2_complex_RES.json -d Results/Mol_1_Mol_2_complex'
    'openfe quickrun Results/Mol_1_Mol_2_solvent.json -o Results/Mol_1_Mol_2_solvent_RES.json -d Results/Mol_1_Mol_2_solvent'
    'openfe quickrun Results/Mol_3_Mol_6_complex.json -o Results/Mol_3_Mol_6_complex_RES.json -d Results/Mol_3_Mol_6_complex'
    'openfe quickrun Results/Mol_3_Mol_6_solvent.json -o Results/Mol_3_Mol_6_solvent_RES.json -d Results/Mol_3_Mol_6_solvent'
    'openfe quickrun Results/Mol_5_Mol_7_complex.json -o Results/Mol_5_Mol_7_complex_RES.json -d Results/Mol_5_Mol_7_complex'
    'openfe quickrun Results/Mol_5_Mol_7_solvent.json -o Results/Mol_5_Mol_7_solvent_RES.json -d Results/Mol_5_Mol_7_solvent'
    'openfe quickrun Results/Mol_4_Mol_6_complex.json -o Results/Mol_4_Mol_6_complex_RES.json -d Results/Mol_4_Mol_6_complex'
    'openfe quickrun Results/Mol_4_Mol_6_solvent.json -o Results/Mol_4_Mol_6_solvent_RES.json -d Results/Mol_4_Mol_6_solvent'
    'openfe quickrun Results/Mol_1_Mol_3_complex.json -o Results/Mol_1_Mol_3_complex_RES.json -d Results/Mol_1_Mol_3_complex'
    'openfe quickrun Results/Mol_1_Mol_3_solvent.json -o Results/Mol_1_Mol_3_solvent_RES.json -d Results/Mol_1_Mol_3_solvent'
    'openfe quickrun Results/Mol_0_Mol_5_complex.json -o Results/Mol_0_Mol_5_complex_RES.json -d Results/Mol_0_Mol_5_complex'
    'openfe quickrun Results/Mol_0_Mol_5_solvent.json -o Results/Mol_0_Mol_5_solvent_RES.json -d Results/Mol_0_Mol_5_solvent'
    'openfe quickrun Results/Mol_0_Mol_6_complex.json -o Results/Mol_0_Mol_6_complex_RES.json -d Results/Mol_0_Mol_6_complex'
    'openfe quickrun Results/Mol_0_Mol_6_solvent.json -o Results/Mol_0_Mol_6_solvent_RES.json -d Results/Mol_0_Mol_6_solvent'
    'openfe quickrun Results/Mol_4_Mol_5_complex.json -o Results/Mol_4_Mol_5_complex_RES.json -d Results/Mol_4_Mol_5_complex'
    'openfe quickrun Results/Mol_4_Mol_5_solvent.json -o Results/Mol_4_Mol_5_solvent_RES.json -d Results/Mol_4_Mol_5_solvent'
    'openfe quickrun Results/Mol_2_Mol_3_complex.json -o Results/Mol_2_Mol_3_complex_RES.json -d Results/Mol_2_Mol_3_complex'
    'openfe quickrun Results/Mol_2_Mol_3_solvent.json -o Results/Mol_2_Mol_3_solvent_RES.json -d Results/Mol_2_Mol_3_solvent'
    'openfe quickrun Results/Mol_3_Mol_5_complex.json -o Results/Mol_3_Mol_5_complex_RES.json -d Results/Mol_3_Mol_5_complex'
    'openfe quickrun Results/Mol_3_Mol_5_solvent.json -o Results/Mol_3_Mol_5_solvent_RES.json -d Results/Mol_3_Mol_5_solvent'
)
Got the following error:
Loading file...
Planning simulations for this edge...
Starting the simulations for this edge...
SYSTEM CONFIG DETAILS:
INFO:openfe.utils.system_probe.log:SYSTEM CONFIG DETAILS:
hostname: 'nid005122'
INFO:openfe.utils.system_probe.log.hostname:hostname: 'nid005122'
CUDA-based GPU not found
INFO:openfe.utils.system_probe.log.gpu:CUDA-based GPU not found
Memory used: 26.0G (7.7%)
INFO:openfe.utils.system_probe.log:Memory used: 26.0G (7.7%)
Results/Mol_1_Mol_2_solvent/scratch_RelativeHybridTopologyProtocolUnit-ffc7cefe2c434b1e866ff7487c6bd0c7_attempt_0: 0% full (49.7T free)
INFO:openfe.utils.system_probe.log:Results/Mol_1_Mol_2_solvent/scratch_RelativeHybridTopologyProtocolUnit-ffc7cefe2c434b1e866ff7487c6bd0c7_attempt_0: 0% full (49.7T free)
Preparing the hybrid topology simulation
INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Preparing the hybrid topology simulation
Parameterizing molecules
INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Parameterizing molecules
WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/topologyhelpers.py:705: UserWarning: mapping 47 : 3258 deviates by more than 1.0
  warnings.warn(wmsg)
WARNING:root:mapping 47 : 3258 deviates by more than 1.0
WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/topologyhelpers.py:705: UserWarning: mapping 54 : 3266 deviates by more than 1.0
  warnings.warn(wmsg)
WARNING:root:mapping 54 : 3266 deviates by more than 1.0
Creating hybrid system
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Creating hybrid system
Setting force field terms
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Setting force field terms
Adding forces
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Adding forces
Hybrid system created
INFO:openfe.protocols.openmm_rfe._rfe_utils.relative:Hybrid system created
WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/mdtraj/core/topology.py:84: UserWarning: atom_indices are not monotonically increasing
  warnings.warn("atom_indices are not monotonically increasing")
WARNING:py.warnings:/LUMI_TYKKY_1tOKsNy/miniconda/envs/env1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/_rfe_utils/compute.py:56: UserWarning: Non-GPU platform selected: HIP, this may significantly impact simulation performance
  warnings.warn(wmsg)
WARNING:root:Non-GPU platform selected: HIP, this may significantly impact simulation performance
Creating and setting up the sampler
INFO:gufekey.openfe.protocols.openmm_rfe.equil_rfe_methods.RelativeHybridTopologyProtocolUnit:Creating and setting up the sampler
:0:/home/conda/feedstock_root/build_artifacts/hip_1718643748184/work/clr/hipamd/src/hiprtc/hiprtcInternal.hpp:105 : 672298286298 us: [pid:76800 tid:0x14eff92b4740] Unable to add internal header
/scratch/project_465000973/Singularity_Envs/openfe_rocm/bin/openfe: line 29: 76278 Aborted /usr/bin/singularity --silent exec $DIR/../$CONTAINER_IMAGE bash -c "eval \"\$(/LUMI_TYKKY_1tOKsNy/miniconda/bin/conda shell.bash hook )\" && conda activate env1 &>/dev/null && exec -a $_O_SOURCE $DIR/openfe $( test $# -eq 0 || printf " %q" "$@" )"
UPDATE
Like this it works:
name: openfe
channels:
  - conda-forge
  - defaults
dependencies:
  - jaimergp/label/unsupported-cudatoolkit-shim::cudatoolkit=11.2.2
  - streamhpc::openmm-hip=8.0.0
  - conda-forge::openfe=1.1.0
  - python=3.11
  - pandas
  - numpy
  - seaborn
  - pip
  - pip:
    - git+https://github.com/choderalab/openmmtools.git@feat/add-hip-platform
> WARNING:root:Non-GPU platform selected: HIP, this may significantly impact simulation performance

We will need to fix this warning, since HIP is a GPU platform.
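Something along these lines is what I have in mind, as a sketch only (the helper name and structure are illustrative, not the actual code in openfe's compute.py):

import warnings

# Sketch: count HIP among the GPU platforms so the "Non-GPU platform selected"
# warning only fires for genuinely non-GPU platforms (CPU, Reference).
GPU_PLATFORMS = {"CUDA", "OpenCL", "HIP"}

def warn_if_non_gpu(platform_name: str) -> None:
    if platform_name not in GPU_PLATFORMS:
        warnings.warn(
            f"Non-GPU platform selected: {platform_name}, "
            "this may significantly impact simulation performance"
        )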
I am guessing the difference is here:
- streamhpc::openmm-hip=8.0.0
and
- conda-forge/label/openmm_rc::openmm-hip=8.2.0beta
which would explain why one works and the other doesn't.
There is an issue here on openmm:
https://github.com/openmm/openmm/issues/4675
where people are discussing some problems with the openmm-hip package from conda-forge