alchemtest: the simple alchemistry test set¶
alchemtest is a collection of test datasets for alchemical free energy calculations. The datasets come from a variety of software packages, primarily molecular dynamics engines, and are used as the test set for alchemlyb. The package is standalone, however, and can be used for any purpose.
Datasets are released under an open license that conforms to the Open Definition 2.1, which allows free use, re-use, redistribution, modification, and separation, for any purpose and without charge. All data and code can be found in the public GitHub repository alchemistry/alchemtest.
This library is under active development. We use semantic versioning to indicate clearly what kind of changes you may expect between releases. Although it is heavily used for the alchemlyb test suite, it may contain bugs. Please raise any issues or questions in the Issue Tracker. Contributions of data sets and code in the form of pull requests are very welcome.
Installing alchemtest¶
alchemtest is pure-Python, so it can be installed easily via pip:
pip install alchemtest
If you wish to install this in your user site-packages, use the --user flag:
pip install --user alchemtest
Installing from source¶
To install from source, clone the repository from GitHub with:
git clone https://github.com/alchemistry/alchemtest.git
then do:
cd alchemtest
pip install .
If you wish to install this in your user site-packages, use the --user flag:
pip install --user .
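After either installation route, one quick way to confirm that the package is importable is to ask Python's import machinery for it. The following is a minimal sketch using only the standard library; the helper name `is_installed` is introduced here for illustration and is not part of alchemtest.

```python
import importlib.util


def is_installed(package_name="alchemtest"):
    """Return True if `package_name` can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True only if alchemtest is installed in the active environment.
    print("alchemtest importable:", is_installed())
```

The check does not import the package itself, so it is cheap to run in a setup script or a test fixture.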
Basic usage¶
All datasets in alchemtest are accessible via load_* functions, organized in submodules by the software package that generated them. The current set of submodules is:
alchemtest.gmx | Gromacs molecular dynamics simulation datasets.
alchemtest.amber | Amber molecular dynamics simulation datasets.
alchemtest.namd | NAMD molecular dynamics simulation datasets.
As an example, we can access the Gromacs: Benzene in water dataset with:
>>> from alchemtest.gmx import load_benzene
>>> bz = load_benzene()
and use the resulting Bunch object to introspect what this dataset includes. In particular, it features a DESCR attribute with a human-readable description of the dataset:
>>> print(bz.DESCR)
Gromacs: Benzene in water
=========================
Benzene in water, alchemically turned into benzene in vacuum separated from water
Notes
-----
Data Set Characteristics:
:Number of Legs: 2 (Coulomb, VDW)
:Number of Windows: 5 for Coulomb, 16 for VDW
:Length of Windows: 40ns
:Missing Values: None
:Creator: \I. Kenney
:Donor: Ian Kenney (ian.kenney@asu.edu)
:Date: March 2017
:License: `CC0
<https://creativecommons.org/publicdomain/zero/1.0/>`_
Public Domain Dedication
This dataset was generated using `MDPOW <https://github.com/Becksteinlab/MDPOW>`_, with
the `Gromacs <http://www.gromacs.org/>`_ molecular dynamics engine.
as well as the dataset itself:
>>> bz.data.keys()
['VDW', 'Coulomb']
which consists in this case of two alchemical legs, each having several files. For this dataset each file happens to correspond to a simulation sampling a particular \(\lambda\):
>>> bz.data['Coulomb']
['/usr/local/python3.6/site-packages/alchemtest/gmx/benzene/Coulomb/0000/dhdl.xvg.bz2',
'/usr/local/python3.6/site-packages/alchemtest/gmx/benzene/Coulomb/0250/dhdl.xvg.bz2',
'/usr/local/python3.6/site-packages/alchemtest/gmx/benzene/Coulomb/0500/dhdl.xvg.bz2',
'/usr/local/python3.6/site-packages/alchemtest/gmx/benzene/Coulomb/0750/dhdl.xvg.bz2',
'/usr/local/python3.6/site-packages/alchemtest/gmx/benzene/Coulomb/1000/dhdl.xvg.bz2']
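Because each file sits in its own window directory, the window label can be recovered from the path alone. The sketch below does this with `pathlib`. It assumes, based only on the directory names shown above (0000 to 1000 for five Coulomb windows), that the name encodes \(\lambda\) multiplied by 1000; that convention is our guess and is not stated by the dataset description. The helper names are hypothetical.

```python
from pathlib import PurePosixPath


def window_label(path):
    """Return the name of the directory that holds the file, e.g. '0250'."""
    return PurePosixPath(path).parent.name


def guessed_lambda(path, scale=1000):
    """ASSUMPTION: the directory name is lambda multiplied by `scale`."""
    return int(window_label(path)) / scale


# The five Coulomb paths from the listing above.
prefix = "/usr/local/python3.6/site-packages/alchemtest/gmx/benzene/Coulomb"
paths = [f"{prefix}/{label}/dhdl.xvg.bz2"
         for label in ("0000", "0250", "0500", "0750", "1000")]

lambdas = [guessed_lambda(p) for p in paths]
```

For real analyses the \(\lambda\) values should be taken from the file contents by a proper parser rather than from directory names.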
These paths can be read by any appropriate parser for further analysis. For this particular dataset, see alchemlyb.parsing.gmx for a good set of parsers.
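For a quick look at a file without alchemlyb, the compressed XVG text can be read with the standard library. This is a minimal sketch, not a replacement for the alchemlyb parsers: it only relies on the XVG convention that lines starting with `#` are comments and lines starting with `@` are plot directives, and it makes no attempt to interpret the columns. The function name `read_xvg_rows` is hypothetical.

```python
import bz2


def read_xvg_rows(path):
    """Read the numeric rows of a bzip2-compressed XVG file.

    Blank lines, '#' comment lines, and '@' directive lines are skipped.
    Column meanings are NOT interpreted here.
    """
    rows = []
    with bz2.open(path, mode="rt") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped[0] in "#@":
                continue
            rows.append([float(field) for field in stripped.split()])
    return rows
```

A call such as `read_xvg_rows(bz.data['Coulomb'][0])` would return one list of floats per time frame, assuming `bz` was loaded as shown above.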
Helper functions and classes¶
A small number of functions and classes are included to help organize the data.
class alchemtest.Bunch(**kwargs)¶
Container object for datasets.
Dictionary-like object that exposes its keys as attributes.
>>> b = Bunch(a=1, b=2)
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6
Code taken from sklearn/utils/__init__.py version 0.19.1 under the ‘New BSD license’ https://github.com/scikit-learn/scikit-learn/blob/master/COPYING
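The behaviour described above can be sketched in a few lines. The class below is an illustrative re-implementation written for this page, not the code shipped in alchemtest or scikit-learn; the name `MiniBunch` is hypothetical.

```python
class MiniBunch(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __init__(self, **kwargs):
        super().__init__(kwargs)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails,
        # so dict methods such as .keys() keep working.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Attribute assignment is stored as a dictionary item.
        self[key] = value


b = MiniBunch(a=1, b=2)
b.a = 3
b.c = 6
```

Raising `AttributeError` for unknown keys keeps `hasattr` and `getattr` with a default working as expected.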
Gromacs datasets¶
Gromacs molecular dynamics simulation datasets.
The alchemtest.gmx module features datasets generated using the Gromacs molecular dynamics engine. They can be accessed using the following accessor functions:
load_benzene() | Load the Gromacs benzene dataset.
load_expanded_ensemble_case_1() | Load the Gromacs Host CB7 Guest C3 expanded ensemble dataset, case 1 (single simulation visits all states).
load_expanded_ensemble_case_2() | Load the Gromacs Host CB7 Guest C3 expanded ensemble dataset, case 2 (two simulations visit all states independently).
load_expanded_ensemble_case_3() | Load the Gromacs Host CB7 Guest C3 REX dataset, case 3.
load_water_particle_with_total_energy() | Load the Gromacs water particle with total energy dataset.
load_water_particle_with_potential_energy() | Load the Gromacs water particle with potential energy dataset.
load_water_particle_without_energy() | Load the Gromacs water particle without energy dataset.
Simple TI and FEP¶
The data sets contain derivatives of the Hamiltonian (TI) and free energy perturbation (FEP) data suitable for processing with FEP estimators as well as BAR/MBAR. Individual \(\lambda\) windows were run independently.
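Since each \(\lambda\) window was run independently, thermodynamic integration reduces to averaging \(\partial H/\partial\lambda\) in each window and integrating those averages over \(\lambda\). The sketch below uses the trapezoidal rule for that last step. The numbers are made up for illustration and are not taken from any dataset; the function name is hypothetical, and alchemlyb provides tested estimators for real work.

```python
def trapezoid_ti(lambdas, mean_dhdl):
    """Integrate per-window averages of dH/dlambda with the trapezoidal rule."""
    if len(lambdas) != len(mean_dhdl):
        raise ValueError("need one average per lambda value")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (mean_dhdl[i] + mean_dhdl[i + 1])
    return total


# Synthetic example: five equally spaced windows, values in kJ/mol.
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
mean_dhdl = [8.0, 6.0, 4.0, 2.0, 0.0]
delta_f = trapezoid_ti(lambdas, mean_dhdl)
```

Because the synthetic averages fall on a straight line, the trapezoidal rule is exact here; with real data the spacing of windows controls the integration error.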
Gromacs: Benzene in water¶
Benzene in water, alchemically turned into benzene in vacuum separated from water
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (Coulomb, VDW)
- Number of Windows
5 for Coulomb, 16 for VDW
- Length of Windows
40ns
- System Size
1668 atoms
- Temperature
300 K
- Pressure
1 bar
- Alchemical Pathway
vdw + coul –> vdw –> vacuum
- Experimental Hydration Free Energy
-0.90 +- 0.2 kcal/mol
- Missing Values
None
- Energy unit
kJ/mol
- Time unit
ps
- Creator
I. Kenney
- Donor
Ian Kenney (ian.kenney@asu.edu)
- Date
March 2017
- License
CC0 Public Domain Dedication
This dataset was generated using MDPOW, with the Gromacs molecular dynamics engine.
Experimental value sourced from [Mobley2013].
- Mobley2013
Mobley, David L. (2013). Experimental and Calculated Small Molecule Hydration Free Energies. UC Irvine: Department of Pharmaceutical Sciences, UCI. Retrieved from: http://escholarship.org/uc/item/6sd403pz
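Note that the experimental hydration free energy is quoted in kcal/mol while the dataset's energy unit is kJ/mol, so one of the two has to be converted before comparing. A minimal sketch of that conversion, using the thermochemical calorie (1 kcal = 4.184 kJ); the function name is hypothetical.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(value_kcal_per_mol):
    """Convert an energy from kcal/mol to kJ/mol."""
    return value_kcal_per_mol * KJ_PER_KCAL


experimental_kcal = -0.90   # from the dataset description
uncertainty_kcal = 0.2      # from the dataset description
experimental_kj = kcal_to_kj(experimental_kcal)
uncertainty_kj = kcal_to_kj(uncertainty_kcal)
```

This gives roughly -3.77 kJ/mol with an uncertainty of about 0.84 kJ/mol.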
Extended ensemble¶
Data for extended ensemble simulations; case 1 and case 2 are extended ensembles in the alchemical parameters, case 3 includes replica exchange (REX).
Gromacs: Host CB7 and Guest C3 in water¶
Host CB7 and Guest C3 in water, Guest C3 alchemically turned into Guest C3 in vacuum separated from water and Host CB7. This unpublished data uses Host CB7 and Guest C3 from [Muddana2014a]. Similar published data can be found in [Monroe2014a].
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (Coulomb, VDW)
- Number of Windows
32 total, 20 for Coulomb, 12 for VDW
- Number of Simulations
1
- Length of Simulation
100ns
- System Size
8286 atoms
- Temperature
300 K
- Alchemical Pathway
vdw + coul –> vdw –> vacuum
- Missing Values
None
- Energy unit
kJ/mol
- Time unit
ps
- Creator
T. Jensen
- Donor
Travis Jensen (travis.jensen@colorado.edu)
- Date
November 2017
- License
CC0 Public Domain Dedication
This dataset was generated using the expanded ensemble algorithm in the Gromacs molecular dynamics engine.
- Muddana2014a
H. Muddana, A. Fenley, D. Mobley, and M. Gilson. The SAMPL4 host–guest blind prediction challenge: an overview. Journal of Computer-Aided Molecular Design, 28(4):305–317, 2014. PMID: 24599514. DOI: 10.1007/s10822-014-9735-1.
- Monroe2014a
J. Monroe and M. Shirts. Converging free energies of binding in cucurbit[7]uril and octa-acid host-guest systems from SAMPL4 using expanded ensemble simulations. Journal of Computer-Aided Molecular Design, 28(4):401–415, 2014. PMID: 24610238. DOI: 10.1007/s10822-014-9716-4.
alchemtest.gmx.load_expanded_ensemble_case_1()¶
Load the Gromacs Host CB7 Guest C3 expanded ensemble dataset, case 1 (single simulation visits all states).
- Returns
data – Dictionary-like object, the interesting attributes are:
'data' : the data files by alchemical leg
'DESCR': the full description of the dataset
- Return type
Bunch
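Since every loader returns a dictionary-like object with a 'data' entry, a small generic helper can summarize any dataset. The sketch below counts the files stored under each key of 'data'. It assumes each value is a list of file paths, as in the benzene example; the helper name and the stand-in dictionary (with invented file names) are hypothetical and exist only so the example runs without alchemtest installed.

```python
def files_per_leg(dataset):
    """Count the data files stored under each key of dataset['data']."""
    return {leg: len(files) for leg, files in dataset["data"].items()}


# Stand-in with the same two entries as a loaded dataset.
# The file names are invented for illustration.
stand_in = {
    "DESCR": "example description",
    "data": {
        "Coulomb": ["c0.xvg.bz2", "c1.xvg.bz2"],
        "VDW": ["v0.xvg.bz2"],
    },
}

counts = files_per_leg(stand_in)
```

With alchemtest installed, the same helper could be applied to the object returned by `load_expanded_ensemble_case_1()`, since Bunch supports item access.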
Gromacs: Host CB7 and Guest C3 in water¶
Host CB7 and Guest C3 in water, Guest C3 alchemically turned into Guest C3 in vacuum separated from water and Host CB7. This unpublished data uses Host CB7 and Guest C3 from [Muddana2014b]. Similar published data can be found in [Monroe2014b].
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (Coulomb, VDW)
- Number of Windows
32 total, 20 for Coulomb, 12 for VDW
- Number of Simulations
2
- Length of Simulation
50ns
- System Size
8286 atoms
- Temperature
300 K
- Alchemical Pathway
vdw + coul –> vdw –> vacuum
- Missing Values
None
- Energy unit
kJ/mol
- Time unit
ps
- Creator
T. Jensen
- Donor
Travis Jensen (travis.jensen@colorado.edu)
- Date
November 2017
- License
CC0 Public Domain Dedication
This dataset was generated using the expanded ensemble algorithm in the Gromacs molecular dynamics engine.
- Muddana2014b
H. Muddana, A. Fenley, D. Mobley, and M. Gilson. The SAMPL4 host–guest blind prediction challenge: an overview. Journal of Computer-Aided Molecular Design, 28(4):305–317, 2014. PMID: 24599514. DOI: 10.1007/s10822-014-9735-1.
- Monroe2014b
J. Monroe and M. Shirts. Converging free energies of binding in cucurbit[7]uril and octa-acid host-guest systems from SAMPL4 using expanded ensemble simulations. Journal of Computer-Aided Molecular Design, 28(4):401–415, 2014. PMID: 24610238. DOI: 10.1007/s10822-014-9716-4.
alchemtest.gmx.load_expanded_ensemble_case_2()¶
Load the Gromacs Host CB7 Guest C3 expanded ensemble dataset, case 2 (two simulations visit all states independently).
- Returns
data – Dictionary-like object, the interesting attributes are:
'data' : the data files by alchemical leg
'DESCR': the full description of the dataset
- Return type
Bunch
Gromacs: Host CB7 and Guest C3 in water¶
Host CB7 and Guest C3 in water, Guest C3 alchemically turned into Guest C3 in vacuum separated from water and Host CB7. This unpublished data uses Host CB7 and Guest C3 from [Muddana2014c].
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (Coulomb, VDW)
- Number of Windows
32 total, 20 for Coulomb, 12 for VDW
- Number of Simulations
32
- Length of Simulation
5ns
- System Size
8286 atoms
- Temperature
300 K
- Alchemical Pathway
vdw + coul –> vdw –> vacuum
- Missing Values
None
- Energy unit
kJ/mol
- Time unit
ps
- Creator
T. Jensen
- Donor
Travis Jensen (travis.jensen@colorado.edu)
- Date
November 2017
- License
CC0 Public Domain Dedication
This dataset was generated using the REX algorithm in the Gromacs molecular dynamics engine.
- Muddana2014c
H. Muddana, A. Fenley, D. Mobley, and M. Gilson. The SAMPL4 host–guest blind prediction challenge: an overview. Journal of Computer-Aided Molecular Design, 28(4):305–317, 2014. PMID: 24599514. DOI: 10.1007/s10822-014-9735-1.
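The three cases differ in how the sampling is split across simulations. A small arithmetic sketch makes the comparison explicit. It assumes that "Length of Simulation" refers to each individual simulation, which the descriptions do not state outright; the numbers themselves are copied from the three descriptions above.

```python
# Number of simulations and length of each (ns), from the dataset descriptions.
cases = {
    "case 1": {"simulations": 1, "length_ns": 100},
    "case 2": {"simulations": 2, "length_ns": 50},
    "case 3": {"simulations": 32, "length_ns": 5},
}


def aggregate_ns(case):
    """ASSUMPTION: length_ns is the length of each individual simulation."""
    return case["simulations"] * case["length_ns"]


totals = {name: aggregate_ns(case) for name, case in cases.items()}
```

Under that assumption cases 1 and 2 contain the same aggregate simulation time, while the replica exchange case contains more, spread over 32 shorter runs.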
Water particle TI and FEP¶
Three simple dH/dl and U_nk datasets of a single water particle from simulations of water between two hydrophilic surfaces. One dataset contains a total energy column, one contains a potential energy column, and one does not contain an energy column.
Gromacs: water particle¶
Free energy estimation of a water particle between two hydrophilic surfaces
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (Coulomb, VDW)
- Number of Windows
17 for Coulomb, 20 for VDW
- Length of Windows
10ns
- System Size
3312 atoms
- Temperature
300 K
- Ensemble
NVT
- Volume
70.204 nm^3
- Alchemical Pathway
vacuum –> vdw –> vdw + coul
- Missing Values
None
- Creator
D. Wille
- Donor
Dominik Wille (harlor@web.de)
- Date
November 2018
- License
CC0 Public Domain Dedication
Similar free energy estimations can be found in:
- Schlaich2017
Alexander Schlaich, Julian Kappler, and Roland R. Netz. Hydration Friction in Nanoconfinement: From Bulk via Interfacial to Dry Friction. Nano Lett., 2017, 17 (10), pp 5969–5976. DOI: 10.1021/acs.nanolett.7b02000.
alchemtest.gmx.load_water_particle_with_total_energy()¶
Load the Gromacs water particle with total energy dataset.
- Returns
data – Dictionary-like object, the interesting attributes are:
'data' : the data files by alchemical leg
'DESCR': the full description of the dataset
- Return type
Bunch
alchemtest.gmx.load_water_particle_with_potential_energy()¶
Load the Gromacs water particle with potential energy dataset.
- Returns
data – Dictionary-like object, the interesting attributes are:
'data' : the data files by alchemical leg
'DESCR': the full description of the dataset
- Return type
Bunch
Amber datasets¶
Amber molecular dynamics simulation datasets.
The alchemtest.amber module features datasets generated using the Amber molecular dynamics engine. They can be accessed using the following accessor functions:
load_bace_improper() | Load Amber Bace improper solvated vdw example.
load_bace_example() | Load Amber Bace example perturbation.
load_simplesolvated() | Load the Amber solvated dataset.
load_invalidfiles() | Load the invalid files.
Amber: Small molecule thermodynamic integration free energy difference in water¶
Improper Bace solvated small molecule perturbation, alchemical vdw perturbation of ligand 1 into ligand 2. This example uses ligands CAT-13a to CAT-13m from [Wang2015].
Notes¶
- Data Set Characteristics:
- Number of Legs
1 (vdw)
- Number of Windows
12
- Length of Windows
1ns
- System Size
3920 atoms
- Temperature
300 K
- Pressure
1 bar
- Alchemical Pathway
vdw in ligand 1 –> vdw in ligand 2, softcore is used in vdw
- Experimental Free Energy difference
N/A
- Missing Values
None
- Energy unit
kcal/mol
- Time unit
ps
- Date
Jan 2018
- Donor
Silicon Therapeutics
- License
CC0 Public Domain Dedication
This dataset was generated using the Amber molecular dynamics engine.
- Wang2015
L. Wang, Y. Wu, Y. Deng, B. Kim, L. Pierce, G. Krilov, D. Lupyan, S. Robinson, M. K. Dahlgren, J. Greenwood, D. L. Romero, C. Masse, J. L. Knight, T. Steinbrecher, T. Beuming, W. Damm, E. Harder, W. Sherman, M. Brewer, R. Wester, M. Murcko, L. Frye, R. Farid, T. Lin, D. L. Mobley, W. L. Jorgensen, B. J. Berne, R. A. Friesner, and R. Abel. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. Journal of the American Chemical Society, 137(7):2695–2703, 2015. PMID: 25625324. DOI: 10.1021/ja512751q.
Amber: Small molecule thermodynamic integration free energy difference in water¶
Bace complex and solvated small molecule perturbation, alchemical perturbation of ligand 1 into ligand 2. This example uses ligands CAT-13d to CAT-17a from [Wang2015].
Notes¶
- Data Set Characteristics:
- Number of Legs
3 (decharge, vdw, recharge)
- Number of Windows
5 for decharge, 12 for vdw, 5 for recharge
- Length of Windows
1ns
- System Size
46594 atoms (complex), 4115 atoms (solvated)
- Temperature
300 K
- Pressure
1 bar
- Alchemical Pathway
(decharge + vdw + recharge) in ligand 1 –> (decharge + vdw + recharge) in ligand 2; decharge, vdw, and recharge are run in parallel, soft core is used in vdw
- Experimental Free Energy difference
-0.26 kcal/mol
- Missing Values
None
- Energy unit
kcal/mol
- Time unit
ps
- Date
Jan 2018
- Donor
Silicon Therapeutics
- License
CC0 Public Domain Dedication
This dataset was generated using the Amber molecular dynamics engine.
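This dataset provides the same three legs for the ligand in the complex and in solution. A common way to combine such data, which we assume here rather than take from the dataset description, is to sum the legs in each environment and subtract the solvated total from the complex total to obtain a relative binding free energy. The sketch below shows only that bookkeeping; all numbers are invented and the function names are hypothetical.

```python
LEGS = ("decharge", "vdw", "recharge")


def total_over_legs(leg_values):
    """Sum the free energy changes of the three alchemical legs."""
    return sum(leg_values[leg] for leg in LEGS)


def relative_binding_free_energy(complex_legs, solvated_legs):
    """ASSUMPTION: ddG = dG(complex) - dG(solvated), each summed over legs."""
    return total_over_legs(complex_legs) - total_over_legs(solvated_legs)


# Invented per-leg results in kcal/mol, for illustration only.
complex_legs = {"decharge": 2.0, "vdw": -1.5, "recharge": -1.0}
solvated_legs = {"decharge": 1.5, "vdw": -1.0, "recharge": -0.5}

ddg = relative_binding_free_energy(complex_legs, solvated_legs)
```

A value computed this way from real estimates is what one would compare against the experimental free energy difference listed above.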
Amber: Small molecule thermodynamic integration free energy difference in water¶
Small molecule perturbation in water, alchemically turned ligand 1 into ligand 2 in water. This example uses ligands 17124-1 to 18637-1 from [Wang2015].
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (charge, vdw)
- Number of Windows
5 for charge, 12 for vdw
- Length of Windows
1ns
- System Size
5979 atoms
- Temperature
300 K
- Pressure
1 bar
- Alchemical Pathway
(charge + vdw) in ligand 1 –> (charge + vdw) in ligand 2; charge and vdw are run in parallel, soft core is used in vdw
- Experimental Free Energy difference
N/A
- Missing Values
None
- Energy unit
kcal/mol
- Time unit
ps
- Date
Oct 2017
- Donor
Silicon Therapeutics
- License
CC0 Public Domain Dedication
This dataset was generated using the Amber molecular dynamics engine.
Amber TI invalid output files¶
Examples for file validation testing.
Notes¶
invalid-case-1.out.bz2: file contains no useful data
invalid-case-2.out.bz2: file contains no control data
invalid-case-3.out.bz2: file with non-constant temperature
invalid-case-4.out.bz2: file with no free energy section
invalid-case-5.out.bz2: file with no ATOMIC section
invalid-case-6.out.bz2: file with no RESULTS section
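Several of these cases amount to "a required section is missing", which a parser can detect before attempting to read any numbers. The sketch below shows the general idea on plain text. The regular expressions are loose stand-ins chosen for illustration; they are assumptions about what a section heading might contain and are not the checks that alchemlyb actually performs.

```python
import re

# HYPOTHETICAL markers: loose patterns standing in for real section headings.
REQUIRED_SECTIONS = {
    "control data": re.compile(r"CONTROL\s+DATA"),
    "atomic": re.compile(r"ATOMIC"),
    "results": re.compile(r"RESULTS"),
}


def missing_sections(text):
    """Return the names of required sections that do not occur in `text`."""
    return [name for name, pattern in REQUIRED_SECTIONS.items()
            if not pattern.search(text)]


# A toy file body that lacks a results section.
sample = "2.  CONTROL  DATA  FOR  THE  RUN\n3.  ATOMIC COORDINATES\n"
problems = missing_sections(sample)
```

A test built on the invalid files would decompress each one and assert that the parser under test rejects it, ideally with an error that names the missing part.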
NAMD datasets¶
NAMD molecular dynamics simulation datasets.
The alchemtest.namd module features datasets generated using the NAMD molecular dynamics engine. They can be accessed using the following accessor functions:
load_tyr2ala() | Load the NAMD tyrosine to alanine mutation dataset.
NAMD: free energy of tyrosine to alanine mutation in aqueous solution¶
Free energy change from mutating a tyrosine (Y) residue into alanine (A) in the Ala-Tyr-Ala tripeptide in aqueous environment.
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (forward Y–>A, backward A–>Y)
- Number of Windows
20 for each leg
- Length of Windows
1000 ps (each window interspersed with 200 ps equilibration)
- System Size
1521 atoms
- Temperature
300 K
- Pressure
1 bar
- Alchemical Pathway
Point mutation of Tyr to Ala using dual topology hybrid molecule. Nonbonded interactions of perturbed atoms are scaled with their environment.
- Experimental Free Energy difference
N/A
- Missing Values
None
- Energy unit
kcal/mol
- Time unit
step
- Date
Oct 2017
- Donor
JC Gumbart
- License
CC0 Public Domain Dedication
This dataset was generated using the NAMD molecular dynamics engine.
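Having both a forward (Y–>A) and a backward (A–>Y) leg allows a simple consistency check: for a converged calculation the two free energy changes should be equal in magnitude and opposite in sign. The sketch below computes that hysteresis and a naive combined value. The numbers are invented, the function names are hypothetical, and the plain average is only one simple choice; estimators such as BAR use both directions in a statistically better way.

```python
def hysteresis(dg_forward, dg_backward):
    """Sum of forward and backward changes; zero means perfect agreement."""
    return dg_forward + dg_backward


def naive_combined(dg_forward, dg_backward):
    """Average of the forward value and the sign-flipped backward value."""
    return 0.5 * (dg_forward - dg_backward)


# Invented values in kcal/mol, for illustration only.
dg_forward = 10.5    # Y -> A
dg_backward = -10.0  # A -> Y

gap = hysteresis(dg_forward, dg_backward)
estimate = naive_combined(dg_forward, dg_backward)
```

A hysteresis that is large compared with the statistical uncertainty usually points to insufficient sampling in at least one direction.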
GOMC datasets¶
GOMC Monte Carlo simulation datasets.
The alchemtest.gomc module features datasets generated using the GPU Optimized Monte Carlo (GOMC) simulation engine. They can be accessed using the following accessor functions:
load_benzene() | Load the GOMC benzene dataset.
Simple TI and FEP¶
The data sets contain derivatives of the Hamiltonian (TI) and free energy perturbation (FEP) data suitable for processing with FEP estimators as well as BAR/MBAR. Individual \(\lambda\) windows were run independently.
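For the FEP side, the simplest estimator is exponential averaging (the Zwanzig relation), \(\Delta F = -k_\mathrm{B}T \ln \langle e^{-\Delta U / k_\mathrm{B}T} \rangle\), applied to energy differences between neighbouring windows. The sketch below implements it with the standard library only. The input samples are synthetic, the function name is hypothetical, and BAR/MBAR as provided by alchemlyb are generally preferable in practice.

```python
import math

R_KJ_PER_MOL_K = 0.008314462618  # molar gas constant in kJ/(mol K)


def zwanzig(delta_u, temperature):
    """Exponential averaging of energy differences given in kJ/mol.

    The smallest sample is factored out of the exponentials to limit
    floating-point overflow; this does not change the result.
    """
    kt = R_KJ_PER_MOL_K * temperature
    shift = min(delta_u)
    mean_exp = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(mean_exp)


# Synthetic samples in kJ/mol at 298 K, for illustration only.
constant_case = zwanzig([2.0, 2.0, 2.0], 298.0)
spread_case = zwanzig([1.0, 3.0], 298.0)
```

With identical samples the estimate equals the sample value; with a spread of samples it lies below their arithmetic mean, because low-energy samples dominate the exponential average.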
GOMC: Benzene in water¶
Hydration free energy of benzene using the TraPPE-EH model and the SPC water model.
Notes¶
- Data Set Characteristics:
- Number of Legs
2 (Coulomb, VDW)
- Number of Windows
7 for Coulomb, 15 for VDW
- Length of Windows
50 million Monte Carlo steps
- System Size
1001 molecules
- Temperature
298 K
- Pressure
1 bar
- Alchemical Pathway
vacuum –> vdw –> vdw + coul
- Experimental Hydration Free Energy
-0.90 +- 0.2 kcal/mol
- Missing Values
None
- Energy unit
kJ/mol
- Time unit
Monte Carlo steps
- Creator
M. Soroush Barhaghi
- Donor
Mohammad Soroush Barhaghi (m.soroush@wayne.edu)
- Date
July 2019
- License
CC0 Public Domain Dedication
This dataset was generated using the GOMC Monte Carlo simulation engine.
Experimental value sourced from [Mobley2013].
- Mobley2013
Mobley, David L. (2013). Experimental and Calculated Small Molecule Hydration Free Energies. UC Irvine: Department of Pharmaceutical Sciences, UCI. Retrieved from: http://escholarship.org/uc/item/6sd403pz