1.5. Hands-on Tutorial (v2.0.3)
1.5.1. Workflow of the DeePMD-kit
The workflow of the DeePMD-kit contains three parts:
Data preparation: Training data is generated based on ab-initio calculations and format conversion is performed;
Model training: Prepare input script and then train the DP model using the data and script;
Model application: Use the obtained DP model for MD simulation or model inference.
1.5.2. Example: a gas-phase methane molecule
The following introduces the basic usage of the DeePMD-kit, taking a gas-phase methane molecule as an example.
1.5.2.1. Data preparation
Preparing the training data includes both generating ab-initio training data and converting the data format, which is the first step of training a DP model with DeePMD-kit.
Training data is often generated using ab-initio molecular dynamics (AIMD) simulations and needs to be converted to a format that can be used directly by DeePMD-kit.
The files needed for this tutorial can be downloaded and unpacked as follows:
$ wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/CH4.tar
$ tar xvf CH4.tar
Enter the CH4 folder and list its contents:
$ cd CH4
$ ls
00.data 01.train 02.lmp
There are 3 folders here:
The folder 00.data contains the data
The folder 01.train contains an example input script to train a model with DeePMD-kit
The folder 02.lmp contains the LAMMPS example script for molecular dynamics simulation
AIMD data generation
The training data of the DeePMD-kit contains the atom type, the simulation box, the atom coordinate, the atom force, the system energy, and the virial. A snapshot of a molecular system that has this information is called a frame. A system of data includes many frames that share the same number of atoms and atom types. For example, a molecular dynamics trajectory can be converted into a system of data, with each time step corresponding to a frame in the system.
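To make these terms concrete, the sketch below models a frame and a system with plain Python dataclasses. The class names `Frame` and `System`, their fields, and the numerical values are hypothetical illustrations, not part of the DeePMD-kit or dpdata API; they only mirror the quantities listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    """One snapshot; hypothetical container, not a DeePMD-kit class."""
    box: List[float]      # 9 numbers: the 3x3 simulation cell, row by row
    coords: List[float]   # 3 * natoms numbers: x, y, z of every atom
    forces: List[float]   # 3 * natoms numbers: force on every atom
    energy: float         # total energy of the snapshot
    # The virial (9 numbers) is optional and omitted in this sketch.


@dataclass
class System:
    """Frames that share the same atoms; hypothetical container."""
    atom_types: List[int]                      # one integer type per atom
    frames: List[Frame] = field(default_factory=list)

    def add(self, frame: Frame) -> None:
        natoms = len(self.atom_types)
        # Every frame must describe the same number of atoms.
        if len(frame.coords) != 3 * natoms or len(frame.forces) != 3 * natoms:
            raise ValueError("frame does not match the system's atom count")
        self.frames.append(frame)


# A methane-like system: 4 atoms of type 0 and 1 atom of type 1.
system = System(atom_types=[0, 0, 0, 0, 1])
# Placeholder numbers only; real frames come from the ab-initio calculation.
system.add(Frame(box=[10.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 10.0],
                 coords=[0.0] * 15, forces=[0.0] * 15, energy=0.0))
print(len(system.frames))
```

Storing the atom types once per system, rather than once per frame, is what forces all frames in a system to share the same atoms.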
As this tutorial is about the DeePMD-kit, training data generated by AIMD simulations is provided.
Enter the 00.data folder and list its contents:
$ cd 00.data
$ ls
OUTCAR
The OUTCAR was produced by an AIMD simulation of a gas-phase methane molecule using VASP.
Data format conversion
The DeePMD-kit adopts a compressed data format. All training data should first be converted into this format and can then be used by DeePMD-kit. The data format is explained in detail in the DeePMD-kit manual that can be found in the ‘data’ Section of DeePMD-kit’s documentation.
We provide a convenient tool named dpdata for converting the data produced by VASP, Gaussian, Quantum-Espresso, ABACUS, and LAMMPS into the compressed format of DeePMD-kit. For details about dpdata, see dpdata’s documentation.
Users can install dpdata via
$ git clone https://github.com/deepmodeling/dpdata.git dpdata
$ cd dpdata
$ python setup.py install
or
$ pip install dpdata
Data format conversion using dpdata can be completed in two steps: load data and dump data. Now start an interactive python environment, for example
$ python
then execute the following commands:
import dpdata
import numpy as np
data = dpdata.LabeledSystem('OUTCAR', fmt = 'vasp/outcar')
print('# the data contains %d frames' % len(data))
On the screen, you can see that the OUTCAR file contains 200 frames of data. We randomly pick 40 frames as validation data and the rest as training data.
# randomly choose 40 frame indices for the validation set
index_validation = np.random.choice(200, size=40, replace=False)
# the remaining indices form the training set
index_training = list(set(range(200)) - set(index_validation))
data_training = data.sub_system(index_training)
data_validation = data.sub_system(index_validation)
# write all training data to the directory "training_data"
data_training.to_deepmd_npy('training_data')
# write all validation data to the directory "validation_data"
data_validation.to_deepmd_npy('validation_data')
print('# the training data contains %d frames' % len(data_training))
print('# the validation data contains %d frames' % len(data_validation))
The commands import a system of data from the OUTCAR (with format vasp/outcar), and then dump it into the compressed format (numpy compressed arrays).
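The random split above gives a different result on every run. If a reproducible split is preferred, the indices can be drawn from a seeded generator. The following is a minimal sketch using only the standard library; the seed value is arbitrary and the variable names simply follow the commands above.

```python
import random

n_frames = 200      # number of frames reported for the OUTCAR above
n_validation = 40   # frames held out for validation

rng = random.Random(2023)  # arbitrary seed, chosen only for illustration
index_validation = sorted(rng.sample(range(n_frames), n_validation))
held_out = set(index_validation)
index_training = [i for i in range(n_frames) if i not in held_out]

# The two index lists are disjoint and together cover every frame.
print(len(index_training), len(index_validation))
```

The resulting lists can be passed to `data.sub_system(...)` in the same way as the indices in the commands above.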
Now users have completed the data conversion. The data in DeePMD-kit format is stored in the folder 00.data. Let’s have a look:
$ ls
OUTCAR training_data validation_data
The directories “training_data” and “validation_data” have a similar structure, so we just explain “training_data”:
$ ls training_data
set.000 type.raw type_map.raw
set.000 is a directory, containing data in compressed format (numpy compressed arrays).
type.raw is a file containing the types of the atoms (represented as integers).
type_map.raw is a file, containing the type name of atoms.
Let’s have a look at type.raw:
$ cat training_data/type.raw
0 0 0 0 1
This tells us there are 5 atoms in this example, 4 atoms represented by type “0”, and 1 atom represented by type “1”. Sometimes users need to map the integer types to atom names. The mapping is given by the file type_map.raw:
$ cat training_data/type_map.raw
H C
This tells us the type “0” is named by “H”, and the type “1” is named by “C”.
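The lookup described above amounts to indexing the name list with each integer type. A minimal sketch, with the contents of the two files pasted in as strings rather than read from disk:

```python
type_raw = "0 0 0 0 1"  # content of type.raw shown above
type_map_raw = "H C"    # content of type_map.raw shown above

atom_types = [int(t) for t in type_raw.split()]
type_names = type_map_raw.split()

# Index the name list with each integer type.
atom_names = [type_names[t] for t in atom_types]
print(atom_names)
```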
1.5.2.2. Input script
Once the data preparation is done, we can go on with training. Now go to the training directory
$ cd ../01.train
$ ls
input.json
where input.json gives you an example training script. Users can specify the training process by specifying the value of keywords in input.json. The keywords are explained in detail in the DeePMD-kit manual, so they are not comprehensively explained here.
The keywords in input.json can be divided into 4 sections
Model: define the descriptor that maps atomic configuration to a set of symmetry invariant features, and the fitting net that takes descriptor as input and predicts the atomic contribution to the target physical property;
Learning rate: define the start learning rate, stop learning rate, decay steps, etc.
Loss function: define the type of loss, prefactor of energy, force and virial, etc.
Training: define the path of the training dataset and validation dataset, training steps, etc.
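Before starting a long training run, it can be useful to confirm that an input script contains the four sections listed above. The sketch below does this for an empty skeleton; the skeleton string and the list name `expected` are illustrative assumptions, not something DeePMD-kit provides.

```python
import json

# Skeleton only: each section is left empty here.
skeleton = '{"model": {}, "learning_rate": {}, "loss": {}, "training": {}}'
config = json.loads(skeleton)

expected = ["model", "learning_rate", "loss", "training"]
missing = [name for name in expected if name not in config]
print("missing sections:", missing)
```

To check a real script, the skeleton can be replaced by the parsed content of input.json, for example with `json.load`.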
Model
The model keywords are given in the following:
"model":{
"type_map": ["H", "C"],
"descriptor":{
"type": "se_e2_a",
"rcut": 6.00,
"rcut_smth": 0.50,
"sel": [4, 1],
"neuron": [10, 20, 40],
"resnet_dt": false,
"axis_neuron": 4,
"seed": 1,
"_comment": "that's all"
},
"fitting_net":{
"neuron": [100, 100, 100],
"resnet_dt": true,
"seed": 1,
"_comment": "that's all"
},
"_comment": "that's all"
},
Description of keywords:
keywords | type | Description |
---|---|---|
type_map | list | Give the name to each type of atoms. |
descriptor | dict | The descriptor of atomic environment. |
type | str | The type of the descriptor. |
sel | list | sel[i] gives the selected number of type-i neighbors. |
rcut | float | The cut-off radius. |
rcut_smth | float | Where to start smoothing. For example, the 1/r term is smoothed from rcut_smth to rcut. |
neuron | list | Number of neurons in each hidden layer of the embedding net. |
axis_neuron | int | Size of the submatrix of G (embedding matrix) |
seed | int | Random seed for parameter initialization. |
fitting_net | dict | The fitting of physical properties. |
neuron | list | Number of neurons in each hidden layer of the fitting net. |
Description of example: The se_e2_a descriptor is used to train the DP model. The cut-off radius is set to 6 Å, and the components in \(\tilde{\mathcal{R}}^{i}\) go smoothly to zero between 0.5 Å and 6 Å. Within the cut-off radius, at most 4 H neighbours and 1 C neighbour are selected for each atom, as set by sel. The sizes of the embedding and fitting networks are set to [10, 20, 40] and [100, 100, 100], respectively.
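The roles of rcut_smth and rcut can be illustrated with a switching function. The sketch below assumes the form given in the DeePMD-kit documentation for the se_e2_a descriptor: 1/r below rcut_smth, 1/r multiplied by a fifth-order polynomial between rcut_smth and rcut, and zero beyond rcut. The function name `switch` is hypothetical, and the code is a plain-Python transcription of that assumed formula, not the library implementation.

```python
def switch(r: float, rcut_smth: float = 0.5, rcut: float = 6.0) -> float:
    """Smoothed 1/r weight (assumed form, see the text above)."""
    if r < rcut_smth:
        return 1.0 / r
    if r < rcut:
        # x runs from 0 at rcut_smth to 1 at rcut.
        x = (r - rcut_smth) / (rcut - rcut_smth)
        return (x ** 3 * (-6.0 * x ** 2 + 15.0 * x - 10.0) + 1.0) / r
    return 0.0


for r in (0.25, 0.5, 3.25, 6.0):
    print(r, switch(r))
```

The polynomial equals 1 at rcut_smth and 0 at rcut, so the weight joins 1/r continuously on one side and vanishes continuously on the other.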
Learning rate
The learning_rate keywords are given in the following:
"learning_rate" :{
"type": "exp",
"decay_steps": 5000,
"start_lr": 0.001,
"stop_lr": 3.51e-8,
"_comment": "that's all"
},
Description of keywords:
keywords | type | Description |
---|---|---|
learning_rate | dict | The definition of learning rate. |
type | str | The type of the learning rate. |
decay_steps | int | The number of training steps between two successive decays of the learning rate. |
start_lr | float | The learning rate at the start of the training. |
stop_lr | float | The desired learning rate at the end of the training. |
Description of example:
During the training, the learning rate decays exponentially from start_lr to stop_lr. The starting learning rate, stop learning rate, and decay steps are set to 0.001, 3.51e-8, and 5000, respectively.
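The schedule can be reproduced by hand. The sketch below assumes a staircase exponential decay in which the rate is multiplied by a constant factor every decay_steps steps, with the factor chosen so that stop_lr is reached after the \(10^6\) training steps used in this tutorial. The function names are hypothetical; the formula is an assumption that happens to agree with the decay_rate and lr values printed in the training log and in lcurve.out.

```python
def decay_rate(start_lr: float, stop_lr: float,
               decay_steps: int, numb_steps: int) -> float:
    # Constant factor applied once every decay_steps steps.
    return (stop_lr / start_lr) ** (decay_steps / numb_steps)


def learning_rate(step: int, start_lr: float, rate: float,
                  decay_steps: int) -> float:
    # Staircase decay: the exponent only changes every decay_steps steps.
    return start_lr * rate ** (step // decay_steps)


start_lr, stop_lr, decay_steps, numb_steps = 1.0e-3, 3.51e-8, 5000, 1000000
rate = decay_rate(start_lr, stop_lr, decay_steps, numb_steps)
print(f"decay_rate {rate:.6f}")
for step in (0, 538000, 999000, 1000000):
    print(step, f"{learning_rate(step, start_lr, rate, decay_steps):.1e}")
```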
Loss
The loss keywords are given in the following:
"loss" :{
"type": "ener",
"start_pref_e": 0.02,
"limit_pref_e": 1,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0,
"limit_pref_v": 0,
"_comment": "that's all"
},
Description of keywords:
keywords | type | Description |
---|---|---|
loss | dict | The definition of loss function. |
type | str | The type of the loss. |
start_pref_e | float | The prefactor of energy loss at the start of the training. |
limit_pref_e | float | The prefactor of energy loss at the limit of the training. |
start_pref_f | float | The prefactor of force loss at the start of the training. |
limit_pref_f | float | The prefactor of force loss at the limit of the training. |
start_pref_v | float | The prefactor of virial loss at the start of the training. |
limit_pref_v | float | The prefactor of virial loss at the limit of the training. |
Description of example:
The loss function of the DeePMD-kit is a weighted sum of the energy, force, and virial errors. In the loss function, pref_e increases from 0.02 to 1 and pref_f decreases from 1000 to 1 progressively, which means that the force term dominates at the beginning, while the energy term becomes important at the end. This strategy is very effective and reduces the total training time. pref_v is set to 0, indicating that no virial data are included in the training process.
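How a prefactor moves from its start value to its limit value can be sketched as follows. The code assumes the interpolation described in the DeePMD-kit documentation, in which each prefactor is tied to the current learning rate: it equals the start value when the learning rate equals start_lr and approaches the limit value as the learning rate goes to zero. The function name `prefactor` is hypothetical.

```python
def prefactor(start_pref: float, limit_pref: float,
              lr: float, start_lr: float = 1.0e-3) -> float:
    # Linear interpolation in the ratio of current to starting learning rate.
    ratio = lr / start_lr
    return limit_pref * (1.0 - ratio) + start_pref * ratio


# Energy and force prefactors at the start, midway, and at the final rate.
for lr in (1.0e-3, 1.0e-4, 3.51e-8):
    print(lr, prefactor(0.02, 1.0, lr), prefactor(1000.0, 1.0, lr))
```

Under this assumption the force prefactor is still about 100 when the learning rate has dropped by a factor of ten, which is why the force term dominates for most of the run.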
Training
The training keywords are given in the following:
"training" : {
"training_data": {
"systems": ["../00.data/training_data"],
"batch_size": "auto",
"_comment": "that's all"
},
"validation_data":{
"systems": ["../00.data/validation_data/"],
"batch_size": "auto",
"numb_btch": 1,
"_comment": "that's all"
},
"numb_steps": 1000000,
"seed": 10,
"disp_file": "lcurve.out",
"disp_freq": 1000,
"save_freq": 10000
}
Description of keywords:
keywords | type | Description |
---|---|---|
training | dict | The definition of training. |
training_data | dict | Configurations of training data. |
systems | list or str | The data systems for training. |
batch_size | list, str, or int | str “auto”: automatically determines the batch size so that the batch_size times the number of atoms in the system is no less than 32. |
validation_data | dict | Configurations of validation data. |
numb_btch | int | An integer that specifies the number of systems to be sampled for each validation period. |
numb_steps | int | Number of training steps. Each training step uses one batch of data. |
disp_file | str | The file for printing learning curve. |
disp_freq | int | The frequency of printing learning curve. |
save_freq | int | The frequency of saving checkpoints. |
Description of example:
During the training, the training data is at “../00.data/training_data”, and validation data is at “../00.data/validation_data/”. The model is trained for \(10^6\) steps. The learning curve is written to the lcurve.out every 1000 steps, and the model-related files are saved every 10000 steps.
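The “auto” rule for batch_size in the table above can be checked by hand for the 5-atom methane system. The sketch below takes the smallest batch size whose product with the number of atoms is at least 32. The function name is hypothetical, and reading n_bch as the integer quotient of frame count and batch size is an assumption of this sketch, although it agrees with the bch_sz and n_bch columns of the training log in the next section.

```python
import math


def auto_batch_size(natoms: int, threshold: int = 32) -> int:
    # Smallest batch size with batch_size * natoms >= threshold.
    return math.ceil(threshold / natoms)


natoms = 5  # one methane molecule: 4 H and 1 C
batch_size = auto_batch_size(natoms)

# 160 training frames and 40 validation frames, as in the split above.
print(batch_size, 160 // batch_size, 40 // batch_size)
```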
1.5.2.3. Training process
The following describes the training process of the DP model using the DeePMD-kit.
Start training
Users can start the training with DeePMD-kit by simply running
$ dp train input.json
On the screen, you see the information of the data system(s)
DEEPMD INFO ----------------------------------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: training -------------------------------------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/training_data/ 5 7 22 1.000 T
DEEPMD INFO -----------------------------------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: validation --------------------------------------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/validation_data/ 5 7 5 1.000 T
and the starting and final learning rate of this training
DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.950006, final lr will be 3.51e-08
If everything works fine, you will see, on the screen, information printed every 1000 steps, like
DEEPMD INFO batch 1000 training time 7.61 s, testing time 0.01 s
DEEPMD INFO batch 2000 training time 6.46 s, testing time 0.01 s
DEEPMD INFO batch 3000 training time 6.50 s, testing time 0.01 s
DEEPMD INFO batch 4000 training time 6.44 s, testing time 0.01 s
DEEPMD INFO batch 5000 training time 6.49 s, testing time 0.01 s
DEEPMD INFO batch 6000 training time 6.46 s, testing time 0.01 s
DEEPMD INFO batch 7000 training time 6.24 s, testing time 0.01 s
DEEPMD INFO batch 8000 training time 6.39 s, testing time 0.01 s
DEEPMD INFO batch 9000 training time 6.72 s, testing time 0.01 s
DEEPMD INFO batch 10000 training time 6.41 s, testing time 0.01 s
DEEPMD INFO saved checkpoint model.ckpt
They present the training and testing time counts. At the end of the 10000th batch, the model is saved in TensorFlow’s checkpoint file model.ckpt. At the same time, the training and testing errors are written to the file lcurve.out.
Users can check lcurve.out using the cat command after training:
$ cat lcurve.out
#step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 1.34e+01 1.47e+01 7.05e-01 7.05e-01 4.22e-01 4.65e-01 1.00e-03
...
999000 1.24e-01 1.12e-01 5.93e-04 8.15e-04 1.22e-01 1.10e-01 3.7e-08
1000000 1.31e-01 1.04e-01 3.52e-04 7.74e-04 1.29e-01 1.02e-01 3.5e-08
The lcurve.out contains 8 columns, from left to right, which are the training step, the validation loss, training loss, root mean square (RMS) validation error of energy, RMS training error of energy, RMS validation error of force, RMS training error of force and the learning rate. The RMS error (RMSE) of the energy is normalized by the number of atoms in the system. It is demonstrated that after \(10^6\) steps of training, the energy testing error is less than 1 meV and the force testing error is around 120 meV/Å. It is also observed that the force testing error is systematically (but slightly) larger than the training error, which implies a slight over-fitting to the rather small dataset.
One can visualize this file by a simple Python script:
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt("lcurve.out", names=True)
for name in data.dtype.names[1:-1]:
    plt.plot(data['step'], data[name], label=name)
plt.legend()
plt.xlabel('Step')
plt.ylabel('Loss')
plt.xscale('symlog')
plt.yscale('log')
plt.grid()
plt.show()
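The final errors quoted above can also be read off with a few lines of standard-library Python. The sketch embeds the excerpt of lcurve.out shown earlier instead of reading the file, and assumes that the energy columns are in eV per atom and the force columns in eV/Å.

```python
sample = """\
#step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 1.34e+01 1.47e+01 7.05e-01 7.05e-01 4.22e-01 4.65e-01 1.00e-03
999000 1.24e-01 1.12e-01 5.93e-04 8.15e-04 1.22e-01 1.10e-01 3.7e-08
1000000 1.31e-01 1.04e-01 3.52e-04 7.74e-04 1.29e-01 1.02e-01 3.5e-08
"""

lines = sample.splitlines()
names = lines[0].lstrip("#").split()                     # column names
last = dict(zip(names, map(float, lines[-1].split())))  # final row

# Convert from eV to meV for easier comparison with the text.
energy_mev = last["rmse_e_val"] * 1000.0
force_mev = last["rmse_f_val"] * 1000.0
print(f"step {int(last['step'])}: "
      f"energy {energy_mev:.3f} meV/atom, force {force_mev:.1f} meV/A")
```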
During training, the model is saved in the TensorFlow model.ckpt* file every 10,000 steps, and the name of the last saved model is recorded in the checkpoint file.
When the training process is stopped abnormally, we can restart the training from the provided checkpoint by simply running
$ dp train --restart model.ckpt input.json
In the lcurve.out, you can see the training and testing errors, like
538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06
538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06
539000 3.37e-01 2.61e-01 7.08e-04 3.38e-04 1.49e-01 1.15e-01 4.1e-06
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
530000 2.89e-01 2.15e-01 6.36e-04 5.18e-04 1.25e-01 9.31e-02 4.4e-06
531000 3.46e-01 3.26e-01 4.62e-04 6.73e-04 1.49e-01 1.41e-01 4.4e-06
Note that the input.json needs to be consistent with the previous one.
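After a restart, lcurve.out can contain a repeated header line and steps that appear more than once, as in the excerpt above. One way to clean this up before plotting is to skip the comment lines and keep only the last row written for each step. The following is a minimal sketch of that choice, applied to the excerpt pasted in as a string; it is not a DeePMD-kit utility.

```python
restarted = """\
538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06
538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06
539000 3.37e-01 2.61e-01 7.08e-04 3.38e-04 1.49e-01 1.15e-01 4.1e-06
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
530000 2.89e-01 2.15e-01 6.36e-04 5.18e-04 1.25e-01 9.31e-02 4.4e-06
531000 3.46e-01 3.26e-01 4.62e-04 6.73e-04 1.49e-01 1.41e-01 4.4e-06
"""

latest = {}
for line in restarted.splitlines():
    # Skip blank lines and repeated header lines.
    if not line.strip() or line.lstrip().startswith("#"):
        continue
    fields = line.split()
    # A later row for the same step overwrites the earlier one.
    latest[int(fields[0])] = [float(x) for x in fields[1:]]

steps = sorted(latest)
print(steps)
```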
Freeze the model
At the end of the training, the model parameters saved in TensorFlow’s checkpoint file should be frozen as a model file, which usually has the extension .pb.
Simply execute
$ dp freeze -o graph.pb
where -o means output, which is used to specify the name of the output model. On the screen, you can see
DEEPMD INFO Restoring parameters from ./model.ckpt-1000000
DEEPMD INFO 1264 ops in the final graph
and it will output a model file named graph.pb in the current directory.
Compress the model
A compressed DP model typically speeds up DP-based calculations by an order of magnitude and consumes an order of magnitude less memory. For a detailed description, please refer to the literature.
The graph.pb can be compressed in the following way:
$ dp compress -i graph.pb -o graph-compress.pb
where -i means input, which specifies the frozen model to be compressed. On the screen you can see
DEEPMD INFO stage 1: compress the model
DEEPMD INFO built lr
DEEPMD INFO built network
DEEPMD INFO built training
DEEPMD INFO initialize model from scratch
DEEPMD INFO finished compressing
DEEPMD INFO
DEEPMD INFO stage 2: freeze the model
DEEPMD INFO Restoring parameters from model-compression/model.ckpt
DEEPMD INFO 840 ops in the final graph
and it will output a model file named graph-compress.pb.
Test the model
Users can check the quality of the trained model by running
$ dp test -m graph-compress.pb -s ../00.data/validation_data -n 40 -d results
where -m means model which is used to import the model file, -s means system which specifies the path of the test dataset, -n means number which specifies the number of frames to be tested, and -d means detail which writes the details of energy, force and virial to different files.
On the screen, users can see the information on the prediction errors of validation data
DEEPMD INFO # number of test data : 40
DEEPMD INFO Energy RMSE : 3.168050e-03 eV
DEEPMD INFO Energy RMSE/Natoms : 6.336099e-04 eV
DEEPMD INFO Force RMSE : 1.267645e-01 eV/A
DEEPMD INFO Virial RMSE : 2.494163e-01 eV
DEEPMD INFO Virial RMSE/Natoms : 4.988326e-02 eV
DEEPMD INFO # -----------------------------------------------
and it will output files named results.e.out and results.f.out in the current directory.
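The relation between the “Energy RMSE” and “Energy RMSE/Natoms” lines above can be verified directly. The sketch assumes that the per-atom value is the total-energy RMSE divided by the number of atoms (5 for methane); the helper `rmse` and its toy inputs are illustrative only and do not read the result files.

```python
import math


def rmse(reference, predicted):
    # Root mean square of the differences between two equal-length lists.
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


natoms = 5                  # 4 H and 1 C
energy_rmse = 3.168050e-03  # "Energy RMSE" from the output above, in eV
per_atom = energy_rmse / natoms
print(f"{per_atom:.6e} eV per atom")

# Toy check of the helper on made-up numbers.
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

The computed per-atom value agrees with the printed 6.336099e-04 eV up to rounding in the last digit.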
1.5.2.4. Model application
Run MD with LAMMPS
Users can use the DP model for MD simulations. Now let’s switch to 02.lmp folder to check the necessary input files for running MD with LAMMPS.
$ cd ../02.lmp
Firstly, we soft-link the output model in the training directory to the current directory
$ ln -s ../01.train/graph-compress.pb
Then we have three files:
$ ls
conf.lmp graph-compress.pb in.lammps
where conf.lmp gives the initial configuration of a gas-phase methane MD simulation, and the file in.lammps is the LAMMPS input script. One may check in.lammps and find that it is a rather standard LAMMPS input file for an MD simulation, with only two exceptional lines:
pair_style deepmd graph-compress.pb
pair_coeff * *
where the pair style deepmd is invoked and the model file graph-compress.pb is provided, which means the atomic interaction will be computed by the DP model that is stored in the file graph-compress.pb.
One may execute LAMMPS in the standard way:
$ lmp -i in.lammps
After waiting for a while, the MD simulation finishes, and the log.lammps and ch4.dump files are generated. They store thermodynamic information and the trajectory of the molecule, respectively.
Model inference
Users can use the python interface of DeePMD-kit for model inference.
For example, users can use the DP model to evaluate the energy of each frame in the LAMMPS trajectory. Now start an interactive python environment,
$ python
then execute the following commands:
import dpdata
d=dpdata.System('ch4.dump', fmt = "lammps/dump", type_map = ["H", "C"])
d1 = d.predict(dp = "./graph-compress.pb")
print(d1["energies"])
and it will print the energy of each snapshot on the screen.
1.5.3. Summary
Now, users have learned the basic usage of the DeePMD-kit. For further information, please refer to the recommended links.
GitHub website: https://github.com/deepmodeling/
Documentation: https://docs.deepmodeling.com/
Tutorials: https://tutorials.deepmodeling.com/
Papers: https://deepmodeling.com/blog/papers/deepmd-kit/