1.5. Handson-Tutorial(v2.0.3)

1.5.1. Workflow of the DeePMD-kit

The workflow of the DeePMD-kit contains three parts:

  1. Data preparation: Training data is generated based on ab-initio calculations and format conversion is performed;

  2. Model training: Prepare input script and then train the DP model using the data and script;

  3. Model application: Use the obtained DP model for MD simulation or model inference.

1.5.2. Example: a gas-phase methane molecule

The following introduces the basic usage of the DeePMD-kit, taking a gas-phase methane molecule as an example.

1.5.2.1. Data preparation

Preparing the training data includes both generating ab-initio training data and converting the data format, which is the first step of training a DP model with DeePMD-kit.

Training data is often generated using ab-initio molecular dynamics (AIMD) simulations and needs to be converted to a format that can be used directly by DeePMD-kit.

The files needed for this tutorial are available.

$ wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/CH4.tar
$ tar xvf CH4.tar

Go to and check the CH4 folder:

$ cd CH4
$ ls
00.data 01.train 02.lmp

There are 3 folders here:

  1. The folder 00.data contains the data

  2. The folder 01.train contains an example input script to train a model with DeePMD-kit

  3. The folder 02.lmp contains the LAMMPS example script for molecular dynamics simulation

AIMD data generation

The training data of the DeePMD-kit contains the atom type, the simulation box, the atom coordinate, the atom force, the system energy, and the virial. A snapshot of a molecular system that has this information is called a frame. A system of data includes many frames that share the same number of atoms and atom types. For example, a molecular dynamics trajectory can be converted into a system of data, with each time step corresponding to a frame in the system.

As this tutorial is about the DeePMD-kit, training data generated by AIMD simulations is provided.

Go to and check the 00.data folder

$ cd 00.data
$ ls 
OUTCAR

The OUTCAR was produced by an AIMD simulation of a gas-phase methane molecule using VASP.

Data format conversion

The DeePMD-kit adopts a compressed data format. All training data should first be converted into this format and can then be used by DeePMD-kit. The data format is explained in detail in the DeePMD-kit manual that can be found in the ‘data’ Section of DeePMD-kit’s documentation.

We provide a convenient tool named dpdata for converting the data produced by VASP, Gaussian, Quantum-Espresso, ABACUS, and LAMMPS into the compressed format of DeePMD-kit. For details about dpdata, see dpdata’s documentation.

Users can install dpdata via

$ git clone https://github.com/deepmodeling/dpdata.git dpdata  
$ cd dpdata  
$ python setup.py install

or

$ pip install dpdata

Data format conversion using dpdata can be completed in two steps: load data and dump data. Now start an interactive python environment, for example

$ python

then execute the following commands:

import dpdata 
import numpy as np
data = dpdata.LabeledSystem('OUTCAR', fmt = 'vasp/outcar') 
print('# the data contains %d frames' % len(data))

On the screen, you can see that the OUTCAR file contains 200 frames of data. We randomly pick 40 frames as validation data and the rest as training data.

# random choose 40 index for validation_data
index_validation = np.random.choice(200,size=40,replace=False)     
# other indexes are training_data
index_training = list(set(range(200))-set(index_validation))       
data_training = data.sub_system(index_training)
data_validation = data.sub_system(index_validation)
# all training data put into directory:"training_data" 
data_training.to_deepmd_npy('training_data')               
# all validation data put into directory:"validation_data"
data_validation.to_deepmd_npy('validation_data')           
print('# the training data contains %d frames' % len(data_training)) 
print('# the validation data contains %d frames' % len(data_validation)) 

The commands import a system of data from the OUTCAR (with format vasp/outcar), and then dump it into the compressed format (numpy compressed arrays).

Now users have completed the data conversion. The data in DeePMD-kit format is stored in the folder 00.data. Let’s have a look:

$ ls 
OUTCAR training_data validation_data

The directories “training_data” and “validation_data” have a similar structure, so we just explain “training_data”:

$ ls training_data
set.000 type.raw type_map.raw
  1. set.000 is a directory, containing data in compressed format (numpy compressed arrays).

  2. type.raw is a file, containing types of atoms(Represented in integer)

  3. type_map.raw is a file, containing the type name of atoms.

Lets have a look at type.raw:

$ cat training_data/type.raw 
0 0 0 0 1

This tells us there are 5 atoms in this example, 4 atoms represented by type “0”, and 1 atom represented by type “1”. Sometimes users needs to map the integer types to atom name. The mapping can be given by the file type_map.raw

$ cat training_data/type_map.raw 
H C

This tells us the type “0” is named by “H”, and the type “1” is named by “C”.

1.5.2.2. input script

Once the data preparation is done, we can go on with training. Now go to the training directory

$ cd ../01.train
$ ls 
input.json

where input.json gives you an example training script. Users can specify the training process by specifying the value of keywords in input.json. The keywords are explained in detail in the DeePMD-kit manual, so they are not comprehensively explained here.

The keywords in input.json can be divided into 4 sections

  • Model: define the descriptor that maps atomic configuration to a set of symmetry invariant features, and the fitting net that takes descriptor as input and predicts the atomic contribution to the target physical property;

  • Learning rate: define the start learning rate, stop learning rate, decays steps, etc.

  • Loss function: define the type of loss, prefactor of energy, force and virial, etc.

  • Training: define the path of the training dataset and validation dataset, training steps, etc.

Model

The model keywords are given in the following:

"model":{
    "type_map":    ["H", "C"],                           
    "descriptor":{
        "type":            "se_e2_a",                    
        "rcut":            6.00,                        
        "rcut_smth":       0.50,                         
        "sel":             [4, 1],                       
        "neuron":          [10, 20, 40],                 
        "resnet_dt":       false,
        "axis_neuron":     4,                            
        "seed":            1,
        "_comment":        "that's all"
    },
    "fitting_net":{
        "neuron":          [100, 100, 100],   
        "resnet_dt":       true,
        "seed":            1,
        "_comment":        "that's all"
    },
    "_comment":    "that's all"
},

Description of keywords:

keywords

type

Description

type_map

list

Give the name to each type of atoms.

descriptor

dict

The descriptor of atomic environment.

    type

str

The type of the descritpor.

    sel

list

sel_a[i] gives the selected number of type-i neighbors.

    rcut

float

The cut-off radius.

    rcut_smth

float

Where to start smoothing. For example the 1/r term is smoothed from rcut to rcut_smth

    neuron

list

Number of neurons in each hidden layers of embedding net.

    axis_neuron

int

Size of the submatrix of G (embedding matrix)

    seed

int

Random seed for parameter initialization.

fitting_net

dict

The fitting of physical properties.

    neuron

list

Number of neurons in each hidden layers of fitting net.

Description of example: The se_e2_a descriptor is used to train the DP model. The cut-off radius is set to 6 Å and the components in \(\tilde{\mathcal{R}}^{i}\) smoothly go to zero from 0.5 to 6 Å. Within the cut-off radius, the local environment of the H-atom is determined by 4 nearest-neighbour, and the local environment of the C-atom is determined by 1 nearest neighbour atom. The size of the embedding and fitting network to [10, 20, 40] and [100, 100, 100], respectively.

Learning rate

The learning_rate keywords are given in the following:

"learning_rate" :{
    "type":                "exp",
    "decay_steps":         5000,
    "start_lr":            0.001,    
    "stop_lr":             3.51e-8,
    "_comment":            "that's all"
},

Description of keywords:

keywords

type

Description

learning_rate

dict

The definition of learning rate.

    type

str

The type of the learning rate.

    decay_steps

int

The learning rate is decaying every this number of training steps.

    start_lr

float

The learning rate the start of the training.

    stop_lr

float

The desired learning rate at the end of the training.

Description of example:

During the training, the learning rate decays exponentially from start_lr to stop_lr The starting learning rate, stop learning rate, and decay steps are set to 0.001, 3.51e-8, and 5000, respectively.

Loss

The loss keywords are given in the following:

"loss" :{
    "type":                "ener",
    "start_pref_e":        0.02,
    "limit_pref_e":        1,
    "start_pref_f":        1000,
    "limit_pref_f":        1,
    "start_pref_v":        0,
    "limit_pref_v":        0,
    "_comment":            "that's all"
},

Description of keywords:

keywords

type

Description

loss

dict

The definition of loss function.

    type

str

The type of the loss.

    start_pref_e

float

The prefactor of energy loss at the start of the training.

    limit_pref_e

float

The prefactor of energy loss at the limit of the training.

    start_pref_f

float

The prefactor of force loss at the start of the training.

    limit_pref_f

float

The prefactor of force loss at the limit of the training.

    start_pref_v

float

The prefactor of virial loss at the start of the training.

    limit_pref_v

float

The prefactor of virial loss at the limit of the training.

Description of example:

The loss function of the DeePMD-kit is determined by weighting the force, energy, and virial. In the loss function, pref_e increases from 0.02 to 1, and pref_f decreases from 1000 to 1 progressively, which means that the force term dominates at the beginning, while energy and virial terms become important at the end. This strategy is very effective and reduces the total training time. pref_v is set to 0, indicating that no virial data are included in the training process.

Training

The training keywords are given in the following

"training" : {
	"training_data": {
		"systems":            ["../00.data/training_data"],     
		"batch_size":         "auto",                       
		"_comment":           "that's all"
	},
	"validation_data":{
		"systems":            ["../00.data/validation_data/"],
		"batch_size":         "auto",               
		"numb_btch":          1,
		"_comment":           "that's all"
	},
	"numb_steps":             100000,                           
	"seed":                   10,
	"disp_file":              "lcurve.out",
	"disp_freq":              1000,
	"save_freq":              10000,
}

Description of keywords:

keywords

type

Description

training

dict

The definition of training.

    training_data

dict

Configurations of training data.

        systems

str

The data systems for training.

        batch_size

list, str, or int

str “auto”: automatically determines the batch size so that the batch_size times the number of atoms in the system is no less than 32.

    validation_data

dict

Configurations of validation data.

        numb_btch

int

An integer that specifies the number of systems to be sampled for each validation period.

    numb_steps

int

Number of training batch. Each training uses one batch of data.

    disp_file

str

The file for printing learning curve.

    disp_freq

int

The frequency of printing learning curve.

    save_freq

int

The frequency of saving check point

Description of example:

During the training, the training data is at “../00.data/validation_data/”, and validation data is at “../00.data/validation_data/”. The model is trained for \(10^6\) steps. The learning curve is written to the lcurve.out every 1000 steps, and the model-related files are saved every 10000 steps.

1.5.2.3. Train process

The following describes the training process of the DP model using the DeePMD-kit.

Start training Users can start the training with DeePMD-kit by simply running

$ dp train input.json

On the screen, you see the information of the data system(s)

DEEPMD INFO      ----------------------------------------------------------------------------------------------------
DEEPMD INFO      ---Summary of DataSystem: training     -------------------------------------------------------------
DEEPMD INFO      found 1 system(s):
DEEPMD INFO                              system        natoms        bch_sz        n_bch          prob        pbc
DEEPMD INFO           ../00.data/training_data/             5             7           22         1.000          T
DEEPMD INFO      -----------------------------------------------------------------------------------------------------
DEEPMD INFO      ---Summary of DataSystem: validation   --------------------------------------------------------------
DEEPMD INFO      found 1 system(s):
DEEPMD INFO                               system       natoms        bch_sz        n_bch          prob        pbc
DEEPMD INFO          ../00.data/validation_data/            5             7            5         1.000          T

and the starting and final learning rate of this training

DEEPMD INFO      start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.950006, final lr will be 3.51e-08

If everything works fine, you will see, on the screen, information printed every 1000 steps, like

DEEPMD INFO    batch    1000 training time 7.61 s, testing time 0.01 s
DEEPMD INFO    batch    2000 training time 6.46 s, testing time 0.01 s
DEEPMD INFO    batch    3000 training time 6.50 s, testing time 0.01 s
DEEPMD INFO    batch    4000 training time 6.44 s, testing time 0.01 s
DEEPMD INFO    batch    5000 training time 6.49 s, testing time 0.01 s
DEEPMD INFO    batch    6000 training time 6.46 s, testing time 0.01 s
DEEPMD INFO    batch    7000 training time 6.24 s, testing time 0.01 s
DEEPMD INFO    batch    8000 training time 6.39 s, testing time 0.01 s
DEEPMD INFO    batch    9000 training time 6.72 s, testing time 0.01 s
DEEPMD INFO    batch   10000 training time 6.41 s, testing time 0.01 s
DEEPMD INFO    saved checkpoint model.ckpt

They present the training and testing time counts. At the end of the 10000th batch, the model is saved in Tensorflow’s checkpoint file model.ckpt. At the same time, the training and testing errors are presented in file lcurve.out.

Users can check the lcurve.out using the cat command after training

$ cat lcurve.out
#step       rmse_val       rmse_trn       rmse_e_val       rmse_e_trn       rmse_f_val       rmse_f_trn           lr
      0     1.34e+01       1.47e+01         7.05e-01         7.05e-01         4.22e-01         4.65e-01     1.00e-03
    ...
 999000     1.24e-01       1.12e-01         5.93e-04         8.15e-04         1.22e-01         1.10e-01      3.7e-08
1000000     1.31e-01       1.04e-01         3.52e-04         7.74e-04         1.29e-01         1.02e-01      3.5e-08

The lcurve.out contains 8 columns, from left to right, which are the training step, the validation loss, training loss, root mean square (RMS) validation error of energy, RMS training error of energy, RMS validation error of force, RMS training error of force and the learning rate. The RMS error (RMSE) of the energy is normalized by the number of atoms in the system. It is demonstrated that after \(10^6\) steps of training, the energy testing error is less than 1 meV and the force testing error is around 120 meV/Å. It is also observed that the force testing error is systematically (but slightly) larger than the training error, which implies a slight over-fitting to the rather small dataset.

One can visualize this file by a simple Python script:

import numpy as np
import matplotlib.pyplot as plt

data = np.genfromtxt("lcurve.out", names=True)
for name in data.dtype.names[1:-1]:
    plt.plot(data['step'], data[name], label=name)
plt.legend()
plt.xlabel('Step')
plt.ylabel('Loss')
plt.xscale('symlog')
plt.yscale('log')
plt.grid()
plt.show()

During training, the model is saved in the TensorFlow model.ckpt* file every 10,000 steps, and the name of the last saved model is recorded in the checkpoint file.

When the training process is stopped abnormally, we can restart the training from the provided checkpoint by simply running

$ dp train  --restart model.ckpt  input.json

In the lcurve.out, you can see the training and testing errors, like

538000      3.12e-01       2.16e-01         6.84e-04         7.52e-04         1.38e-01         9.52e-02      4.1e-06
538000      3.12e-01       2.16e-01         6.84e-04         7.52e-04         1.38e-01         9.52e-02      4.1e-06
539000      3.37e-01       2.61e-01         7.08e-04         3.38e-04         1.49e-01         1.15e-01      4.1e-06
# step      rmse_val       rmse_trn       rmse_e_val       rmse_e_trn       rmse_f_val       rmse_f_trn           lr
530000      2.89e-01       2.15e-01         6.36e-04         5.18e-04         1.25e-01         9.31e-02      4.4e-06
531000      3.46e-01       3.26e-01         4.62e-04         6.73e-04         1.49e-01         1.41e-01      4.4e-06

Note that the input.json needs to be consistent with the previous one.

Freeze the model At the end of the training, the model parameters saved in TensorFlow’s checkpoint file should be frozen as a model file that is usually ended with extension .pb.

Simply execute

$ dp freeze -o graph.pb

where -o means output, which is used to specify the name of the output model. On the screen, you can see

DEEPMD INFO    Restoring parameters from ./model.ckpt-1000000
DEEPMD INFO    1264 ops in the final graph

and it will output a model file named graph.pb in the current directory.

Compress the model The compressed DP model typically speeds up DP-based calculations by an order of magnitude faster, and consumes an order of magnitude less memory. For a detailed description please refer to the literature.

The graph.pb can be compressed in the following way:

$ dp compress -i graph.pb -o graph-compress.pb

where -i means input, which is used to import unfrozen model. On the screen you can see

DEEPMD INFO    stage 1: compress the model
DEEPMD INFO    built lr
DEEPMD INFO    built network
DEEPMD INFO    built training
DEEPMD INFO    initialize model from scratch
DEEPMD INFO    finished compressing
DEEPMD INFO    
DEEPMD INFO    stage 2: freeze the model
DEEPMD INFO    Restoring parameters from model-compression/model.ckpt
DEEPMD INFO    840 ops in the final graph

and it will output a model file named graph-compress.pb.

Test the model Users can check the quality of the trained model by running

$ dp test -m graph-compress.pb -s ../00.data/validation_data -n 40 -d results

where -m means model which is used to import the model file, -s means system which specifies the path of the test dataset, -n means number which specifies the number of frames to be tested, and -d means detail which writes the details of energy, force and virial to different files.

On the screen, users can see the information on the prediction errors of validation data

DEEPMD INFO    # number of test data    : 40 
DEEPMD INFO    Energy RMSE              : 3.168050e-03 eV
DEEPMD INFO    Energy RMSE/Natoms       : 6.336099e-04 eV
DEEPMD INFO    Force  RMSE              : 1.267645e-01 eV/A
DEEPMD INFO    Virial RMSE              : 2.494163e-01 eV
DEEPMD INFO    Virial RMSE/Natoms       : 4.988326e-02 eV
DEEPMD INFO    # ----------------------------------------------- 

and it will output files named results.e.out and results.f.out in the current directory.

1.5.2.4. Model application

Run MD with LAMMPS

Users can use the DP model for MD simulations. Now let’s switch to 02.lmp folder to check the necessary input files for running MD with LAMMPS.

$ cd ../02.lmp

Firstly, we soft-link the output model in the training directory to the current directory

$ ln -s ../01.train/graph-compress.pb

Then we have three files

$ ls
conf.lmp  graph-compress.pb  in.lammps

where conf.lmp gives the initial configuration of a gas phase methane MD simulation, and the file in.lammps is the LAMMPS input script. One may check in.lammps and finds that it is a rather standard LAMMPS input file for a MD simulation, with only two exception lines:

pair_style  graph-compress.pb
pair_coeff  * *

where the pair style deepmd is invoked and the model file graph-compress.pb is provided, which means the atomic interaction will be computed by the DP model that is stored in the file graph-compress.pb.

One may execute lammps in the standard way

$ lmp  -i  in.lammps

After waiting for a while, the MD simulation finishes, and the log.lammps and ch4.dump files are generated. They store thermodynamic information and the trajectory of the molecule, respectively.

Model inference

Users can use the python interface of DeePMD-kit for model inference.

For example, users can use the DP model to evaluate the energy of each frame in the LAMMPS trajectory. Now start an interactive python environment,

$ python

then execute the following commands:

import dpdata  
d=dpdata.System('ch4.dump', fmt = "lammps/dump", type_map = ["H", "C"])  
d1 = d.predict(dp = "./graph-compress.pb")  
print(d1["energies"])

and it will print the energy of each snapshot on the screen

1.5.3. Summary

Now, users have learned the basic usage of the DeePMD-kit. For further information, please refer to the recommended links.

  1. GitHub website:https://github.com/deepmodeling/

  2. Documentations:https://docs.deepmodeling.com/

  3. Tutorials:https://tutorials.deepmodeling.com/

  4. Papers:https://deepmodeling.com/blog/papers/deepmd-kit/