# 2.2. Hands-on tutorial for DP-GEN (v0.10.3)

## 2.2.1. Workflow of the DP-GEN

DeeP Potential GENerator (DP-GEN) is a package that implements a concurrent learning scheme to generate reliable DP models. Typically, the DP-GEN workflow contains three processes: init, run, and autotest.

1. init: generate the initial training dataset by first-principles calculations.

2. run: the main process of DP-GEN, in which the training dataset is enriched and the quality of the DP models is improved automatically.

3. autotest: calculate a simple set of properties and/or perform tests for comparison with DFT and/or empirical interatomic potentials.

This tutorial aims to help you quickly master the run process, so only a brief introduction to the init and autotest processes is offered.

## 2.2.2. Example: a gas phase methane molecule

The following introduces the basic usage of the DP-GEN, taking a gas-phase methane molecule as an example.

### 2.2.2.1. Init

The initial dataset is used to train multiple (default 4) initial DP models and it can be generated in a custom way or in the standard way provided by DP-GEN.

Custom way

Performing ab-initio molecular dynamics (AIMD) simulations directly is a common custom way of generating initial data. The following suggestions are given to users who generate initial data via AIMD simulation:

• Perform AIMD simulations at higher temperatures.

• Start AIMD simulations from several (as many as possible) unrelated initial configurations.

• Save snapshots from AIMD trajectories at a sufficiently long time interval to avoid sampling highly correlated configurations.
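The last two suggestions can be sketched in a few lines of Python. This is a minimal illustration and not part of DP-GEN: the function name `subsample_trajectories` and the `stride` parameter are hypothetical, and each trajectory is assumed to be an in-memory list of frames.

```python
def subsample_trajectories(trajectories, stride):
    """Keep every `stride`-th frame of each trajectory and pool the results.

    `trajectories` is a list of frame lists, one per independent AIMD run.
    The name and the data layout are assumptions made for illustration.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    selected = []
    for frames in trajectories:
        # Slicing with a step drops the intermediate frames, which are the
        # ones most strongly correlated with their neighbours.
        selected.extend(frames[::stride])
    return selected


# Two hypothetical AIMD runs of 10 frames each, labelled (run, step).
run_a = [("a", step) for step in range(10)]
run_b = [("b", step) for step in range(10)]
picked = subsample_trajectories([run_a, run_b], stride=5)
print(picked)  # [('a', 0), ('a', 5), ('b', 0), ('b', 5)]
```

A suitable stride depends on the system and on the MD time step, so it has to be chosen case by case.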

Standard way of DP-GEN

For bulk materials, the initial data can be generated using DP-GEN’s init_bulk method. In the init_bulk method, the given configuration is first relaxed by ab-initio calculation and subsequently scaled or perturbed. Next, these scaled or perturbed configurations are used to start small-scale AIMD simulations, and the AIMD data is finally converted to the data format required by DeePMD-kit. Basically, init_bulk can be divided into four parts:

1. Relax in folder 00.place_ele

2. Perturb and scale in folder 01.scale_pert

3. Run a short AIMD in folder 02.md

4. Collect data in folder 02.md.

For surface systems, the initial data can be generated using DP-GEN’s init_surf method. Basically, init_surf can be divided into two parts:

1. Build a specific surface in folder 00.place_ele

2. Perturb and scale in folder 01.scale_pert

The above steps are carried out automatically when generating the initial data in the standard way of DP-GEN. Users only need to prepare the input files for the ab-initio calculation and for DP-GEN (param.json and machine.json).

When generating the initial data for bulk materials in the standard way, execute the following command:

$ dpgen init_bulk param.json machine.json

For surface systems, execute

$ dpgen init_surf param.json machine.json


A detailed description of preparing initial data in the standard way can be found in the ‘Init’ section of the DP-GEN documentation.

Initial data of this tutorial

In this tutorial, we take a gas-phase methane molecule as an example. We have prepared initial data in dpgen_example/init. Now download the dpgen_example and uncompress it:

wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/dpgen_example.tar.xz
tar xvf dpgen_example.tar.xz


Go to the dpgen_example folder and check its contents:

$ cd dpgen_example
$ ls
init run

• Folder init contains the initial data generated by AIMD simulations.

• Folder run contains input files for the run process.

First, check the init folder with the tree command.



#### 2.2.2.2.4. Results analysis

Users need to know the output files of the run process and the information they contain. After successfully executing the above command, we can find that a folder and two files are generated automatically in dpgen_example/run.

$ ls
dpgen.log INCAR_methane iter.000000 machine.json param.json record.dpgen

• iter.000000 contains the main results that DP-GEN generates in the first iteration.

• record.dpgen records the current stage of the run process.

• dpgen.log includes time and iteration information.

When the first iteration is completed, the folder structure of iter.000000 is like this:

$ tree iter.000000/ -L 1
./iter.000000/
├── 00.train
├── 01.model_devi
└── 02.fp

• 00.train: several (default 4) DP models are trained on existing data.

• 01.model_devi: new configurations are generated using the DP models obtained in 00.train.

• 02.fp: first-principles calculations are performed on the selected configurations and the results are converted into training data.

00.train

First, we check the folder iter.000000/00.train.

$ tree iter.000000/00.train -L 1
./iter.000000/00.train/
├── 000
├── 001
├── 002
├── 003
├── data.init -> /root/dpgen_example
├── data.iters
├── graph.000.pb -> 000/frozen_model.pb
├── graph.001.pb -> 001/frozen_model.pb
├── graph.002.pb -> 002/frozen_model.pb
└── graph.003.pb -> 003/frozen_model.pb

• Folder 00x contains the input and output files of the DeePMD-kit, in which a model is trained.

• graph.00x.pb, linked to 00x/frozen_model.pb, is the model DeePMD-kit generates. The only difference between these models is the random seed for neural network initialization.

We may randomly select one of them, like 000.

$ tree iter.000000/00.train/000 -L 1
./iter.000000/00.train/000
├── checkpoint
├── frozen_model.pb
├── input.json
├── lcurve.out
├── model.ckpt-400000.data-00000-of-00001
├── model.ckpt-400000.index
├── model.ckpt-400000.meta
├── model.ckpt.data-00000-of-00001
├── model.ckpt.index
├── model.ckpt.meta
└── train.log

• input.json contains the DeePMD-kit settings for the current task.

• checkpoint is used for restarting training.

• model.ckpt* are model-related files.

• frozen_model.pb is the frozen model.

• lcurve.out records the training accuracy of energies and forces.

• train.log includes version, data, hardware information, time, etc.

01.model_devi

Then, we check the folder iter.000000/01.model_devi.

$ tree iter.000000/01.model_devi -L 1
./iter.000000/01.model_devi/
├── confs
├── graph.000.pb -> /root/dpgen_example/run/iter.000000/00.train/graph.000.pb
├── graph.001.pb -> /root/dpgen_example/run/iter.000000/00.train/graph.001.pb
├── graph.002.pb -> /root/dpgen_example/run/iter.000000/00.train/graph.002.pb
├── graph.003.pb -> /root/dpgen_example/run/iter.000000/00.train/graph.003.pb
├── task.000.000000
├── task.000.000001
├── task.000.000002
├── task.000.000003
├── task.000.000004
├── task.000.000005
├── task.000.000006
├── task.000.000007
├── task.000.000008
└── task.000.000009

• Folder confs contains the initial configurations for LAMMPS MD, converted from the POSCAR you set in "sys_configs" of param.json.

• Folder task.000.00000x contains the input and output files of LAMMPS.

We may randomly select one of them, like task.000.000001.

$ tree iter.000000/01.model_devi/task.000.000001
├── conf.lmp -> ../confs/000.0001.lmp
├── input.lammps
├── log.lammps
├── model_devi.log
└── model_devi.out

• conf.lmp, linked to 000.0001.lmp in folder confs, serves as the initial configuration of MD.

• input.lammps is the input file for LAMMPS.

• model_devi.out records the model deviation of the labels of interest, energy and force, during MD. It serves as the criterion for selecting which structures are sent to first-principles calculations.
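To make the force columns of model_devi.out (max_devi_f, min_devi_f, avg_devi_f) more concrete, here is a minimal pure-Python sketch of how such a deviation can be computed from an ensemble of models. It assumes the commonly used definition, namely the per-atom standard deviation of the predicted force vector across the models; the function name `force_model_deviation` and the nested-list layout are hypothetical and are not taken from DP-GEN or DeePMD-kit.

```python
import math


def force_model_deviation(forces):
    """Return (max, min, avg) force deviation over atoms for one frame.

    forces[m][i] is the (fx, fy, fz) force that model m predicts on atom i.
    Assumed definition: for each atom, the square root of the mean (over
    models) squared distance between a prediction and the ensemble mean.
    """
    n_models = len(forces)
    n_atoms = len(forces[0])
    per_atom = []
    for i in range(n_atoms):
        # Ensemble-averaged force on atom i, component by component.
        mean = [sum(forces[m][i][c] for m in range(n_models)) / n_models
                for c in range(3)]
        # Mean squared deviation of the force vector from that average.
        msd = sum(
            sum((forces[m][i][c] - mean[c]) ** 2 for c in range(3))
            for m in range(n_models)
        ) / n_models
        per_atom.append(math.sqrt(msd))
    return max(per_atom), min(per_atom), sum(per_atom) / n_atoms


# Two hypothetical models, two atoms: the models disagree only on atom 0.
model_0 = [(0.1, 0.0, 0.0), (0.0, 0.5, 0.0)]
model_1 = [(-0.1, 0.0, 0.0), (0.0, 0.5, 0.0)]
max_f, min_f, avg_f = force_model_deviation([model_0, model_1])
print(max_f, min_f, avg_f)
```

In this toy frame the deviation is about 0.1 on atom 0 and exactly 0 on atom 1, so the maximum over atoms is what flags the frame as uncertain.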

Viewing the first lines of model_devi.out with head gives:

$ head -n 5 ./iter.000000/01.model_devi/task.000.000001/model_devi.out
# step max_devi_v min_devi_v avg_devi_v max_devi_f min_devi_f avg_devi_f
0 1.438427e-04 5.689551e-05 1.083383e-04 8.835352e-04 5.806717e-04 7.098761e-04
10 3.887636e-03 9.377374e-04 2.577191e-03 2.880724e-02 1.329747e-02 1.895448e-02
20 7.723417e-04 2.276932e-04 4.340100e-04 3.151907e-03 2.430687e-03 2.727186e-03
30 4.962806e-03 4.943687e-04 2.925484e-03 5.866077e-02 1.719157e-02 3.011857e-02

Now we’ll concentrate on max_devi_f. Recall that we’ve set "trj_freq" as 10, so the structures are saved every 10 steps. Whether a structure is selected depends on its "max_devi_f". If it falls between "model_devi_f_trust_lo" (0.05) and "model_devi_f_trust_hi" (0.15), DP-GEN will treat the structure as a candidate. Of the four structures shown here, only the one at step 30 will be selected, whose "max_devi_f" is 5.866077e-02.

02.fp

Finally, we check the folder iter.000000/02.fp.

$ tree iter.000000/02.fp -L 1
./iter.000000/02.fp
├── data.000
├── candidate.shuffled.000.out
├── POTCAR.000
├── rest_accurate.shuffled.000.out
└── rest_failed.shuffled.000.out

• POTCAR is the input file for VASP generated according to "fp_pp_files" of param.json.

• candidate.shuffled.000.out records which structures will be selected from the previous step, 01.model_devi. There are often far more candidates than the maximum you expect to calculate at one time. In that case, DP-GEN will randomly choose up to "fp_task_max" structures and form the folder task.*.

• rest_accurate.shuffled.000.out records the other structures where our model is accurate ("max_devi_f" is less than "model_devi_f_trust_lo", so no further calculation is needed).

• rest_failed.shuffled.000.out records the other structures where our model is too inaccurate ("max_devi_f" is larger than "model_devi_f_trust_hi", there may be some error).

• data.000: After the first-principles calculations, DP-GEN collects these data and converts them into the format DeePMD-kit needs. In the next iteration’s 00.train, these data are used for training together with the initial data.
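The sorting of structures into candidate, accurate, and failed groups described above can be sketched as follows. This is a minimal illustration rather than DP-GEN's own code: the function names are hypothetical, the treatment of values exactly equal to a threshold is an assumption, and the sample text is the model_devi.out excerpt from task.000.000001.

```python
import random


def parse_model_devi(text):
    """Return (step, max_devi_f) pairs from the text of a model_devi.out file."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        cols = line.split()
        # Column order from the header: step, max/min/avg_devi_v, max/min/avg_devi_f
        frames.append((int(cols[0]), float(cols[4])))
    return frames


def classify(frames, trust_lo, trust_hi):
    """Group steps by max_devi_f (boundary handling is an assumption)."""
    groups = {"accurate": [], "candidate": [], "failed": []}
    for step, devi in frames:
        if devi < trust_lo:
            groups["accurate"].append(step)
        elif devi < trust_hi:
            groups["candidate"].append(step)
        else:
            groups["failed"].append(step)
    return groups


def pick_fp_tasks(candidates, fp_task_max, seed=None):
    """Shuffle the candidates and keep at most fp_task_max of them."""
    shuffled = list(candidates)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:fp_task_max]


SAMPLE = """\
# step max_devi_v min_devi_v avg_devi_v max_devi_f min_devi_f avg_devi_f
0 1.438427e-04 5.689551e-05 1.083383e-04 8.835352e-04 5.806717e-04 7.098761e-04
10 3.887636e-03 9.377374e-04 2.577191e-03 2.880724e-02 1.329747e-02 1.895448e-02
20 7.723417e-04 2.276932e-04 4.340100e-04 3.151907e-03 2.430687e-03 2.727186e-03
30 4.962806e-03 4.943687e-04 2.925484e-03 5.866077e-02 1.719157e-02 3.011857e-02
"""

groups = classify(parse_model_devi(SAMPLE), trust_lo=0.05, trust_hi=0.15)
print(groups)  # {'accurate': [0, 10, 20], 'candidate': [30], 'failed': []}
print(pick_fp_tasks(groups["candidate"], fp_task_max=20))  # [30]
```

With only one candidate and a limit of 20, the random selection keeps everything; the limit only matters when the candidate pool is larger than "fp_task_max".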

Filtering candidate.shuffled.000.out for task.000.000001 with grep gives:

$ cat ./iter.000000/02.fp/candidate.shuffled.000.out | grep task.000.000001
iter.000000/01.model_devi/task.000.000001 190
iter.000000/01.model_devi/task.000.000001 130
iter.000000/01.model_devi/task.000.000001 120
iter.000000/01.model_devi/task.000.000001 150
iter.000000/01.model_devi/task.000.000001 280
iter.000000/01.model_devi/task.000.000001 110
iter.000000/01.model_devi/task.000.000001 30
iter.000000/01.model_devi/task.000.000001 230

The entry task.000.000001 30 is exactly the structure we found in 01.model_devi that satisfies the criterion for being calculated again.

After the first iteration, we check the contents of dpgen.log and record.dpgen.

$ cat dpgen.log
2022-03-07 22:12:45,447 - INFO : start running
2022-03-07 22:12:45,447 - INFO : =============================iter.000000==============================
2022-03-07 22:12:45,447 - INFO : -------------------------iter.000000 task 00--------------------------
2022-03-07 22:12:45,451 - INFO : -------------------------iter.000000 task 01--------------------------
2022-03-08 00:53:00,179 - INFO : -------------------------iter.000000 task 02--------------------------
2022-03-08 00:53:00,179 - INFO : -------------------------iter.000000 task 03--------------------------
2022-03-08 00:53:00,187 - INFO : -------------------------iter.000000 task 04--------------------------
2022-03-08 00:57:04,113 - INFO : -------------------------iter.000000 task 05--------------------------
2022-03-08 00:57:04,113 - INFO : -------------------------iter.000000 task 06--------------------------
2022-03-08 00:57:04,123 - INFO : system 000 candidate :     12 in    310   3.87 %
2022-03-08 00:57:04,125 - INFO : system 000 failed    :      0 in    310   0.00 %
2022-03-08 00:57:04,125 - INFO : system 000 accurate  :    298 in    310  96.13 %
2022-03-08 00:57:04,126 - INFO : system 000 accurate_ratio:   0.9613    thresholds: 1.0000 and 1.0000   eff. task min and max   -1   20   number of fp tasks:     12
2022-03-08 00:57:04,154 - INFO : -------------------------iter.000000 task 07--------------------------
2022-03-08 01:02:07,925 - INFO : -------------------------iter.000000 task 08--------------------------
2022-03-08 01:02:07,926 - INFO : failed tasks:      0 in     12    0.00 %
2022-03-08 01:02:07,949 - INFO : failed frame:      0 in     12    0.00 %


It can be found that 310 structures are generated in iter.000000, of which 12 are collected for first-principles calculations.

$ cat record.dpgen
0 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8

Each line contains two numbers: the first is the index of the iteration, and the second, ranging from 0 to 8, records which stage in each iteration is currently running.

| Index of iterations | Stage in each iteration | Process |
| --- | --- | --- |
| 0 | 0 | make_train |
| 0 | 1 | run_train |
| 0 | 2 | post_train |
| 0 | 3 | make_model_devi |
| 0 | 4 | run_model_devi |
| 0 | 5 | post_model_devi |
| 0 | 6 | make_fp |
| 0 | 7 | run_fp |
| 0 | 8 | post_fp |

If the process of DP-GEN stops for some reason, DP-GEN will automatically recover the main process from record.dpgen. You may also edit it manually for your purpose, for example removing the last iterations and recovering from a checkpoint.

After all iterations, we check the structure of dpgen_example/run

$ tree ./ -L 2
./
├── dpgen.log
├── INCAR_methane
├── iter.000000
│   ├── 00.train
│   ├── 01.model_devi
│   └── 02.fp
├── iter.000001
│   ├── 00.train
│   ├── 01.model_devi
│   └── 02.fp
├── iter.000002
│   └── 00.train
├── machine.json
├── param.json
└── record.dpgen
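The record.dpgen format described earlier (one "iteration stage" pair per line) is simple enough to inspect programmatically. The sketch below is an illustration only and not DP-GEN's internal recovery logic: the helper names `last_record` and `next_stage` are hypothetical, and the rule "continue with the stage after the last recorded one" is an assumption about how a resume would proceed.

```python
# Stage names in the order given by the stage table of this tutorial.
STAGES = [
    "make_train", "run_train", "post_train",
    "make_model_devi", "run_model_devi", "post_model_devi",
    "make_fp", "run_fp", "post_fp",
]


def last_record(text):
    """Return the last (iteration, stage) pair in record.dpgen text, or None."""
    records = [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]
    return records[-1] if records else None


def next_stage(record):
    """Assumed resume rule: the stage that follows the last recorded one."""
    if record is None:
        return (0, 0)
    iteration, stage = record
    if stage + 1 < len(STAGES):
        return (iteration, stage + 1)
    return (iteration + 1, 0)  # post_fp is followed by the next iteration


RECORD = "0 0\n0 1\n0 2\n0 3\n0 4\n0 5\n0 6\n0 7\n0 8\n"
iteration, stage = last_record(RECORD)
print(iteration, STAGES[stage])   # 0 post_fp
print(next_stage((iteration, stage)))  # (1, 0)
```

Reading the file this way before editing it by hand makes it easier to check which stage a truncated record would resume from.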


Next, we check the contents of dpgen.log.

$ cat dpgen.log | grep system
2022-03-08 00:57:04,123 - INFO : system 000 candidate :     12 in    310   3.87 %
2022-03-08 00:57:04,125 - INFO : system 000 failed    :      0 in    310   0.00 %
2022-03-08 00:57:04,125 - INFO : system 000 accurate  :    298 in    310  96.13 %
2022-03-08 00:57:04,126 - INFO : system 000 accurate_ratio:   0.9613    thresholds: 1.0000 and 1.0000   eff. task min and max   -1   20   number of fp tasks:     12
2022-03-08 03:47:00,718 - INFO : system 001 candidate :      0 in   3010   0.00 %
2022-03-08 03:47:00,718 - INFO : system 001 failed    :      0 in   3010   0.00 %
2022-03-08 03:47:00,719 - INFO : system 001 accurate  :   3010 in   3010 100.00 %
2022-03-08 03:47:00,722 - INFO : system 001 accurate_ratio:   1.0000    thresholds: 1.0000 and 1.0000   eff. task min and max   -1    0   number of fp tasks:      0


It can be found that 3010 structures are generated in iter.000001, of which none is collected for first-principles calculations. Therefore, the final models are not updated in iter.000002/00.train.
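The same numbers can be pulled out of dpgen.log with a short script, which is convenient when many iterations have to be checked. This is a minimal sketch that assumes the log lines keep the layout shown above; the regular expression and the function names are written for this tutorial and are not part of DP-GEN.

```python
import re

# Matches lines such as "system 000 candidate :     12 in    310   3.87 %".
# The "accurate_ratio" lines do not match because of the "_ratio" suffix.
LINE_RE = re.compile(
    r"system\s+(\d+)\s+(candidate|failed|accurate)\s*:\s*(\d+)\s+in\s+(\d+)"
)


def summarize(log_text):
    """Return {system: {"total", "candidate", "failed", "accurate"}}."""
    stats = {}
    for match in LINE_RE.finditer(log_text):
        system, label = match.group(1), match.group(2)
        count, total = int(match.group(3)), int(match.group(4))
        entry = stats.setdefault(system, {"total": total})
        entry[label] = count
    return stats


def needs_more_labels(entry):
    """True if any explored structure fell outside the accurate range."""
    return entry["candidate"] > 0 or entry["failed"] > 0


LOG = """\
2022-03-08 00:57:04,123 - INFO : system 000 candidate :     12 in    310   3.87 %
2022-03-08 00:57:04,125 - INFO : system 000 failed    :      0 in    310   0.00 %
2022-03-08 00:57:04,125 - INFO : system 000 accurate  :    298 in    310  96.13 %
2022-03-08 00:57:04,126 - INFO : system 000 accurate_ratio:   0.9613    thresholds: 1.0000 and 1.0000   eff. task min and max   -1   20   number of fp tasks:     12
2022-03-08 03:47:00,718 - INFO : system 001 candidate :      0 in   3010   0.00 %
2022-03-08 03:47:00,718 - INFO : system 001 failed    :      0 in   3010   0.00 %
2022-03-08 03:47:00,719 - INFO : system 001 accurate  :   3010 in   3010 100.00 %
"""

stats = summarize(LOG)
for system, entry in sorted(stats.items()):
    ratio = entry["accurate"] / entry["total"]
    print(system, f"{ratio:.4f}", needs_more_labels(entry))
```

For the log above, system 000 still produces candidates while system 001 does not, which matches the observation that no new first-principles data are generated after iter.000001.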

### 2.2.2.3. Auto-test

To verify the accuracy of the DP model, users can calculate a simple set of properties and compare the results with those of DFT or a traditional empirical force field. DP-GEN’s autotest module supports the calculation of a variety of properties, such as

• 00.equi: (default task) the equilibrium state;

• 01.eos: the equation of state;

• 02.elastic: elastic properties such as Young’s modulus;

• 03.vacancy: the vacancy formation energy;

• 04.interstitial: the interstitial formation energy;

• 05.surf: the surface formation energy.

## 2.2.3. Summary

Now, users have learned the basic usage of the DP-GEN. For further information, please refer to the recommended links.

1. GitHub website: https://github.com/deepmodeling/dpgen

2. Papers: https://deepmodeling.com/blog/papers/dpgen/