Dataset Format

`stru_out`

This file contains the structure and k-mesh data, in the following order:

lattice vectors: 3 lines, 3 floating-point numbers per line, in units of Bohr radius
reciprocal lattice vectors: 3 lines, 3 floating-point numbers per line, in units of inverse Bohr radius
number of k-mesh points along each reciprocal lattice vector: 1 line, nkx, nky, nkz. The total number of k-points, nkpts, equals the product of nkx, nky and nkz
Cartesian coordinates of each k-point: nkpts lines, 3 floating-point numbers per line, in units of inverse Bohr radius
mapping of each k-point to its irreducible counterpart: nkpts lines, 1 integer per line

The mapping is interpreted as follows: if the n-th line contains the integer m, then the irreducible k-point corresponding to the n-th k-point of the full k-point set is the m-th k-point of the full set.
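As an illustration, the layout above can be read with a short Python sketch. The function name `parse_stru_out` is hypothetical, and the sketch assumes whitespace-separated values and 1-based mapping indices; it is not part of any official tooling.

```python
import numpy as np


def parse_stru_out(text):
    """Parse the contents of a stru_out file (hypothetical helper, sketch only)."""
    tokens = text.split()
    pos = 0

    def take(count, dtype):
        # Consume the next `count` tokens and convert them with `dtype`
        nonlocal pos
        values = np.array([dtype(t) for t in tokens[pos:pos + count]])
        pos += count
        return values

    lattice = take(9, float).reshape(3, 3)        # rows: a_1, a_2, a_3 (Bohr)
    recip_lattice = take(9, float).reshape(3, 3)  # rows: b_1, b_2, b_3 (1/Bohr)
    nkx, nky, nkz = take(3, int)
    nkpts = nkx * nky * nkz
    kpoints = take(3 * nkpts, float).reshape(nkpts, 3)  # Cartesian, 1/Bohr
    irk_map = take(nkpts, int)  # assumed 1-based indices into the full k-point list
    return {
        "lattice": lattice,
        "recip_lattice": recip_lattice,
        "kmesh": (int(nkx), int(nky), int(nkz)),
        "kpoints": kpoints,
        "irk_map": irk_map,
    }
```

Since every value in the mapping is the index of an irreducible k-point, `np.unique(s["irk_map"])` lists the irreducible k-points within the full set.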

`Cs_data_xxx.txt`

These files contain the localized resolution-of-identity (RI) triple coefficients.

In plain-text format, each file starts with a header of two integers: the total number of atoms and the number of periodic unit cells. The rest of the file consists of blocks of RI coefficients \(C\), one for each pair of atoms and unit cell:

i_atom_1  i_atom_2  n_1  n_2  n_3  n_basis_1  n_basis_2  n_aux_basis_1
C(1, 1, 1)
...
C(n_aux_basis_1, n_basis_2, n_basis_1)

Here C holds the RI coefficients between atoms i_atom_1 and i_atom_2 in unit cells separated by the lattice vector \(\mathbf{R} = n_1 \mathbf{a}_1 + n_2 \mathbf{a}_2 + n_3 \mathbf{a}_3\). The auxiliary basis functions are located on i_atom_1. The numbers of basis functions on i_atom_1 and i_atom_2 are n_basis_1 and n_basis_2, respectively, and the number of auxiliary functions is n_aux_basis_1. The indices of C run in Fortran (column-major) order, i.e., the first index runs fastest.
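To make the index order concrete, the following Python sketch reads a plain-text Cs_data file into a dictionary keyed by (i_atom_1, i_atom_2, n_1, n_2, n_3). The function name `parse_cs_text` is hypothetical; the sketch assumes whitespace-separated tokens and numbers in standard `E` exponent notation, and relies on a Fortran-order reshape so that the auxiliary index varies fastest.

```python
import numpy as np


def parse_cs_text(text):
    """Read a plain-text Cs_data file (hypothetical helper, sketch only).

    Returns (n_atoms, n_cells, blocks), where blocks maps
    (i_atom_1, i_atom_2, n_1, n_2, n_3) -> C[i_aux, i_basis_2, i_basis_1].
    """
    tokens = text.split()
    n_atoms, n_cells = int(tokens[0]), int(tokens[1])
    pos = 2
    blocks = {}
    while pos < len(tokens):
        # Block header: 8 integers
        a1, a2, n1, n2, n3, nb1, nb2, nbb1 = (int(t) for t in tokens[pos:pos + 8])
        pos += 8
        size = nbb1 * nb2 * nb1
        values = np.array([float(t) for t in tokens[pos:pos + size]])
        pos += size
        # Fortran-order reshape: the first (auxiliary) index varies fastest
        blocks[(a1, a2, n1, n2, n3)] = values.reshape((nbb1, nb2, nb1), order="F")
    return n_atoms, n_cells, blocks
```

Transposing a block with `C.transpose(2, 1, 0)` gives the `(n_basis_1, n_basis_2, n_aux_basis_1)` layout produced by the C-order reshape in the binary reader below.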

In binary format, the data is organized in the same way as in the plain-text format, except that the header contains an extra integer: the number of (atom pair, lattice vector) blocks included in the file. The coefficients are stored in double precision. The following Python snippet illustrates how to read a binary file:

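import struct

import numpy as np

# Ensure that "Cs_data_0.txt" exists and was written in binary output mode by the DFT code
cfile_path = "Cs_data_0.txt"

apcells = {}  # (i_atom_1, i_atom_2, n_1, n_2, n_3) -> coefficient array
with open(cfile_path, 'rb') as h:
    # Header: number of atoms, number of unit cells, number of (atom pair, lattice vector) blocks
    n_atoms, n_cells, n_apcell_file = struct.unpack('iii', h.read(12))
    for _ in range(n_apcell_file):
        # Block header: i_atom_1, i_atom_2, n_1, n_2, n_3, n_basis_1, n_basis_2, n_aux_basis_1
        a1, a2, r1, r2, r3, nb1, nb2, nbb1 = struct.unpack('i' * 8, h.read(4 * 8))
        apcell = (a1, a2, r1, r2, r3)
        array_size = nb1 * nb2 * nbb1
        # Double-precision coefficients; the auxiliary index runs fastest,
        # so a C-order reshape gives array[i_basis_1, i_basis_2, i_aux]
        array = np.array(struct.unpack('d' * array_size, h.read(8 * array_size)))
        array = np.reshape(array, (nb1, nb2, nbb1))
        apcells[apcell] = array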

`band_out`

This file contains the band energies and occupation numbers from the mean-field starting-point calculation. It has a 5-line header:

n_k_points
n_spins
n_states
n_basis
e_fermi

Each of the first 4 lines contains a single integer. The 5th line contains a floating-point number, the Fermi energy in Hartree.

The remaining lines consist of n_k_points*n_spins blocks of n_states+1 lines each, in the format:

i_k_point    i_spin
1           f_1        e_1_ha      e_1_ev
2           f_2        e_2_ha      e_2_ev
3           f_3        e_3_ha      e_3_ev
...
n           f_n        e_n_ha      e_n_ev
...

Each block contains the energies and occupation numbers of the states \(\left|\psi_{n,k\sigma}\right\rangle\). i_k_point is the index of the k-point \(k\) in the full k-point set, and i_spin specifies the spin channel \(\sigma\). In each of the following lines, the first integer is the index of the state, and the three floating-point numbers are the occupation number, the energy in Hartree and the energy in electronvolts, respectively. For spin-unpolarized calculations, f_n ranges from 0 to 2; otherwise it ranges from 0 to 1.
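A minimal Python sketch of a band_out reader is shown here. The function name `parse_band_out` and the `(n_spins, n_k_points, n_states)` array layout are illustrative choices, and the k-point, spin and state indices are assumed to be 1-based. Because each block header carries its own i_k_point and i_spin, the sketch does not depend on the order in which the blocks appear.

```python
import numpy as np


def parse_band_out(text):
    """Parse band_out (hypothetical helper, sketch only).

    Occupations and energies (Hartree) are returned as arrays of
    shape (n_spins, n_k_points, n_states).
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    n_k_points, n_spins, n_states, n_basis = (int(lines[i][0]) for i in range(4))
    e_fermi = float(lines[4][0])  # Hartree
    occ = np.zeros((n_spins, n_k_points, n_states))
    energy_ha = np.zeros((n_spins, n_k_points, n_states))
    pos = 5
    for _ in range(n_k_points * n_spins):
        # Block header: i_k_point, i_spin (assumed 1-based)
        i_k, i_spin = int(lines[pos][0]), int(lines[pos][1])
        for fields in lines[pos + 1:pos + 1 + n_states]:
            i_state = int(fields[0])
            occ[i_spin - 1, i_k - 1, i_state - 1] = float(fields[1])
            energy_ha[i_spin - 1, i_k - 1, i_state - 1] = float(fields[2])
        pos += 1 + n_states
    return {"n_basis": n_basis, "e_fermi": e_fermi, "occ": occ, "energy_ha": energy_ha}
```

The electronvolt column is skipped here since it duplicates the Hartree energy in different units.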

`KS_eigenvector_xxx.txt`

These files contain the wave functions (eigenvectors) from the starting-point calculation, expanded in the orbital basis. Each file can be divided into blocks of n_states*n_basis*n_spins+1 lines, where n_states, n_basis and n_spins are read from the band_out file. Each block stores the coefficients \(c^i_{n,k\sigma}\) for a particular k-point:

i_k_point
c(1,1,1)_real c(1,1,1)_imag
...
c(i,n,s)_real c(i,n,s)_imag
...

The first line contains a single integer, the index of the k-point to which the following data belong. The remaining lines store \(c^i_{n,k\sigma}\) with running indices \(i\), \(n\), \(\sigma\) in C-style row-major order, i.e., the spin index runs fastest, then the state index, and finally the basis index. Each line has two floating-point numbers, the real and imaginary parts of \(c^i_{n,k\sigma}\).
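The row-major ordering means that a plain C-order reshape to `(n_basis, n_states, n_spins)` recovers \(c^i_{n,k\sigma}\). The Python sketch below shows this; the function name `parse_ks_eigenvector` is hypothetical, n_basis, n_states and n_spins are assumed to be taken from band_out, and every data line is assumed to hold exactly two numbers.

```python
import numpy as np


def parse_ks_eigenvector(text, n_basis, n_states, n_spins):
    """Read one KS_eigenvector file (hypothetical helper, sketch only).

    Returns {i_k_point: c}, where c[i_basis, i_state, i_spin] is complex.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    block_len = n_basis * n_states * n_spins
    coeffs = {}
    pos = 0
    while pos < len(lines):
        i_k = int(lines[pos][0])  # k-point index heading this block
        data = np.array(
            [[float(real), float(imag)] for real, imag in lines[pos + 1:pos + 1 + block_len]]
        )
        values = data[:, 0] + 1j * data[:, 1]
        # C-order reshape: spin varies fastest, then state, then basis
        coeffs[i_k] = values.reshape(n_basis, n_states, n_spins)
        pos += 1 + block_len
    return coeffs
```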

`coulomb_mat_xxx.txt`

These files contain the Coulomb matrices in the auxiliary basis. A single header line contains an integer, the number of irreducible k-points at which the Coulomb matrices are computed. The rest of the file is organized in blocks:

n_aux_basis    row_start    row_end    col_start    col_end
i_k_point      k_weight
v(row_start, col_start  )_real       v(row_start, col_start  )_imag
v(row_start, col_start+1)_real       v(row_start, col_start+1)_imag
...
v(row_end, col_end)_real             v(row_end, col_end)_imag

where

integer n_aux_basis is the total number of auxiliary basis functions.
integer row_start, row_end, col_start and col_end mark the submatrix of the full Coulomb matrix that this block contains.
integer i_k_point is the index of the k-point of the current Coulomb matrix in the full k-point list.
float number k_weight is the weight of the irreducible k-point.

After the block header come (row_end-row_start+1)*(col_end-col_start+1) lines of matrix-element data. Each line contains two floating-point numbers, the real and imaginary parts of the element. The data is stored in C-style row-major order, i.e., the column index runs fastest.
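Since a block may hold only a submatrix, a reader has to place each block into the full n_aux_basis × n_aux_basis matrix of its k-point. The Python sketch below does this; the function name `parse_coulomb_mat` is hypothetical, and it assumes the row and column bounds are 1-based and inclusive and that several blocks may fill different parts of the same k-point's matrix.

```python
import numpy as np


def parse_coulomb_mat(text):
    """Assemble coulomb_mat blocks (hypothetical helper, sketch only).

    Returns (n_irk, {i_k_point: (k_weight, V)}), with V of shape (n_aux, n_aux).
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    n_irk = int(lines[0][0])  # number of irreducible k-points (file header)
    matrices = {}
    pos = 1
    while pos < len(lines):
        n_aux, r0, r1, c0, c1 = (int(x) for x in lines[pos])
        i_k, k_weight = int(lines[pos + 1][0]), float(lines[pos + 1][1])
        n_rows, n_cols = r1 - r0 + 1, c1 - c0 + 1
        data = np.array(
            [[float(real), float(imag)]
             for real, imag in lines[pos + 2:pos + 2 + n_rows * n_cols]]
        )
        # Row-major order: the column index varies fastest
        sub = (data[:, 0] + 1j * data[:, 1]).reshape(n_rows, n_cols)
        if i_k not in matrices:
            matrices[i_k] = (k_weight, np.zeros((n_aux, n_aux), dtype=complex))
        # Convert 1-based inclusive bounds to Python slices
        matrices[i_k][1][r0 - 1:r1, c0 - 1:c1] = sub
        pos += 2 + n_rows * n_cols
    return n_irk, matrices
```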
