GULP

Code details

The original version of GULP [115] was written in Fortran 77, since the more recent standards had yet to be released. This meant that memory was statically allocated via a series of parameters. Subsequently, non-standard extensions were introduced to allow the second derivative arrays, which represented the dominant use of memory, to be declared dynamically. For the current version, Fortran 90 has been adopted, allowing full use of dynamic memory.
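The change can be illustrated with the short fragment below, which contrasts a compile-time PARAMETER bound with a Fortran 90 allocatable array for the second derivative matrix; the variable names (numat, derv2) and the array bound are hypothetical choices made for this sketch rather than declarations taken from the GULP source.

   program alloc_demo
     implicit none
     ! Hypothetical names, for illustration only.
     !
     ! Fortran 77 style (static): bounds fixed at compile time, e.g.
     !     parameter (maxat = 1000)
     !     real*8 derv2(3*maxat,3*maxat)
     !
     ! Fortran 90 style (dynamic): the dominant second derivative array
     ! is sized from the actual number of atoms at run time.
     integer :: numat
     real(kind=8), allocatable :: derv2(:,:)

     numat = 256
     allocate(derv2(3*numat,3*numat))
     derv2 = 0.0d0
     write(*,'(a,f8.2,a)') ' Hessian storage: ', 8.0d0*size(derv2)/1.0d6, ' MB'
     deallocate(derv2)
   end program alloc_demo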

The program has been compiled and tested for most Unix-style operating systems, including Linux and Apple Macintosh OS X, using most Fortran 90 compilers. While compilation under MS-DOS is in principle possible, this operating system is not supported, since it is the only one that cannot be automatically catered for within a single standard Makefile.

The code has also been parallelised for the evaluation of the energy and first derivatives using MPI, based upon a replicated data paradigm. When performing calculations on systems large enough to require the use of parallel computers, the most appropriate types of calculation are usually either conjugate gradient optimisation or molecular dynamics, so the absence of second derivatives is less critical. However, a distributed data algorithm for the second derivatives, using ScaLAPACK for matrix diagonalisation/inversion, would be feasible and may be implemented in the future.

Because GULP is currently targeted primarily at crystalline systems, where unit cells are typically small, the distribution of parallel work does not use a spatial decomposition. Instead, the Brode-Ahlrichs algorithm [116] is used for the pairwise loops in real space in order to try to ensure load balancing over the processors. A similar approach is used for the four-body potentials, based on the first two atoms of the sequence of four. In the case of three-body potentials, the work is divided by a straight distribution of pivot atoms over the nodes. Parallelisation in reciprocal space is achieved by an equal division of reciprocal lattice vectors over the nodes. Given that the number of operations per $k$-vector is equal, this should guarantee load balancing as long as the number of reciprocal lattice vectors is large compared with the number of processors (which is almost always the case).
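The effect of the Brode-Ahlrichs reordering on load balance can be sketched as follows. This is a minimal serial illustration in which the outer atom loop is dealt out to processor ranks in round-robin fashion; that assignment, and all names used, are assumptions made for the sketch rather than a description of how GULP itself distributes the loop.

   program brode_ahlrichs_demo
     implicit none
     integer, parameter :: natoms = 10, nprocs = 3
     integer :: rank, i, k, j, npairs

     do rank = 0, nprocs-1
       npairs = 0
       do i = rank+1, natoms, nprocs        ! outer atom loop shared over ranks
         ! Pair atom i with the next natoms/2 atoms, wrapping cyclically, so
         ! that every value of i carries nearly the same number of partners
         ! (unlike the triangular loop j = i+1, ..., natoms).
         do k = 1, natoms/2
           j = mod(i+k-1, natoms) + 1
           ! For an even number of atoms the final shift generates each pair
           ! twice; keep it for only half of the atoms.
           if (k == natoms/2 .and. mod(natoms,2) == 0 .and. i > natoms/2) cycle
           npairs = npairs + 1              ! the i-j interaction would be evaluated here
         end do
       end do
       print *, 'rank', rank, 'evaluates', npairs, 'pairs'
     end do
   end program brode_ahlrichs_demo

Because each value of the outer index contributes close to natoms/2 partners, any even-handed distribution of that index over processors yields a nearly equal pair count per processor, which is the property exploited for the real-space pairwise and four-body loops described above.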