This is part of the pamm module.
It is only available if you configure PLUMED with ./configure --enable-modules=pamm . Furthermore, this feature is still being developed, so take care when using it and report any problems on the mailing list.

Probabilistic analysis of molecular motifs.

Probabilistic analysis of molecular motifs (PAMM) was introduced in this paper [pamm]. The essence of this approach is to calculate a large set of collective variables for a set of atoms over a short trajectory and to fit this data with a Gaussian mixture model. The idea is that the modes of these distributions can be used to identify features such as hydrogen bonds or secondary structure types.

This implementation assumes that the Gaussian mixture model has already been fitted elsewhere by a separate code. You therefore provide an input file to this action that contains the means, covariance matrices and weights for a set of Gaussian kernels, \(\{ \phi \}\). The values and derivatives of the following set of quantities are then computed:

\[ s_k = \frac{ \phi_k}{ \sum_i \phi_i } \]
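As a minimal sketch, assume the kernel values \(\phi_k\) for one set of input quantities have already been evaluated. The normalisation above then amounts to dividing each kernel value by their sum. The `regularise` floor on the denominator is our reading of the REGULARISE keyword in the keyword list, and is an assumption about how PLUMED applies it. The function name `pamm_components` is hypothetical.

```python
def pamm_components(phi, regularise=0.001):
    """Return s_k = phi_k / sum_i phi_i for every kernel k.

    phi        -- kernel values phi_k evaluated at one point
    regularise -- assumed lower bound on the denominator, so that a point far
                  from every kernel does not cause a division by (almost) zero
    """
    denominator = max(sum(phi), regularise)
    return [p / denominator for p in phi]


# Three kernel values at one point; the resulting components sum to one.
s = pamm_components([1.0, 2.0, 1.0])
print(s)  # [0.25, 0.5, 0.25]
```

Without the floor, a point where every \(\phi_k\) is tiny would still produce components that sum to one, which is usually not what is wanted far from all clusters.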

Each of the \(\phi_k\) is a Gaussian function that acts on a set of quantities calculated within a MultiColvar. These might be TORSIONS, DISTANCES, ANGLES or any one of the many symmetry functions that are available within MultiColvar actions. These quantities are then inserted into the set of \(n\) kernels that are in the input file. This is done for multiple sets of values of the input quantities, and a final quantity is calculated by summing the above \(s_k\) values or some transformation of them. This sounds more complicated than it is and is best understood by looking through the example given below.

Warning: mixing MultiColvar actions that are periodic with variables that are not periodic has not been tested.

Examples

In this example I will explain in detail what the following input is computing:


#SETTINGS MOLFILE=regtest/basic/rt32/helix.pdb
MOLINFO MOLTYPE=protein STRUCTURE=M1d.pdb
psi: TORSIONS ATOMS1=@psi-2 ATOMS2=@psi-3 ATOMS3=@psi-4
phi: TORSIONS ATOMS1=@phi-2 ATOMS2=@phi-3 ATOMS3=@phi-4
p: PAMM DATA=phi,psi CLUSTERS=clusters.pamm MEAN1={COMPONENT=1} MEAN2={COMPONENT=2}
PRINT ARG=p.mean-1,p.mean-2 FILE=colvar

The best place to start our explanation is to look at the contents of the clusters.pamm file:

#! FIELDS height phi psi sigma_phi_phi sigma_phi_psi sigma_psi_phi sigma_psi_psi
#! SET multivariate von-misses
#! SET kerneltype gaussian
      2.97197455E-0001     -1.91983118E+0000      2.25029540E+0000      2.45960237E-0001     -1.30615381E-0001     -1.30615381E-0001      2.40239117E-0001
      2.29131448E-0002      1.39809354E+0000      9.54585380E-0002      9.61755708E-0002     -3.55657919E-0002     -3.55657919E-0002      1.06147253E-0001
      5.06676398E-0001     -1.09648066E+0000     -7.17867907E-0001      1.40523052E-0001     -1.05385552E-0001     -1.05385552E-0001      1.63290557E-0001
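The sketch below shows one way to read a file in this layout. It assumes that each data line holds the height, then the centre, then the flattened covariance matrix, in the order given by the `#! FIELDS` line. The `#! SET` lines are skipped here, although they describe how the kernels should be treated. The function `read_clusters` and its dictionary layout are hypothetical helpers for illustration, not part of PLUMED.

```python
def read_clusters(text):
    """Parse clusters.pamm-style text into a list of kernel dictionaries.

    Each kernel has a scalar "weight", a "centre" tuple of length d and a
    d x d "cov" matrix stored as a tuple of row tuples.
    """
    fields, dim, kernels = None, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#! FIELDS"):
            fields = line.split()[2:]  # column names after "#! FIELDS"
            # Columns are: height, d centre components, d*d covariance entries.
            dim = 1
            while 1 + dim + dim * dim < len(fields):
                dim += 1
            if 1 + dim + dim * dim != len(fields):
                raise ValueError("unexpected number of FIELDS columns")
            continue
        if not line or line.startswith("#"):
            continue  # blank lines and "#! SET ..." lines are skipped here
        if fields is None:
            raise ValueError("data line found before the #! FIELDS line")
        values = [float(v) for v in line.split()]  # accepts E-0001 exponents
        if len(values) != len(fields):
            raise ValueError("data line does not match the FIELDS header")
        flat = values[1 + dim:]
        kernels.append({
            "weight": values[0],
            "centre": tuple(values[1:1 + dim]),
            "cov": tuple(tuple(flat[r * dim:(r + 1) * dim]) for r in range(dim)),
        })
    return kernels


# Header plus the first kernel of the clusters.pamm file shown above.
example = """#! FIELDS height phi psi sigma_phi_phi sigma_phi_psi sigma_psi_phi sigma_psi_psi
#! SET multivariate von-misses
#! SET kerneltype gaussian
  2.97197455E-0001 -1.91983118E+0000 2.25029540E+0000 2.45960237E-0001 -1.30615381E-0001 -1.30615381E-0001 2.40239117E-0001
"""
kernels = read_clusters(example)
print(kernels[0]["centre"])
```

With seven columns the header implies \(d = 2\): one height, two centre coordinates and four covariance entries.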

This file contains the parameters of three two-dimensional Gaussian functions. Each of these Gaussian kernels has a weight, \(w_k\), a vector that specifies the position of its center, \(\mathbf{c}_k\), and a covariance matrix, \(\Sigma_k\). The \(\phi_k\) functions that we use to calculate our PAMM components are thus:

\[ \phi_k = \frac{w_k}{N_k} \exp\left( -(\mathbf{s} - \mathbf{c}_k)^T \Sigma^{-1}_k (\mathbf{s} - \mathbf{c}_k) \right) \]

In the above, \(N_k\) is a normalization factor that is calculated from \(\Sigma_k\). The vector \(\mathbf{s}\) contains quantities that are calculated by the TORSIONS actions. It must be two-dimensional, because the kernels in clusters.pamm are two-dimensional, and in this case each of its components is the value of a torsion angle. The two TORSIONS actions in the input above calculate the \(\phi\) and \(\psi\) backbone torsional angles of a protein (note the use of MOLINFO, which makes the specification of the atoms straightforward). We thus calculate the values of our three \( \{ \phi \} \) kernels three times: first using the \(\phi\) and \(\psi\) angles of the second residue of the protein, then those of the third residue, and finally those of the fourth residue. The two quantities that are output by the PRINT action, p.mean-1 and p.mean-2, are the averages over these three residues of the quantities:

\[ s_1 = \frac{ \phi_1}{ \phi_1 + \phi_2 + \phi_3 } \]

and

\[ s_2 = \frac{ \phi_2}{ \phi_1 + \phi_2 + \phi_3 } \]
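The sketch below makes the walk-through concrete. It evaluates the three kernels from clusters.pamm at a few \((\phi, \psi)\) pairs, normalises them as in the definition of \(s_k\), and averages each component over the residues, which is what p.mean-1 and p.mean-2 report. It is a sketch under stated assumptions, not PLUMED's implementation:

- the normalisation \(N_k = 2\pi\sqrt{\det\Sigma_k}\) is our choice, since the text only says that \(N_k\) is computed from \(\Sigma_k\);
- the exponent follows the formula above as written;
- angle differences are wrapped into \([-\pi, \pi]\) because torsions are periodic;
- the residue angles are illustrative values, not ones read from M1d.pdb.

The function names are hypothetical.

```python
import math

# Kernel parameters copied from the clusters.pamm file above:
# (weight, (phi_c, psi_c), ((s_pp, s_ps), (s_sp, s_ss)))
KERNELS = [
    (2.97197455e-01, (-1.91983118, 2.25029540),
     ((2.45960237e-01, -1.30615381e-01), (-1.30615381e-01, 2.40239117e-01))),
    (2.29131448e-02, (1.39809354, 9.54585380e-02),
     ((9.61755708e-02, -3.55657919e-02), (-3.55657919e-02, 1.06147253e-01))),
    (5.06676398e-01, (-1.09648066, -7.17867907e-01),
     ((1.40523052e-01, -1.05385552e-01), (-1.05385552e-01, 1.63290557e-01))),
]


def kernel_value(weight, centre, cov, s):
    """phi_k = w_k / N_k * exp(-(s - c)^T Sigma^{-1} (s - c)) for 2-D input s."""
    # Torsions are periodic, so wrap each difference into [-pi, pi].
    dx = [math.remainder(s[i] - centre[i], 2 * math.pi) for i in range(2)]
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of a 2x2 matrix
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    norm = 2 * math.pi * math.sqrt(det)  # assumed N_k, see the lead-in
    return weight / norm * math.exp(-quad)


def pamm_means(points):
    """Average s_k over all input points, one value per kernel."""
    totals = [0.0] * len(KERNELS)
    for s in points:
        phis = [kernel_value(w, c, cov, s) for w, c, cov in KERNELS]
        z = sum(phis)
        for k, p in enumerate(phis):
            totals[k] += p / z
    return [t / len(points) for t in totals]


# Illustrative (phi, psi) pairs for residues 2-4, NOT read from M1d.pdb.
residues = [(-1.0, -0.7), (-1.1, -0.8), (-1.2, -0.6)]
means = pamm_means(residues)
print("p.mean-1 =", means[0], " p.mean-2 =", means[1])
```

All three illustrative residues lie close to the centre of the third kernel (the right-handed helical region), so the third component dominates. The two printed means, the analogues of p.mean-1 and p.mean-2, come out small.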

This input is very flexible. We can examine any number of components, compute PAMM variables from any set of collective variables, and transform the PAMM variables themselves in many different ways when computing these sums.

Glossary of keywords and components

Description of components

Quantity	Keyword	Description
lessthan	LESS_THAN	the number of colvars that have a value less than a threshold
morethan	MORE_THAN	the number of colvars that have a value more than a threshold
altmin	ALT_MIN	the minimum value of the colvars
min	MIN	the minimum colvar
max	MAX	the maximum colvar
between	BETWEEN	the number of colvars that have a value that lies in a particular interval
highest	HIGHEST	the largest of the colvars
lowest	LOWEST	the smallest of the colvars
sum	SUM	the sum of the colvars
mean	MEAN	the mean of the colvars

Compulsory keywords

ARG	the vectors from which the PAMM coordinates are calculated
CLUSTERS	the name of the file that contains the definitions of all the clusters
REGULARISE	( default=0.001 ) don't allow the denominator to be smaller than this value
KERNELS	( default=all ) the kernels for which the PAMM values are computed

Options

HIGHEST	( default=off ) this flag allows you to recover the highest of these variables.
LOWEST	( default=off ) this flag allows you to recover the lowest of these variables.
SUM	( default=off ) calculate the sum of all the quantities.
MEAN	( default=off ) calculate the mean of all the quantities.
LESS_THAN	calculate the number of variables that are less than a certain target value. This quantity is calculated using \(\sum_i \sigma(s_i)\), where \(\sigma(s)\) is a switching function. You can use multiple instances of this keyword, e.g. LESS_THAN1, LESS_THAN2, LESS_THAN3...
MORE_THAN	calculate the number of variables that are more than a certain target value. This quantity is calculated using \(\sum_i 1 - \sigma(s_i)\), where \(\sigma(s)\) is a switching function. You can use multiple instances of this keyword, e.g. MORE_THAN1, MORE_THAN2, MORE_THAN3...
ALT_MIN	calculate the minimum value. To make this quantity continuous, the minimum is calculated using \( \textrm{min} = -\frac{1}{\beta} \log \sum_i \exp\left( -\beta s_i \right) \). The value of \(\beta\) in this function is specified using (BETA= \(\beta\)).
MIN	calculate the minimum value. To make this quantity continuous, the minimum is calculated using \( \textrm{min} = \frac{\beta}{ \log \sum_i \exp\left( \frac{\beta}{s_i} \right) } \). The value of \(\beta\) in this function is specified using (BETA= \(\beta\)).
MAX	calculate the maximum value. To make this quantity continuous, the maximum is calculated using \( \textrm{max} = \beta \log \sum_i \exp\left( \frac{s_i}{\beta}\right) \). The value of \(\beta\) in this function is specified using (BETA= \(\beta\)).
BETWEEN	calculate the number of values that are within a certain range. These quantities are calculated using kernel density estimation as described in histogrambead. You can use multiple instances of this keyword, e.g. BETWEEN1, BETWEEN2, BETWEEN3...
HISTOGRAM	calculate a discretized histogram of the distribution of values. This shortcut allows you to calculate NBIN quantities like BETWEEN.
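The smooth ALT_MIN, MIN and MAX formulas above, and the sums used by LESS_THAN and MORE_THAN, can be checked numerically with a few lines of Python. This is a sketch, not PLUMED code. The function names are hypothetical. The switching function is assumed to be the rational form \(\sigma(s) = \frac{1-(s/r_0)^6}{1-(s/r_0)^{12}}\), which is only one of the switching functions PLUMED offers; the actual one is chosen in the input.

```python
import math


def alt_min(s, beta):
    """ALT_MIN: -1/beta * log(sum_i exp(-beta * s_i))."""
    return -math.log(sum(math.exp(-beta * x) for x in s)) / beta


def smooth_min(s, beta):
    """MIN: beta / log(sum_i exp(beta / s_i)); requires every s_i > 0."""
    return beta / math.log(sum(math.exp(beta / x) for x in s))


def smooth_max(s, beta):
    """MAX: beta * log(sum_i exp(s_i / beta))."""
    return beta * math.log(sum(math.exp(x / beta) for x in s))


def rational_switch(x, r0, n=6, m=12):
    """Assumed switching function sigma(x) = (1 - (x/r0)^n) / (1 - (x/r0)^m)."""
    q = x / r0
    if abs(q - 1.0) < 1e-9:
        return n / m  # limit of the ratio as x -> r0
    return (1.0 - q ** n) / (1.0 - q ** m)


def less_than(s, r0):
    """LESS_THAN: sum_i sigma(s_i)."""
    return sum(rational_switch(x, r0) for x in s)


def more_than(s, r0):
    """MORE_THAN: sum_i (1 - sigma(s_i))."""
    return sum(1.0 - rational_switch(x, r0) for x in s)


values = [0.1, 0.5, 0.9]  # illustrative s_k values for three residues
print(alt_min(values, 50.0), smooth_min(values, 5.0), smooth_max(values, 0.02))
print(less_than(values, 0.5), more_than(values, 0.5))
```

Larger \(\beta\) brings ALT_MIN and MIN closer to the true minimum, while MAX approaches the true maximum as \(\beta\) becomes small. In MIN, \(\exp(\beta/s_i)\) overflows once \(\beta/s_i\) exceeds roughly 700, so \(\beta\) has to be matched to the range of the values.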