This is part of the isdb module

Calculate the fit of a structure or ensemble of structures with a cryo-EM density map.

This action implements the multi-scale Bayesian approach to cryo-EM data fitting introduced in Ref. [48]. This method allows efficient and accurate structural modeling of cryo-electron microscopy density maps at multiple scales, from coarse-grained to atomistic resolution, by addressing the presence of random and systematic errors in the data, sample heterogeneity, data correlation, and noise correlation.

The experimental density map is fit by a Gaussian Mixture Model (GMM), which is provided as an external file specified by the keyword GMM_FILE. We are currently working on a web server to perform this operation. In the meantime, the user can request a stand-alone version of the GMM code at massimiliano.bonomi_AT_gmail.com.

When run in single-replica mode, this action allows atomistic, flexible refinement of an individual structure into a density map. Combined with a multi-replica framework (such as the -multi option in GROMACS), the user can model an ensemble of structures using the Metainference approach [21].

Warning: To use EMMI, the user should always add a MOLINFO line and specify a pdb file of the system.

Note: To enhance sampling in single-structure refinement, one can use a Replica Exchange Method, such as Parallel Tempering. In this case, the user should add the NO_AVER flag to the input line. To use a replica-based enhanced sampling scheme such as Parallel-Bias Metadynamics (PBMETAD), one should use the REWEIGHT flag and pass the Metadynamics bias using the ARG keyword.

Note: EMMI can be used in combination with periodic and non-periodic systems. In the latter case, one should add the NOPBC flag to the input line.
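The role of the bias passed through ARG when REWEIGHT is active can be pictured with a short Python sketch. This is an illustration under stated assumptions, not PLUMED's code: each replica is assumed to receive a weight proportional to exp(bias / kBT), and the function names, the bias values, and the per-replica overlaps are all hypothetical.

```python
import math

def replica_weights(biases, kbt):
    """Return normalized weights proportional to exp(bias / kBT).

    The largest bias is subtracted before exponentiating so that the
    exponentials cannot overflow; this does not change the final weights.
    """
    bmax = max(biases)
    unnormalized = [math.exp((b - bmax) / kbt) for b in biases]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]

def weighted_average(values, weights):
    """Weighted average of one observable over the replicas."""
    return sum(w * v for w, v in zip(weights, values))

# kB in kJ/(mol K) times an assumed temperature of 300 K
kbt = 0.0083144626 * 300.0

# hypothetical Metadynamics bias felt by each of three replicas (kJ/mol)
biases = [0.0, 2.0, 5.0]
# hypothetical model overlap with one GMM component, one value per replica
model_overlap = [1.0, 1.2, 0.8]

weights = replica_weights(biases, kbt)
average = weighted_average(model_overlap, weights)
print(weights, average)
```

With equal biases the weights reduce to a plain arithmetic average over the replicas.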

Examples

In this example, we perform a single-structure refinement based on an experimental cryo-EM map. The map is fit with a GMM, whose parameters are listed in the file GMM_fit.dat. This file contains one line per GMM component in the following format:

#! FIELDS Id Weight Mean_0 Mean_1 Mean_2 Cov_00 Cov_01 Cov_02 Cov_11 Cov_12 Cov_22 Beta
0 2.9993805e+01 6.54628 10.37820 -0.92988 2.078920e-02 1.216254e-03 5.990827e-04 2.556246e-02 8.411835e-03 2.486254e-02 1
1 2.3468312e+01 6.56095 10.34790 -0.87808 1.879859e-02 6.636049e-03 3.682865e-04 3.194490e-02 1.750524e-03 3.017100e-02 1
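As a rough illustration of this file layout, the following Python sketch reads the `#! FIELDS` header and rebuilds the symmetric 3x3 covariance matrix of each component from its six listed elements. The function name `read_gmm` and the dictionary layout are hypothetical choices made for this example, and storing Beta as an integer label is an assumption.

```python
def read_gmm(lines):
    """Parse GMM components from an iterable of text lines.

    Assumes a '#! FIELDS' header naming the columns, followed by one
    whitespace-separated row per GMM component.
    """
    fields = None
    components = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#!"):
            tokens = line.split()
            if len(tokens) > 1 and tokens[1] == "FIELDS":
                fields = tokens[2:]
            continue
        if line.startswith("#"):
            continue  # ordinary comment line
        if fields is None:
            raise ValueError("no '#! FIELDS' header found before the data")
        values = line.split()
        if len(values) != len(fields):
            raise ValueError(
                "expected %d columns, got %d" % (len(fields), len(values))
            )
        row = dict(zip(fields, values))
        c = {k: float(row["Cov_" + k])
             for k in ("00", "01", "02", "11", "12", "22")}
        components.append({
            "id": int(row["Id"]),
            "weight": float(row["Weight"]),
            "mean": [float(row["Mean_%d" % i]) for i in range(3)],
            # the file lists the upper triangle; mirror it to fill the matrix
            "cov": [[c["00"], c["01"], c["02"]],
                    [c["01"], c["11"], c["12"]],
                    [c["02"], c["12"], c["22"]]],
            "beta": int(row["Beta"]),  # assumption: Beta is an integer label
        })
    return components


example = """#! FIELDS Id Weight Mean_0 Mean_1 Mean_2 Cov_00 Cov_01 Cov_02 Cov_11 Cov_12 Cov_22 Beta
0 2.9993805e+01 6.54628 10.37820 -0.92988 2.078920e-02 1.216254e-03 5.990827e-04 2.556246e-02 8.411835e-03 2.486254e-02 1
"""
gmm = read_gmm(example.splitlines())
print(len(gmm), gmm[0]["mean"])
```

To read an actual file, one could pass an open file object, for example `read_gmm(open("GMM_fit.dat"))`.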

To accelerate the computation of the Bayesian score, one can:

use neighbor lists, specified by the keywords NL_CUTOFF and NL_STRIDE;
calculate the restraint every other step (or more).
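The first of these two options can be pictured with the following Python sketch. It is only an illustration under simplifying assumptions, not PLUMED's implementation: every Gaussian is treated as isotropic, the overlap of two normalized Gaussians is taken as the integral of their product, and the cutoff is applied to the raw overlap value. The function names and the `nl_cutoff` and `nl_stride` variables (mirroring NL_CUTOFF and NL_STRIDE) are hypothetical.

```python
import math

def gaussian_overlap(mean_a, var_a, mean_b, var_b):
    """Integral of the product of two normalized isotropic 3D Gaussians."""
    var = var_a + var_b
    d2 = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    return (2.0 * math.pi * var) ** -1.5 * math.exp(-0.5 * d2 / var)

def build_neighbor_list(data_means, data_var, atoms, atom_var, nl_cutoff):
    """List the (component, atom) pairs whose overlap is at least nl_cutoff."""
    pairs = []
    for i, mean in enumerate(data_means):
        for j, position in enumerate(atoms):
            if gaussian_overlap(mean, data_var, position, atom_var) >= nl_cutoff:
                pairs.append((i, j))
    return pairs

def total_overlaps(data_means, data_var, atoms, atom_var, pairs):
    """Sum the overlaps per map component, visiting only the listed pairs."""
    totals = [0.0] * len(data_means)
    for i, j in pairs:
        totals[i] += gaussian_overlap(data_means[i], data_var, atoms[j], atom_var)
    return totals

# toy system: two map components and three atoms (arbitrary length units)
data_means = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
atoms = [(0.05, 0.0, 0.0), (0.95, 0.0, 0.0), (5.0, 5.0, 5.0)]
data_var, atom_var = 0.02, 0.01
nl_stride, nl_cutoff = 100, 0.01

pairs = []
for step in range(250):
    # Rebuild the list only every nl_stride steps. Positions are static in
    # this toy loop; in a simulation they move, so the list is slightly
    # stale between rebuilds.
    if step % nl_stride == 0:
        pairs = build_neighbor_list(data_means, data_var, atoms, atom_var, nl_cutoff)
    overlaps = total_overlaps(data_means, data_var, atoms, atom_var, pairs)
print(pairs)
```

The saving comes from the inner loop: between rebuilds only the listed pairs are evaluated instead of every component-atom combination.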

All the heavy atoms of the system are used to calculate the density map. This list can conveniently be provided using a GROMACS index file.

The input file looks as follows:

# include pdb info
MOLINFO STRUCTURE=prot.pdb
# all heavy atoms
protein-h: GROUP NDX_FILE=index.ndx NDX_GROUP=Protein-H
# create EMMI score
gmm: EMMI NOPBC SIGMA_MIN=0.01 TEMP=300.0 NL_STRIDE=100 NL_CUTOFF=0.01 GMM_FILE=GMM_fit.dat ATOMS=protein-h
# translate into bias - apply every 2 steps
emr: BIASVALUE ARG=gmm.scoreb STRIDE=2
PRINT ARG=emr.* FILE=COLVAR STRIDE=500 FMT=%20.10f

Glossary of keywords and components

Description of components

By default this Action calculates the following quantities. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.

Quantity	Description
scoreb	Bayesian score
neff	effective number of replicas

In addition the following quantities can be calculated by employing the keywords listed below

Quantity	Keyword	Description
acc	NOISETYPE	MC acceptance for uncertainty
scale	REGRESSION	scale factor
accscale	REGRESSION	MC acceptance for scale regression
enescale	REGRESSION	MC energy for scale regression
anneal	ANNEAL	annealing factor
weight	REWEIGHT	weights of the weighted average
biasDer	REWEIGHT	derivatives with respect to the bias
sigma	NOISETYPE	uncertainty in the forward models and experiment

The atoms involved can be specified using

ATOMS

atoms for which we calculate the density map, typically all heavy atoms. For more information on how to specify lists of atoms see Groups and Virtual Atoms

Compulsory keywords

GMM_FILE	file with the parameters of the GMM components
NL_CUTOFF	The cutoff in overlap for the neighbor list
NL_STRIDE	The frequency with which we are updating the neighbor list
SIGMA_MIN	minimum uncertainty
RESOLUTION	Cryo-EM map resolution
NOISETYPE	functional form of the noise (GAUSS, OUTLIERS, MARGINAL)

Options

NUMERICAL_DERIVATIVES	( default=off ) calculate the derivatives for these quantities numerically
NOPBC	( default=off ) ignore the periodic boundary conditions when calculating distances
NO_AVER	( default=off ) don't do ensemble averaging in multi-replica mode
REWEIGHT	( default=off ) simple REWEIGHT using the ARG as energy
ARG	the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears, the scalars calculated by all the preceding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example, a DISTANCE action labelled dist may have three components x, y and z. To take just the x component you should use dist.x; if you wish to take all three components then use dist.*. More information on the referencing of Actions can be found in the Getting Started section of the PLUMED manual. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you must compile PLUMED with the appropriate flag. You can use multiple instances of this keyword, i.e. ARG1, ARG2, ARG3...
SIGMA0	initial value of the uncertainty
DSIGMA	MC step for uncertainties
MC_STRIDE	Monte Carlo stride
ERR_FILE	file with experimental or GMM fit errors
OV_FILE	file with experimental overlaps
NORM_DENSITY	integral of the experimental density
STATUS_FILE	write a file with all the data useful for restart
WRITE_STRIDE	write the status to a file every N steps, this can be used for restart
REGRESSION	regression stride
REG_SCALE_MIN	regression minimum scale
REG_SCALE_MAX	regression maximum scale
REG_DSCALE	regression maximum scale MC move
SCALE	scale factor
ANNEAL	Length of annealing cycle
ANNEAL_FACT	Annealing temperature factor
TEMP	temperature
PRIOR	exponent of uncertainty prior
WRITE_OV_STRIDE	write model overlaps every N steps
WRITE_OV	write a file with model overlaps
AVERAGING	Averaging window for weights