The purpose of this module is to provide an introduction to the specific requirements of protein crystallography and to give the students their first exposure to crystallographic computing.
At the end of the module, you will be able to prepare crystals, process data and solve structures by molecular replacement.
Basic theory, physical methods and strategy involved in the crystallisation of biological macromolecules, including factors affecting crystallisation
Broad overview of topics covered by Susan and Trevor plus:
Assessment of this module will take place at the end of the Protein Crystallography module in October.
PsaA (pneumococcal surface antigen A) is an extracellular ABC-type transporter protein involved in transporting zinc and/or manganese into the Gram-positive bacterium Streptococcus pneumoniae (Lawrence et al., 1998; Pilling et al., 1998). The molecular weight of the protein is about 35 kDa. Oscillation data have been collected from needle-like crystals of the protein grown by phosphate precipitation. The aim of this tutorial is to use this data set to illustrate the basics of oscillation data processing.
In order to do this tutorial you will need access to a computer containing the data and the HKL program suite (Otwinowski & Minor, 1997), as well as to a for this software.
The data set consists of 98 one-degree oscillation images collected on a Rigaku R-axis IV image plate detector mounted on a laboratory rotating anode X-ray source. The data set is stored in the directory $PSAA_DATA/native and the file names are of the form psa13###.osc, where ### is a three-digit number ranging from 001 to 098.
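As a small convenience, the expected file names can be generated and checked before processing starts. The sketch below is a hypothetical helper (not part of HKL); it assumes the naming scheme described above and that the images sit in $PSAA_DATA/native.

```python
import os


def expected_image_names(prefix="psa13", first=1, last=98, ext=".osc"):
    """Build the expected image names, e.g. psa13001.osc ... psa13098.osc."""
    # ### is a zero-padded, three-digit image number
    return [f"{prefix}{n:03d}{ext}" for n in range(first, last + 1)]


def missing_images(directory):
    """Return the expected image names that are not present in `directory`."""
    present = set(os.listdir(directory))
    return [name for name in expected_image_names() if name not in present]


if __name__ == "__main__":
    data_dir = os.path.join(os.environ.get("PSAA_DATA", "."), "native")
    if os.path.isdir(data_dir):
        print("missing:", missing_images(data_dir))
```

An empty list means all 98 images are in place.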
Individual images can be displayed on your computer using the HKL program xdisp as follows:
xdisp raxis4 100 psa13###.osc
Try this with one of the images. Then acquaint yourself with the various display modes, in particular colour versus monochrome display, altering the contrast (colour is most useful) and zooming in. Note that you can zoom right down to the pixel level and see the actual counts recorded for each pixel.
These images have rather high background and you may need to alter the contrast range to see the diffraction spots at different resolution ranges.
The aim of "indexing" is to assign a Miller index (hkl value) to every spot in the diffraction pattern. This can be done in a semi-automatic fashion (as opposed to earlier manual methods), hence the term "auto-indexing". HKL auto-indexing proceeds via a Fourier-space type analysis of the layout of the diffraction spots, effectively looking for the most common spacings between the spots and seeing if these can be assigned in a consistent way to the same Bravais lattice.
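The idea of looking for the most common spacings can be illustrated with a toy two-dimensional example. This is only a sketch of the principle, not denzo's algorithm: real auto-indexing works with three-dimensional reciprocal-lattice vectors and a Fourier-type search, whereas the hypothetical helper below simply counts pairwise difference vectors between spot positions.

```python
from collections import Counter
from itertools import combinations


def common_difference_vectors(spots, grid=0.5, top=4):
    """Count pairwise difference vectors between spot positions and return
    the `top` most frequent ones as ((dx, dy), count) pairs.

    Vectors are rounded to a grid of size `grid` so that slightly noisy
    positions still fall into the same bin; v and -v are counted together.
    """
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(spots, 2):
        dx, dy = x2 - x1, y2 - y1
        # Use one sign convention so that v and -v share a bin
        if dx < 0 or (dx == 0 and dy < 0):
            dx, dy = -dx, -dy
        key = (round(dx / grid) * grid, round(dy / grid) * grid)
        counts[key] += 1
    return counts.most_common(top)


# Spots on a synthetic 2D lattice with basis vectors (10, 0) and (3, 7)
spots = [(10 * i + 3 * j, 7 * j) for i in range(5) for j in range(5)]
for vector, count in common_difference_vectors(spots):
    print(vector, count)
```

The two most frequent vectors recovered are the basis vectors of the synthetic lattice; an indexing program would then test whether such vectors can be assigned consistently to one of the Bravais lattices.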
Auto-indexing is done by the HKL program denzo. denzo needs some critical information about the diffractometer set-up used to collect the diffraction data; this is provided in the control file auto.dat, which is located in the directory $PSAA_DATA/native. This file has the following components:
commands that put denzo into auto-indexing mode;
denzo must also be supplied with the name of a file (typically peaks.file) containing the x-y coordinates of the strongest spots in the image.
Auto-indexing will produce the following information:
This information can then be used to predict (via Bragg's law) which reflections will occur on the image and where they will be located. It can also be used to predict the partiality of each reflection, i.e. the extent to which the reflection is fully recorded on the image; effectively, the extent to which it has completely crossed the Ewald sphere during the oscillation sweep. (Remember that the spots have a finite, non-zero angular width as a result of the mosaic spread of the crystal.)
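Both ideas can be made concrete with a short sketch. For a cell with all angles at 90° (an assumption made here to keep the formula simple), 1/d² = h²/a² + k²/b² + l²/c², and Bragg's law λ = 2d sin θ then gives the diffraction angle. The default wavelength of 1.5418 Å (Cu Kα) is also an assumption, typical of a laboratory rotating anode. The partiality function uses a deliberately simple model, assumed for illustration only: the reflection is spread uniformly over a fixed angular width. denzo's own partiality model is more sophisticated.

```python
import math


def d_spacing_orthogonal(h, k, l, a, b, c):
    """d-spacing (Å) of reflection hkl for a cell with alpha = beta = gamma = 90°."""
    inv_d_sq = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_sq)


def two_theta(d, wavelength=1.5418):
    """Scattering angle 2θ in degrees from Bragg's law, λ = 2 d sin θ.

    Returns None when λ / 2d > 1, i.e. the reflection lies beyond the
    resolution limit reachable at this wavelength.
    """
    s = wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(s))


def partiality(phi_centre, width, phi_start, phi_end):
    """Fraction of a reflection recorded on one image (angles in degrees).

    Toy model: the reflection is spread uniformly over `width` degrees of
    rotation centred on `phi_centre`; the image covers phi_start..phi_end.
    """
    lo, hi = phi_centre - width / 2.0, phi_centre + width / 2.0
    overlap = max(0.0, min(hi, phi_end) - max(lo, phi_start))
    return overlap / width
```

For example, partiality(11.0, 0.5, 10.0, 11.0) returns 0.5: a reflection centred on the boundary between two one-degree images is split equally between them.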
Once the lattice is assigned, the entire geometry of the system (cell dimensions, crystal orientation etc.) can be refined by a process of minimizing the overall discrepancy between the position of spots predicted by the Ewald sphere construction and their actual location on the image plate. In particular, this procedure (termed "refinement") leads to very accurate values for the
Denzo does not refine the mosaic spread of the crystal or select the correct Bravais lattice to describe the data; both have to be done manually. The mosaic spread is correctly set when there is an optimal balance between the number of reflections predicted and the number of reflections observed.
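The principle of refinement (adjusting parameters to minimise the squared differences between predicted and observed spot positions) can be shown with a deliberately simplified, linear model. The sketch below assumes, hypothetically, that each spot lies at the beam position plus a single scale factor s times a known predicted offset (u, v), with s loosely standing in for the crystal-to-detector distance. Because this toy model is linear, the least-squares solution has a closed form; denzo refines many more parameters (cell, orientation, beam, detector rotations) with an iterative, non-linear procedure.

```python
def refine_beam_and_scale(pred_uv, observed_xy):
    """Least-squares fit of the toy model  observed = beam + s * (u, v).

    pred_uv: predicted offsets (u, v) of each spot from the beam for s = 1.
    observed_xy: measured spot centroids (x, y), in the same order.
    Returns (xb, yb, s) minimising the summed squared position residuals.
    """
    n = len(pred_uv)
    u_mean = sum(u for u, _ in pred_uv) / n
    v_mean = sum(v for _, v in pred_uv) / n
    x_mean = sum(x for x, _ in observed_xy) / n
    y_mean = sum(y for _, y in observed_xy) / n
    # Closed-form solution of the normal equations for s ...
    num = sum((u - u_mean) * (x - x_mean) + (v - v_mean) * (y - y_mean)
              for (u, v), (x, y) in zip(pred_uv, observed_xy))
    den = sum((u - u_mean) ** 2 + (v - v_mean) ** 2 for u, v in pred_uv)
    s = num / den
    # ... and, given s, the beam position that zeroes the mean residual
    return x_mean - s * u_mean, y_mean - s * v_mean, s


def rms_residual(pred_uv, observed_xy, xb, yb, s):
    """Root-mean-square distance between predicted and observed positions."""
    sq = [(x - xb - s * u) ** 2 + (y - yb - s * v) ** 2
          for (u, v), (x, y) in zip(pred_uv, observed_xy)]
    return (sum(sq) / len(sq)) ** 0.5


# Synthetic check: generate "observed" spots from known parameters
pred = [(0.10, 0.20), (-0.30, 0.05), (0.25, -0.15), (0.00, 0.30)]
obs = [(1500.0 + 1000.0 * u, 1490.0 + 1000.0 * v) for u, v in pred]
xb, yb, s = refine_beam_and_scale(pred, obs)
print(xb, yb, s, rms_residual(pred, obs, xb, yb, s))
```

With noise-free synthetic data the fit recovers the generating parameters and the residual is essentially zero; with real data the residual plays the role of the positional chi² that denzo reports.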
Use xdisp to pick some of the stronger spots (aim for at least 50).
Check the auto.dat file and make certain that it reflects the data collection environment.
This will execute the auto.dat script and should display a list showing how well each Bravais lattice fits the observed set of peaks. The correct lattice will usually be the one of highest symmetry that still gives a reasonable fit (say, less than 2-3% error).
Stop denzo (Ctrl-C) and enter any space group of the correct lattice into the auto.dat file by adding the following line before the "peak file" command:
Restart denzo and feed in the modified auto.dat file via the @ command as above. Note that at this stage auto-indexing only provides the lattice; the correct space group will be determined later. For now, it is useful to enter a space group that does not contain screw axes.
fit cell crystal rotx roty rotz x beam y beam cassette rotx roty
go
go
go
go
go
go
write predictions
This set of commands will do six cycles of refinement of the cell dimensions, the crystal orientation, the beam coordinates and the detector (cassette) rotation.
Examine the chi2 values for the spot positions and for the partiality estimates (these should be less than about 2.0 if the auto-indexing is correct and the diffraction pattern is well-defined). These numbers are found in lines in the output of the form
position 373 chi**2 x 1.90 y 1.70 pred. decrease 0.000 x 373 = 0.000
partiality 766 chi**2 1.02 pred.decrease 0.000 x 766 = 0.000
Select update predictions in xdisp. You should see the predicted pattern displayed; if all is well it should be a very close match to the observed pattern. Also check carefully whether there are systematically too many or too few predicted spots, as this would also indicate that the selected lattice is incorrect.
If necessary, adjust the mosaicity, e.g. type mosaicity 0.5 to change it to 0.5 degrees, and redo some fitting via the go command.
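The two judgement calls in the steps above, choosing the lattice from denzo's table and checking the chi² values, can be written down as simple rules. The sketch below is hypothetical: it assumes the lattice fits have already been copied out as (lattice symbol, percent error) pairs, ranks lattices by the order of their holohedral point group, and uses regular expressions written against the example output lines shown above, which may not match every version of denzo's log. The 3% and 2.0 thresholds are the rules of thumb quoted in the text, not hard limits.

```python
import re

# Order of the holohedral point group of each Bravais lattice type,
# used here as a simple measure of symmetry.
HOLOHEDRY_ORDER = {
    "aP": 2,
    "mP": 4, "mC": 4,
    "oP": 8, "oC": 8, "oI": 8, "oF": 8,
    "hR": 12,
    "tP": 16, "tI": 16,
    "hP": 24,
    "cP": 48, "cI": 48, "cF": 48,
}


def choose_lattice(fits, max_error=3.0):
    """Return the highest-symmetry lattice whose fit error (%) is below
    max_error; ties in symmetry are broken by the smaller error."""
    acceptable = [(lat, err) for lat, err in fits if err < max_error]
    if not acceptable:
        return None
    best = max(acceptable, key=lambda fit: (HOLOHEDRY_ORDER[fit[0]], -fit[1]))
    return best[0]


# Patterns modelled on the example "position" and "partiality" lines
POSITION_RE = re.compile(r"position\s+(\d+)\s+chi\*\*2\s+x\s+([\d.]+)\s+y\s+([\d.]+)")
PARTIALITY_RE = re.compile(r"partiality\s*(\d+)\s+chi\*\*2\s+([\d.]+)")


def last_chi2_values(log_text):
    """Return the chi**2 values from the last position and partiality lines."""
    values = []
    positions = POSITION_RE.findall(log_text)
    if positions:
        _, x, y = positions[-1]
        values += [float(x), float(y)]
    partialities = PARTIALITY_RE.findall(log_text)
    if partialities:
        values.append(float(partialities[-1][1]))
    return values


def refinement_looks_ok(log_text, limit=2.0):
    """True if chi**2 values were found and all are below `limit`."""
    values = last_chi2_values(log_text)
    return bool(values) and all(v < limit for v in values)
```

For instance, choose_lattice([("aP", 0.0), ("oP", 0.9), ("tP", 9.8)]) returns "oP". The rule only encodes the advice in the text; the predicted pattern should still be compared with the image by eye.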
Once auto-indexing is complete, the refined geometry values can be used to predict the reflections that occur on each image throughout the entire data set. The intensity of each reflection is then measured using the so-called profile fitting method. Profile fitting involves determining the average shape of the stronger reflections and then fitting this shape to each reflection in turn; the observed intensity of each reflection is then proportional to the scale factor that has to be applied to this shape to match that reflection. Profile fitting is superior to simple integration (pixel-by-pixel summation) of the spot intensity, particularly for weak reflections.
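The core of profile fitting is a one-parameter least-squares fit. If p is the average (reference) spot profile, normalised so that its pixel values sum to 1, and c are the background-subtracted counts for one reflection, the scale K that minimises Σ(c - Kp)² is K = Σpc / Σp², and K is then the estimate of the integrated intensity. The sketch below is unweighted for clarity; practical implementations weight each pixel by its estimated variance and estimate the background locally rather than taking it as a single constant.

```python
def profile_fit_intensity(profile, counts, background):
    """Unweighted least-squares profile fit for one reflection.

    profile: reference spot shape, one value per pixel, summing to 1.
    counts: raw pixel counts for the reflection (same pixel order).
    background: background level per pixel (assumed constant here).
    Returns K minimising sum((c - background - K * p) ** 2).
    """
    signal = [c - background for c in counts]
    numerator = sum(p * s for p, s in zip(profile, signal))
    denominator = sum(p * p for p in profile)
    return numerator / denominator


def summation_intensity(counts, background):
    """Simple integration: sum of background-subtracted pixel counts."""
    return sum(c - background for c in counts)


reference = [0.1, 0.2, 0.4, 0.2, 0.1]
spot = [10 + 500 * p for p in reference]  # noise-free spot, intensity 500
print(profile_fit_intensity(reference, spot, 10), summation_intensity(spot, 10))
```

On noise-free data both methods agree; the advantage of profile fitting appears for weak, noisy spots, where the known shape constrains the estimate.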
The result of this procedure will be a set of files of the form psa13###.x. These contain the profile-fitted intensities of each reflection occurring on each image, as well as further information relating to the position of the spot and its degree of partiality. At the end of each *.x file is a summary of the refined detector and crystal parameters as determined from that particular image.
denzo < refine.dat | tee refine.log
The results are written both to a log file and to the standard output via the tee command. If xdisp is left running at the same time as denzo, the image display will be updated each time denzo progresses to the next refinement cycle or to the next image.
If nothing unforeseen has happened to the crystal during data collection and the pattern remains strong from image to image, the refinement should proceed without problems. A quick scan of the log file should confirm that there is no major increase in the difference between the predicted and actual spot positions as the refinement progresses from image to image. A scan of the *.x files will reveal whether there is any drift in cell dimensions, crystal orientation or beam position, any of which would indicate a problem in the refinement. Such checks can readily be constructed with a
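As a hedged illustration of such a drift check, the sketch below assumes (hypothetically) that one refined parameter, for example the cell edge a, has already been extracted from each psa13###.x file into a list in image order; the layout of the *.x summary is not described here, so the parsing step is left out. The tolerance and the example values are arbitrary and for illustration only.

```python
from statistics import median


def flag_drift(values, tolerance, first_image=1):
    """Return the image numbers whose value differs from the median over
    all images by more than `tolerance`."""
    centre = median(values)
    return [first_image + i
            for i, v in enumerate(values)
            if abs(v - centre) > tolerance]


def total_drift(values):
    """Spread (max - min) of a refined parameter across the data set."""
    return max(values) - min(values)


# Hypothetical cell-edge values (Å) for five consecutive images
cell_a = [70.01, 70.02, 70.00, 70.03, 70.45]
print(flag_drift(cell_a, tolerance=0.1), total_drift(cell_a))
```

Comparing against the median rather than the first image keeps a single bad image at the start from making every other image look like an outlier.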
This is the most critical part of the process and involves combining all the data to give a set of average intensities for each unique reflection in the asymmetric unit of the point group. Within HKL, the process also provides information about the space group itself, as well as highly accurate cell dimensions and valuable statistics relating to the quality of the entire data set. The key steps undertaken during the scale and merge process are as follows:
The scaling, merging and post-refinement are carried out by the HKL program scalepack. scalepack is supplied with the set of *.x files and processes these files via the above procedures to produce a single output file containing an averaged intensity for each unique reflection in the Laue group.
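The merging step can be sketched as follows. This hypothetical helper assumes the observations have already been scaled and mapped to their unique (asymmetric-unit) indices, and it uses an unweighted mean, whereas scalepack weights observations by their estimated errors. It also computes the conventional merging R-factor, R_merge = Σ_hkl Σ_i |I_i(hkl) - <I(hkl)>| / Σ_hkl Σ_i I_i(hkl), as one example of a data-quality statistic; conventions differ, e.g. on whether singly measured reflections are included.

```python
from collections import defaultdict


def merge_observations(observations):
    """observations: iterable of ((h, k, l), intensity) pairs, already scaled
    and reduced to the asymmetric unit.

    Returns (merged, r_merge): merged maps each unique hkl to its mean
    intensity; r_merge = sum |I_i - <I>| / sum I_i over all observations.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    merged = {hkl: sum(vals) / len(vals) for hkl, vals in groups.items()}
    deviation = sum(abs(i - merged[hkl])
                    for hkl, vals in groups.items() for i in vals)
    total = sum(i for vals in groups.values() for i in vals)
    return merged, deviation / total


obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
merged, r_merge = merge_observations(obs)
print(merged, round(r_merge, 4))
```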
Examine the file scale1.in and see how it reflects the scaling process and the data.
Edit it to include the best estimate of the available resolution as well as the selected space group.
(At this stage it is best to provide scalepack with a space group that contains as many screw axes as possible, as it will then provide information about the intensities of the forbidden reflections. Once the correct space group has been determined from this information, select it and run scalepack again with the correct space group.)
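The information about forbidden reflections amounts to a simple comparison. For a 2₁ screw axis along c, the axial reflections 00l are systematically absent when l is odd, so their mean I/σ should be close to zero while the even ones are clearly positive; the same rule applies to h00 and 0k0 for 2₁ axes along a and b. The helper below is a hypothetical sketch covering 2₁ axes only (3₁, 4₁ and 6₁ axes have different conditions) and assumes the axial reflections and their sigmas have been taken from the scalepack output.

```python
def screw_axis_check(reflections, axis="c"):
    """Compare mean I/sigma of odd and even axial reflections.

    reflections: iterable of ((h, k, l), intensity, sigma).
    axis: "a", "b" or "c"; only reflections on that axis (e.g. 00l) are used.
    Returns (mean_odd, mean_even); either is None if no such reflections.
    For a 2(1) screw axis, mean_odd should be near zero.
    """
    idx = "abc".index(axis)
    odd, even = [], []
    for hkl, intensity, sigma in reflections:
        off_axis = [hkl[j] for j in range(3) if j != idx]
        if any(off_axis) or hkl[idx] == 0:
            continue  # not an axial reflection
        target = odd if hkl[idx] % 2 else even
        target.append(intensity / sigma)

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return mean(odd), mean(even)
```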
To run scalepack:
scalepack < scale1.in > scale1.log
Examine the log file carefully and note where each of the above processes is carried out and the statistics it produces. Note the rejection of outlier measurements; these are written to a file called reject. You need to understand the output of this program completely.
Then re-run scalepack as follows:
scalepack < scale2.in > scale2.log
scale2.in should be identical to scale1.in except for the first line, which instructs scalepack to read in the rejection file produced by the first run. The idea here is that more accurate scale factors can be obtained once the outliers are rejected before scaling takes place. This process can then be repeated as desired until the scale factors become stable and no further outliers are rejected. If large variation is seen in the cell or crystal setting parameters or mosaicity, then it may be necessary to re-run the integration step, supplying denzo with the post-refined values for these parameters.
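The repeat-until-stable idea can be sketched as a loop. In this hypothetical helper, each cycle rejects at most one measurement per unique reflection (the one that deviates most from the mean of the other measurements, provided the deviation exceeds n_sigma times its sigma) and stops when a cycle rejects nothing. Recomputing the means stands in for re-scaling here; scalepack's actual rejection criteria and the format of its reject file are not reproduced.

```python
def iterative_reject(groups, n_sigma=3.0, max_cycles=20):
    """Iteratively reject outlying measurements of each unique reflection.

    groups: dict mapping hkl -> list of (intensity, sigma) measurements.
    In each cycle, for every reflection with at least three measurements,
    the measurement deviating most from the mean of the others is rejected
    if that deviation exceeds n_sigma * sigma. Stops when nothing changes.
    Returns (kept, rejected).
    """
    kept = {hkl: list(obs) for hkl, obs in groups.items()}
    rejected = []
    for _ in range(max_cycles):
        changed = False
        for hkl, obs in kept.items():
            if len(obs) < 3:
                continue  # too few measurements to single out an outlier
            worst, worst_ratio = None, n_sigma
            for k, (intensity, sigma) in enumerate(obs):
                others = [o[0] for j, o in enumerate(obs) if j != k]
                ratio = abs(intensity - sum(others) / len(others)) / sigma
                if ratio > worst_ratio:
                    worst, worst_ratio = k, ratio
            if worst is not None:
                rejected.append((hkl, obs.pop(worst)))
                changed = True
        if not changed:
            break  # stable: no further outliers
    return kept, rejected


groups = {
    (1, 0, 0): [(100.0, 5.0), (102.0, 5.0), (98.0, 5.0), (300.0, 5.0)],
    (0, 1, 0): [(50.0, 2.0), (51.0, 2.0)],
}
kept, rejected = iterative_reject(groups)
print(rejected)
```

Rejecting only the worst measurement per cycle avoids discarding good measurements whose apparent deviation is caused by a single gross outlier pulling the mean.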