Overlaying Folding Topological Units and Comparing Side Chains
These pictures are an important part of the background of my project. However, they are mind-numbingly repetitive to generate. Can we write a script to do most of the work for us? But of course...
First, to save some time, I have included the files that already have the peptides removed. Download
Now, we'll start using chimera as a module instead of a graphical interface.
import chimera
Then, do something interesting...Loop over the list of structures, open the files, and clean them up. To do the clean up work, we are going to use a Chimera module called 'runCommand', which is the Command Line (from this morning) in the graphical interface.
for tInd, t inenumerate(testset):
rec = chimera.openModels.open('%s.pdb' %t)#clean up structurefrom chimera import runCommand
runCommand('sel protein')
runCommand('del ~sel')
Try running the script using the following commands:
$ chimera --nogui single_res_overlay.py
What happens?
Try running the script using the following commands:
$ chimera single_res_overlay.py
What is the difference?
Now we will color each structure differently...REMEMBER to make sure the commands are tabbed in as we are still in the for loop
#color appropriate color
runCommand('sel #%i' %tInd)
runCommand('color %s sel' %colors[tInd])
Next, we will change the atoms to be displayed as sticks. Also, because we want to emphasize the structure that is not bound to a peptide (1EXR), we will change that structure specifically to be ball-and-stick.
Kind of a mess! We really want to be focusing on a particular residue rather than the entire structure. For this, we will first set up a variable that will set the residue as part of calling the script.
#get residue of interestimportsys
resFoc =int(sys.argv[1])
Then we will superimpose the proper portion of the structure using the 'match' command. This command is very similar to MatchMaker except it only superimposes specified c-alphas of the main chains instead of creating a sequence alignment and then corresponding main chain c-alphas. (The sections of the structures that will be overlaid are the same as those you should have identified in the exercises yesterday.)
for t inrange(1,len(testset)):
#overlay appropriate part of calmodulin structureif resFoc >=4and resFoc <61:
runCommand('match #%i:4-60@CA #0:4-60@CA' %t)elif resFoc >=61and resFoc <79:
runCommand('match #%i:61-78@CA #0:61-78@CA' %t)elif resFoc >=79and resFoc <100:
runCommand('match #%i:79-99@CA #0:79-99@CA' %t)elif resFoc >=100and resFoc <147:
runCommand('match #%i:100-146@CA #0:100-146@CA' %t)else:
print'ERROR: Residue out of range of chain!'sys.exit()
Now, because the script is expecting a user-provided argument, we need to slightly modify the way we call Chimera so that IT knows we are adding arguments:
$ chimera --script single_res_overlay.py 72
Let's actually focus in on the residue of interest and hide everything else.
#isolate and focus on residue of interest
runCommand('sel :%i' %resFoc)
runCommand('~disp ~sel')
runCommand('focus')
runCommand('~sel')
and finally, make the background white for publication
#change background color
runCommand('set bg_color white')
The focus command we just used did zoom in on the residue of interest, but the computer does not know *exactly* which view you want. So, move it around until it looks pretty :O). Then save the image using the pulldown menu.
Calculating Backbone RMSD of Folding Topological Units
Now, let's get Chimera to generate some data for us! Here, we will be overlaying subsections of the structure, then calculating the c-alpha root mean squared deviation for each overlay, and finally printing it out in a table.
First, we'll use the Chimera module to open the reference structure
import chimera
#open and clean up reference structure
aporec = chimera.openModels.open('1EXR_cam.pdb')
Next, we'll clean the structure up a bit. For this portion, we'll use the runCommand module, which is the same as using the Command line in the graphical interface.
from chimera import runCommand
runCommand('sel protein')
runCommand('del ~sel')
In the next stage, we'll need to loop through each of the test set structures, overlay them on the reference structure and then calculate the C-alpha rmsd. First, we'll do the procedure using the interface and structures 1EXR and 1MXE to visualize what is happening. To overlay the structures, we'll run Tools >> Structure Comparison >> MatchMaker to do an overall match....
Next, we'll use the command 'match' to overlay just residues 4-60 (lobe A)...
Finally, we'll use the command 'rmsd' to calculate the C-alpha backbone rmsd, which will be both written to the box below the command line and to the Reply Log...
Now, we'll write some code to do the same thing for the entire test set and for lobe A, lobe B and both connector helices.
ca_lobeA =[]
ca_connectA =[]
ca_connectB =[]
ca_lobeB =[]
testset =['1PRW_cam','1MXE_cam','CAMK2_cam','2O60_cam','INOS20_cam','2F3Y_cam','2O5G_cam']from Midas import rmsd, match
for pInd, p inenumerate(testset):
testrec = chimera.openModels.open('%s.pdb' %p)#clean up structure
runCommand('sel protein')
runCommand('del ~sel')#calculate ca backbone rmsd
match('#0:4-60@CA','#1:4-60@CA')
ca_lobeA.append(rmsd('#0:4-60@CA','#1:4-60@CA'))
match('#0:61-78@CA','#1:61-78@CA')
ca_connectA.append(rmsd('#0:61-78@CA','#1:61-78@CA'))
match('#0:79-99@CA','#1:79-99@CA')
ca_connectB.append(rmsd('#0:79-99@CA','#1:79-99@CA'))
match('#0:100-146@CA','#1:100-146@CA')
ca_lobeB.append(rmsd('#0:100-146@CA','#1:100-146@CA'))
chimera.openModels.close(testrec)
We need to import the rmsd module because the runCommand version doesn't return values. However, there is no cast-in-stone reason to import the match module rather than run them through runCommand. It is just less typing :O)
Try running this code both ways again:
$ chimera rmsd_calculator.py
and
$ chimera --nogui rmsd_calculator.py
Which one works better here?
Finally, we want to print the information out in a nice table that can be imported into excel, origin, or your fav program.
Putting the code snippets together, we can run it as a script without ever opening Chimera.
usage: $chimera --nogui rmsd_calculator.py
Writing a file in mol2 format
As it turns out, writing mol2 formatted files is something that Chimera can already do. However, you occasionally may need to convert a pdb file into some other format, which you can easily do using Chimera. It also gives us the opportunity to explore the structure library in Chimera!
So, here is our goal: Convert the 1EXR_cam.pdb file into
Next, we'll open up an output file and write down the first header:
out =open('1EXR_cam_NEW.mol2','w')
out.write('@<TRIPOS>MOLECULE\n')
In this section of the file, we store information about the molecule...which we need to get! Molecular information is stored in a module called Model. Let's use the IDLE window to find out what is stored in Model...
Based on this information (as well as the additional documentation on the Chimera web site), we can add the rest of the header information
For the next session, we'll be writing out the information about the atoms themselves. For the mol2 format, in addition to the information about the atoms found in the pdb file, we also need to figure out the atom types. As this is not actually critical for this exercise, I am providing a dictionary that will map from the Chimera atom type to the mol2 atom type:
And now let's print out the bond information...NOTE: Bond typing is not automatically performed in Chimera and can be very difficult..and it is outside the scope of this lecture. Therefore, we will be typing everything as a single bond.
out.write('@<TRIPOS>BOND\n')
index =1for b inpdb.bonds:
a1, a2 = b.atoms
out.write('%6s ' %index)
out.write('%4s %4s' %((pdb.atoms.index(a1)+1), \
(pdb.atoms.index(a2)+1)))
out.write('%2s\n' %1)
index +=1
out.write('\n')
out.close()
And that will give us a minimal Mol2 file! Try opening the file up in Chimera and see what happens...
Exercises
1) All Roads Lead to Rome...Again
As we have mentioned several times, there are multiple ways to get the same answer. In Session 3.2 (Loops and Escapes) exercise #3, you wrote a script that reads in a pdb file and calculates the average b-factor for the helical and beta sheet regions. Repeat the exercise using Chimera.
2) Tying It All Together
My algorithm writes out a table with a list of all of the peaks it finds. I have written a script that parses that table to summarize the number of alternate conformations that have been detected for the unbound open structure 1EXR and the unbound close structure 1PRW. Download
and use them to create the following images (below).
a) For the bottom image, use the graphical interface to create the image.
b) For the top image, write a script to create the image. As a first pass, don't worry about the gray colored balls because you don't actually have any information about the alternate conformations.
c) EXTRA CREDIT: The structure files I have given you were simplified to make them easier to use in the lecture. Download the original 1EXR structure from the Protein Databank and add the gray colored balls to your image.
Analyzing Data with Chimera
Using Chimera as a library AND as a graphical interface
Places to go for help
Overlaying Folding Topological Units and Comparing Side Chains
These pictures are an important part of the background of my project. However, they are mind-numbingly repetitive to generate. Can we write a script to do most of the work for us? But of course...
First, to save some time, I have included the files that already have the peptides removed. Download
Next, we'll set up a list of the structures we will be working with and some colors.
Now, we'll start using chimera as a module instead of a graphical interface.
import chimeraThen, do something interesting...Loop over the list of structures, open the files, and clean them up. To do the clean up work, we are going to use a Chimera module called 'runCommand', which is the Command Line (from this morning) in the graphical interface.Try running the script using the following commands:
$ chimera --nogui single_res_overlay.py
What happens?
Try running the script using the following commands:
$ chimera single_res_overlay.py
What is the difference?
Now we will color each structure differently...REMEMBER to make sure the commands are tabbed in as we are still in the for loop
Next, we will change the atoms to be displayed as sticks. Also, because we want to emphasize the structure that is not bound to a peptide (1EXR), we will change that structure specifically to be ball-and-stick.
Run it again and see what things look like....
Kind of a mess! We really want to be focusing on a particular residue rather than the entire structure. For this, we will first set up a variable that will set the residue as part of calling the script.
Then we will superimpose the proper portion of the structure using the 'match' command. This command is very similar to MatchMaker except it only superimposes specified c-alphas of the main chains instead of creating a sequence alignment and then corresponding main chain c-alphas. (The sections of the structures that will be overlaid are the same as those you should have identified in the exercises yesterday.)
Now, because the script is expecting a user-provided argument, we need to slightly modify the way we call Chimera so that IT knows we are adding arguments:
$ chimera --script single_res_overlay.py 72
Let's actually focus in on the residue of interest and hide everything else.
and finally, make the background white for publication
The focus command we just used did zoom in on the residue of interest, but the computer does not know *exactly* which view you want. So, move it around until it looks pretty :O). Then save the image using the pulldown menu.
Calculating Backbone RMSD of Folding Topological Units
Now, let's get Chimera to generate some data for us! Here, we will be overlaying subsections of the structure, then calculating the c-alpha root mean squared deviation for each overlay, and finally printing it out in a table.
First, we'll use the Chimera module to open the reference structure
Next, we'll clean the structure up a bit. For this portion, we'll use the runCommand module, which is the same as using the Command line in the graphical interface.
In the next stage, we'll need to loop through each of the test set structures, overlay them on the reference structure and then calculate the C-alpha rmsd. First, we'll do the procedure using the interface and structures 1EXR and 1MXE to visualize what is happening. To overlay the structures, we'll run Tools >> Structure Comparison >> MatchMaker to do an overall match....
Next, we'll use the command 'match' to overlay just residues 4-60 (lobe A)...
Finally, we'll use the command 'rmsd' to calculate the C-alpha backbone rmsd, which will be both written to the box below the command line and to the Reply Log...
Now, we'll write some code to do the same thing for the entire test set and for lobe A, lobe B and both connector helices.
We need to import the rmsd module because the runCommand version doesn't return values. However, there is no cast-in-stone reason to import the match module rather than run them through runCommand. It is just less typing :O)
Try running this code both ways again:
$ chimera rmsd_calculator.py
and
$ chimera --nogui rmsd_calculator.py
Which one works better here?
Finally, we want to print the information out in a nice table that can be imported into excel, origin, or your fav program.
Putting the code snippets together, we can run it as a script without ever opening Chimera.
usage:
$chimera --nogui rmsd_calculator.py
Writing a file in mol2 format
As it turns out, writing mol2 formatted files is something that Chimera can already do. However, you occasionally may need to convert a pdb file into some other format, which you can easily do using Chimera. It also gives us the opportunity to explore the structure library in Chimera!
So, here is our goal: Convert the 1EXR_cam.pdb file into
Now we'll write some code to actually convert between the two. We'll start by actually opening the pdb file.
Next, we'll open up an output file and write down the first header:
In this section of the file, we store information about the molecule...which we need to get! Molecular information is stored in a module called Model. Let's use the IDLE window to find out what is stored in Model...
Based on this information (as well as the additional documentation on the Chimera web site), we can add the rest of the header information
For the next session, we'll be writing out the information about the atoms themselves. For the mol2 format, in addition to the information about the atoms found in the pdb file, we also need to figure out the atom types. As this is not actually critical for this exercise, I am providing a dictionary that will map from the Chimera atom type to the mol2 atom type:
Now we can write out the atom information. But...let's look a bit further into the Atom module using IDLE...
And now let's script it up!
And now let's print out the bond information...NOTE: Bond typing is not automatically performed in Chimera and can be very difficult..and it is outside the scope of this lecture. Therefore, we will be typing everything as a single bond.
And that will give us a minimal Mol2 file! Try opening the file up in Chimera and see what happens...
Exercises
1) All Roads Lead to Rome...AgainAs we have mentioned several times, there are multiple ways to get the same answer. In Session 3.2 (Loops and Escapes) exercise #3, you wrote a script that reads in a pdb file and calculates the average b-factor for the helical and beta sheet regions. Repeat the exercise using Chimera.
2) Tying It All Together
My algorithm writes out a table with a list of all of the peaks it finds. I have written a script that parses that table to summarize the number of alternate conformations that have been detected for the unbound open structure 1EXR and the unbound close structure 1PRW. Download
a) For the bottom image, use the graphical interface to create the image.
b) For the top image, write a script to create the image. As a first pass, don't worry about the gray colored balls because you don't actually have any information about the alternate conformations.
c) EXTRA CREDIT: The structure files I have given you were simplified to make them easier to use in the lecture. Download the original 1EXR structure from the Protein Databank and add the gray colored balls to your image.