Loops and Escapes
For Loops
Up to this point, we have learned how to organize data. In this class, we'll learn how to DO stuff with it. In nearly all programming languages, there are statements that let you do the same thing over and over, in a LOOP, with a short amount of code. A loop keeps running until the test at the start of the loop is no longer true (for a for loop, until it runs out of items). Python offers two major flavors of loop, the for loop and the while loop.
We have already met the for loop in disguise. In a previous lesson, we called it the iterator operator for lists.
#!/usr/bin/python
li = ['Why','do','superheroes','wear','tights?']
#iterate over list
for x in li: print x
giving:
$./loops.py
Why
do
superheroes
wear
tights?
Let's look at this a bit more closely and figure out what is happening.
Used in this manner, for goes through each item in the list that follows "in" and assigns the value it finds to the variable before "in" in succession.
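Here is a minimal sketch of that assignment step. It is written with print() so it also runs on Python 3 (the listings in this lesson use the Python 2 print statement), and the list name seen is made up for illustration.

```python
li = ['Why', 'do', 'superheroes', 'wear', 'tights?']

seen = []  # hypothetical helper list: records each value assigned to x
for x in li:
    # on each pass, x is bound to the next item of li
    seen.append(x)
    print(x)

# after the loop, x still holds the last item it was assigned
print(x)
```

The loop variable is an ordinary variable, so it is still around (holding the last item) once the loop is done.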
But what if we want to do something even more interesting in the loop? How do we add multiple commands?
Formatting loops
#iterate over list
for x in li:
    #check whether item equals 'do'
    if x == 'do':
        print 'Found:', x
    else:
        print x
giving:
$./loops.py
Why
Found: do
superheroes
wear
tights?
Just like in conditional statements from this morning, the 'body' of the loop is indented. Reversion to the original indentation indicates the end of the loop. NOTE: Even though we have expanded the loop to multiple lines, the colon is still after the first line!
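To make the indentation rule concrete, here is a small sketch that runs a two-line body a set number of times with range and then drops back to the original indentation. The variable count is invented for this example, and print() is used so the code also runs on Python 3.

```python
count = 0
for i in range(4):
    # both indented lines belong to the loop body and run four times
    print('hello')
    count = count + 1

# back at the original indentation: these lines run once, after the loop
print(count)
print(list(range(4)))
```

If the count = count + 1 line were not indented, it would run only once, after the loop had finished.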
We can also loop over fancy, nested data structures using fancy, nested loops.
Looping over fancy data structures
liLi = [[1,2,3],[7,8,9]]
for x in liLi:
    print x
    for y in x:
        print y
print
giving:
$./loops.py
[1, 2, 3]
1
2
3
[7, 8, 9]
7
8
9
What is happening here?
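One way to see what the two loops are doing is to collect the inner items instead of printing them. This is only a sketch: the list name flat is made up, and print() is used so it also runs on Python 3.

```python
liLi = [[1, 2, 3], [7, 8, 9]]

flat = []  # hypothetical name: collects every number from the inner lists
for x in liLi:       # outer loop: x is one inner list, e.g. [1, 2, 3]
    for y in x:      # inner loop: y is one number from that inner list
        flat.append(y)

print(flat)
```

The inner loop runs to completion for every single pass of the outer loop, which is why all of 1, 2, 3 appear before 7.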
Mutating lists with loops
When we discussed lists, we said that they were mutable. We demonstrated that by using the sort() method. Now, we would like to change each item in the list.
li2 = [1,22,48,36,101]
print li2
for x in li2:
    x = x + 23
print li2
giving:
$./loops.py
[1, 22, 48, 36, 101]
[1, 22, 48, 36, 101]
What the heck??? Why didn't it change?
As I mentioned before, when we are looping over the list, the loop variable behaves like a COPY of the item in the list: assigning a new value to it changes only the loop variable, and that modification is not carried back to the original list. In order to change the item itself, you need to use the function enumerate, which returns a tuple with the index of the item as well as the item itself. You can use the index to reach the actual item in the list, not just the copy. However, you still have the copy, too, if you want it.
print li2
for xInd, x in enumerate(li2):
    print x
    li2[xInd] = li2[xInd] + 23
print li2
giving:
$./loops.py
[1, 22, 48, 36, 101]
1
22
48
36
101
[24, 45, 71, 59, 124]
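If you want to look at the tuples that enumerate hands back, you can turn them into a list. The sketch below also puts the two kinds of assignment side by side. The names pairs and unchanged are made up for illustration, and print() is used so the code also runs on Python 3.

```python
li2 = [1, 22, 48, 36, 101]

# each pass of enumerate hands back an (index, item) tuple
pairs = list(enumerate(li2))
print(pairs)

for xInd, x in enumerate(li2):
    x = x + 23            # changes only the name x; li2 is untouched
unchanged = list(li2)     # hypothetical name: snapshot for comparison

for xInd, x in enumerate(li2):
    li2[xInd] = x + 23    # assigns through the index, so li2 changes
print(unchanged)
print(li2)
```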
Looping over multiple lists at the same time
You may also need to loop over multiple lists at the same time. You can do this using the zip function, which on each pass gives you a tuple containing the corresponding item from each list.
for x,y,z in zip(li,li2,li):
    print x,y,z
giving:
$./loops.py
Why 24 Why
do 45 do
superheroes 71 superheroes
wear 59 wear
tights? 124 tights?
What happens when one list is of a different length?
li3 = [3]
for x,y in zip(li,li3):
    print x,y
giving:
$./loops.py
Why 3
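So zip stops as soon as the shortest list runs out. Here is a small sketch that makes this visible by turning the zip results into lists. The three-item list short is invented for the example, and print() is used so the code also runs on Python 3.

```python
li = ['Why', 'do', 'superheroes', 'wear', 'tights?']
li3 = [3]
short = [10, 20, 30]  # hypothetical three-item list for illustration

one_pair = list(zip(li, li3))      # li3 has one item, so one tuple
three_pairs = list(zip(li, short)) # short has three items, so three tuples

print(one_pair)
print(three_pairs)
```

The leftover items of the longer list are silently ignored, so check your list lengths first if every item matters.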
While Loops
The while loop is a more generic form of the for loop. It continues until the statement in the first line is no longer true.
x = 'Ni!'
while x:
    print x
    x = x[1:]
print
x = 5
while x > 0:
    print x
    x = x - 1
print
giving:
$./loops.py
Ni!
i!
!
5
4
3
2
1
Notice that the format of the while loop is the same as the for loop (e.g. truth statement ending in : and indented body). Also, notice that you have to EXPLICITLY modify, within the loop, the variable that is being checked in the truth statement.
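To see why the while loop is the more generic of the two, here is a sketch that walks the same list once with for and once with while. The names for_items, while_items and i are made up for this example, and print() is used so the code also runs on Python 3.

```python
li = ['Why', 'do', 'superheroes', 'wear', 'tights?']

for_items = []
for x in li:
    for_items.append(x)

while_items = []
i = 0                      # explicit counter that the for loop manages for us
while i < len(li):
    while_items.append(li[i])
    i = i + 1              # without this line the loop would never end

print(for_items == while_items)
```

The for loop does the counting and the stopping test behind the scenes. With while, both are your job.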
What happens if you don't change that variable?
x = 5
while x > 0:
    print 'Hit Ctrl+c to quit'
giving:
$./loops.py
Hit Ctrl+c to quit
Hit Ctrl+c to quit
Hit Ctrl+c to quit
Hit Ctrl+c to quit
Hit Ctrl+c to quit
Hit Ctrl+c to quit
Hit Ctrl+c to quit
Hit Ctrl+c to quit
...
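If you are worried about writing a loop like this by accident, one defensive habit is to count the passes and bail out after a limit. This is just a sketch, not part of the lesson's script: passes and max_passes are invented names, and it borrows the break statement, which jumps out of the loop. print() is used so the code also runs on Python 3.

```python
x = 5
passes = 0
max_passes = 1000  # hypothetical safety limit, chosen arbitrarily
while x > 0:
    # x is never changed, so the truth statement alone would never stop us
    passes = passes + 1
    if passes >= max_passes:
        print('Giving up after %d passes' % passes)
        break
```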
Escaping Loops
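Occasionally, you might want to get out of a loop before the truth statement becomes false, or skip the rest of the loop body for one item. You can do this using break and continue. A third statement, pass, does nothing at all and is useful as a placeholder.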
break: jumps out of the closest loop
continue: jumps to the top of the closest loop
pass: empty placeholder
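Here is a minimal sketch that uses all three statements in one loop. The list name kept and the particular numbers are made up for illustration, and print() is used so the code also runs on Python 3.

```python
kept = []
for n in range(10):
    if n == 7:
        break      # leave the loop entirely; 7, 8 and 9 are never visited
    if n % 2 == 0:
        continue   # skip the rest of the body and move on to the next n
    if n == 3:
        pass       # does nothing; a placeholder where a statement is required
    kept.append(n)

print(kept)
```

Note that pass does not skip anything: 3 still ends up in the list.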
x = 10
while x:
    x = x-1
    #use mod to check if number
    #is even? go to next number
    if x % 2 == 0:
        continue
    #use comma to print multiple
    #things on same line
    else:
        print x,
giving:
$./loops.py
9 7 5 3 1
In this example, each number is checked by the if statement to see if it is even. If the number is even, continue sends the loop back to the while truth statement and the print is skipped. If the number is odd, the else branch prints it.
y = 10
x = y-1
while x > 1:
    if y % x == 0:
        print x, 'is a factor of', y
        break
    x = x-1
else:
    print y, 'is prime'
giving:
$./loops.py
5 is a factor of 10
Here, each number below y is checked to see if it is a factor of the y variable. If it is a factor, the number is printed and break stops the loop. If no factor is found, the loop ends normally and defers to the else statement. What happens when you change the y to 100? 15? 3?
Note that else, break, continue, and pass can be used in the context of ANY loop, not just these examples.
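For instance, the same else trick works on a for loop. The sketch below is my own variation, not the script above: it counts up instead of down, so it reports the smallest factor, and the function name smallest_factor_report is made up. print() is used so the code also runs on Python 3.

```python
def smallest_factor_report(y):
    # hypothetical helper: describes the smallest factor of y, if there is one
    for x in range(2, y):
        if y % x == 0:
            result = '%d is a factor of %d' % (x, y)
            break
    else:
        # runs only when the loop finished without hitting break
        result = '%d is prime' % y
    return result

print(smallest_factor_report(15))
print(smallest_factor_report(3))
```

The else belongs to the for, not to the if: it is skipped whenever break fires.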
Any questions?
EXERCISES
1) Getting comfortable with loops (adapted from Learning Python)
a) Write a for loop that prints the ASCII code of each character in the string 'Rock and Roll'. HINTS: You can loop over a string the same way you loop over a list. Also, use the built-in function ord(character) to convert each character into an ASCII integer.
b) Next, change your loop to compute the sum of the ASCII codes of all characters in the string.
c) Modify your code to print a new list that contains the ASCII codes of the characters in the string.
Solution:
rnr = "Rock and Roll"
sum = 0
c_list = []
for c in rnr:
    sum += ord(c)
    c_list.append(ord(c))
print sum
print c_list
2) All Roads Lead to Rome (adapted from Learning Python)
A coworker (who obviously is not a native Python speaker) hands you the following code:
#!/usr/bin/python
L = [1,2,4,8,16,32,64]
x = 5
found = i = 0
while not found and i < len(L):
    #check if 2 to the power
    #of x is in the list
    if 2 ** x == L[i]:
        found = 1
    else:
        i = i+1
if found:
    print 'at index', i
else:
    print x, 'not found'
giving:
$./powers.py
at index 5
As is, the script does not follow normal Python coding techniques. Follow the steps below to improve it.
a) Rewrite this code with a while/else loop to eliminate the found flag and the final if statement.
b) Rewrite the example to use a for/else loop to eliminate the explicit list indexing logic.
c) Remove the loop completely by rewriting the examples with a simple in operator membership expression (HINT: try the line 'print 2 in [1,2,3]')
d) Use a for loop and the list append method to generate the list L instead of typing it by hand.
Solution:
a)
#!/usr/bin/python
L = [1,2,4,8,16,32,64]
x = 5
i = 0
while i < len(L):
    #check if 2 to the power
    #of x is in the list
    if 2 ** x == L[i]:
        print 'at index', i
        break
    else:
        i = i+1
else:
    print x, 'not found'
b)
#!/usr/bin/python
L = [1,2,4,8,16,32,64]
x = 5
for i, num in enumerate(L):
    #check if 2 to the power
    #of x is in the list
    if 2 ** x == num:
        print 'at index', i
        break
else:
    print x, 'not found'
c)
#!/usr/bin/python
L = [1,2,4,8,16,32,64]
x = 5
#check if 2 to the power
#of x is in the list
if 2 ** x in L:
    print 'at index', L.index(2 ** x)
else:
    print x, 'not found'
d)
#!/usr/bin/python
L = []
for y in range(7):
    L.append(2 ** y)
print L
x = 5
#check if 2 to the power
#of x is in the list
if 2 ** x in L:
    print 'at index', L.index(2 ** x)
else:
    print x, 'not found'
3) Doing something interesting
A friend of yours in another lab is starting a new project on neuraminidase from the H1N1 flu virus (swine flu). She explains to you that, once a new virus is formed, neuraminidase clips off polysaccharide chains on the surface of the infected cell, ensuring that the virus doesn't get stuck as it is leaving. At the moment, she is interested in comparing her new structure of neuraminidase to the existing structure. Could you write a script that will tell her what percent of the structure is helical, beta sheet, or some other structure?
Here are some things to help you out:
a) Download the structure of H1N1 neuraminidase (PDB code 3b7e) as an example structure
b) I am supplying you with a script that will open the PDB file, parse out the information about the sequence and secondary structure, and save three lists called full_seq, helix_aa and sheet_aa. full_seq is a list containing the full sequence of chain A of the protein (NOTE: The protein crystallized as a homodimer). helix_aa and sheet_aa are lists of secondary structure descriptions, which have the following formats when converted into lists (paraphrased from the PDB file format documentation):
helix_aa
0. Record name 'HELIX'
1. Serial number of helix
2. Helix identifier
3. Name of initial residue
4. Chain identifier
5. Sequence number of initial residue
6. Name of terminal residue
7. Chain identifier
8. Sequence number of terminal residue
sheet_aa
0. Record name 'SHEET'
1. Strand number
2. Sheet identifier
3. Number of strands in sheet
4. Residue name of initial residue
5. Chain identifier of initial residue
6. Sequence number of initial residue
7. Residue name of terminal residue
8. Chain identifier of terminal residue
9. Sequence number of terminal residue
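To get a feel for these layouts, here is a small sketch that pulls the start and end residue numbers out of helix records and counts the residues they span. The two records are invented for illustration and are not taken from 3B7E. print() is used so the code also runs on Python 3.

```python
# hypothetical records in the helix_aa layout described above;
# the values are invented for illustration, not taken from 3B7E
helix_aa = [
    ['HELIX', '1', '1', 'ASN', 'A', '104', 'ALA', 'A', '110'],
    ['HELIX', '2', '2', 'ASN', 'B', '104', 'ALA', 'B', '110'],
]

num_helix_aa = 0
for helix_inst in helix_aa:
    # fields 4 and 7 are the chain identifiers; keep only chain A
    if helix_inst[4] == 'A' and helix_inst[7] == 'A':
        # fields 5 and 8 are strings, so convert before subtracting;
        # the range is inclusive, so residues 104..110 make 7 residues
        num_helix_aa += int(helix_inst[8]) - int(helix_inst[5]) + 1

print(num_helix_aa)
```

The same pattern should carry over to sheet_aa, where the chain identifiers sit in fields 5 and 8 and the residue numbers in fields 6 and 9.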
##SCRIPT TO PARSE OUT SECONDARY STRUCTURE INFORMATION
#!/usr/bin/python
import sys, os

full_seq = []
helix_aa = []
sheet_aa = []
f1 = open('3B7E.pdb','r')
for next in f1:
    tmp = next.strip().split()
    if tmp[0] == 'SEQRES':
        if tmp[2] == 'A':
            full_seq.extend(tmp[4:])
    elif tmp[0] == 'HELIX':
        try:
            int(tmp[5])
        except:
            tmp[5] = tmp[5][:-1]
        helix_aa.append(tmp[:9])
    elif tmp[0] == 'SHEET':
        sheet_aa.append(tmp[:10])
ANSWER:
Total number of residues = 385
Percent helical = 3.8961038961
Percent B sheet = 45.1948051948
Percent other = 50.9090909091
Solution:
#!/usr/bin/python
import sys, os

full_seq = []
helix_aa = []
sheet_aa = []
f1 = open('3B7E.pdb','r')
for next in f1:
    tmp = next.strip().split()
    if tmp[0] == 'SEQRES':
        if tmp[2] == 'A':
            full_seq.extend(tmp[4:])
    elif tmp[0] == 'HELIX':
        try:
            int(tmp[5])
        except:
            tmp[5] = tmp[5][:-1]
        helix_aa.append(tmp[:9])
    elif tmp[0] == 'SHEET':
        sheet_aa.append(tmp[:10])

num_helix_aa = 0
for helix_inst in helix_aa:
    if helix_inst[4] == 'A' and helix_inst[7] == 'A':
        num_helix_aa += (float(helix_inst[8]) - float(helix_inst[5]) + 1)
num_sheet_aa = 0
for sheet_inst in sheet_aa:
    if sheet_inst[5] == 'A' and sheet_inst[8] == 'A':
        num_sheet_aa += (float(sheet_inst[9]) - float(sheet_inst[6]) + 1)
seq_len = len(full_seq)
print sheet_aa
print "Total number of residues=%d" % seq_len
print "Percent helical=%f" % ((num_helix_aa/seq_len) * 100)
print "Percent B sheet=%f" % ((num_sheet_aa/seq_len) * 100)
print "Percent other=%f" % (((seq_len - num_helix_aa - num_sheet_aa)/seq_len) * 100)
BONUS: What is the average B-factor (a measure of the amount of vibrational motion each atom is undergoing) for each region?
Here is a modification of the script above that collects the information about each atom and stores it in the list atoms. Each atom has the following format when converted into a list (paraphrased from the PDB file format documentation):
atoms
0. Record name ATOM
1. Atom sequence number
2. Atom name
3. Residue name
4. Chain identifier
5. Residue sequence number
6-8. X, Y, Z coordinates
9. Occupancy in structure (1.0 = 100% occupied, 0.5 = 50% occupied)
10. B-factor
11. Element
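As a warm-up for the bonus, here is a sketch of averaging the B-factor over the atoms of one chain. It is not the full solution: it ignores helix and sheet regions, and the three atom records are invented for illustration. print() is used so the code also runs on Python 3.

```python
# hypothetical atom records in the layout described above;
# coordinates and B-factors are invented for illustration
atoms = [
    ['ATOM', '1', 'N', 'VAL', 'A', '83', '1.0', '2.0', '3.0', '1.00', '10.0', 'N'],
    ['ATOM', '2', 'CA', 'VAL', 'A', '83', '1.5', '2.5', '3.5', '1.00', '12.0', 'C'],
    ['ATOM', '3', 'N', 'VAL', 'B', '83', '4.0', '5.0', '6.0', '1.00', '20.0', 'N'],
]

b_total = 0.0
b_count = 0
for atom in atoms:
    if atom[4] != 'A':
        continue                 # skip atoms that are not on chain A
    b_total += float(atom[10])   # field 10 holds the B-factor as a string
    b_count += 1

if b_count:                      # avoid dividing by zero on an empty chain
    print(b_total / b_count)
```

To split the average by region, you would also need to compare each atom's residue number (field 5) against the start and end residues stored in helix_aa and sheet_aa.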
##SCRIPT TO PARSE OUT SECONDARY STRUCTURE AND ATOM INFORMATION
#!/usr/bin/python
full_seq = []
helix_aa = []
sheet_aa = []
atoms = []
f1 = open('3B7E.pdb','r')
for next in f1:
    tmp = next.strip().split()
    if tmp[0] == 'SEQRES':
        if tmp[2] == 'A':
            full_seq.extend(tmp[4:])
    elif tmp[0] == 'HELIX':
        try:
            int(tmp[5])
        except:
            tmp[5] = tmp[5][:-1]
        helix_aa.append(tmp[:9])
    elif tmp[0] == 'SHEET':
        sheet_aa.append(tmp[:10])
    elif tmp[0] == 'ATOM':
        if len(tmp) < 12:
            begin = tmp[0:2]
            end = tmp[3:]
            middle = [tmp[2][:3], tmp[2][4:]]
            tmp = begin + middle + end
        try:
            int(tmp[5])
        except:
            continue
        atoms.append(tmp)
ANSWER:
Structure    Chain A          Chain B
Helix        10.7259090909    10.4519090909
B Sheet      9.80990572879    9.76974619289
Other        11.9676417704    11.8491953232
Solution--version I: