Linux and scientific programming


2020/21

Dr. Rene Staritzbichler

Linux

What is Linux?

  • Open source operating system written mostly in 'C'
  • Unix-like: modelled on Unix (Unix: 1969, Linux: 1991)
  • Geared towards software development
  • Many highly combinable and powerful tools

The shell (bash)

  • Command-line terminal
  • Tools & scripting language
  • Tab completion
  • Wildcards:
    $ ls *.pdb     # list all PDB files
  • Ctrl-c: interrupt (kill) the current process
  • Copy: mark with mouse,
    paste: click middle mouse button

Navigation

  • List content of directory: 'ls'
    $ ls /
    $ ls -lh         # '-l': detailed output, '-h': human-readable file sizes, e.g. 4.3G
  • Change directory: 'cd'
    $ cd /home/YOU/course/data/
  • Current location: 'pwd'

Modifying

  • create a new file (many ways)
    $ touch new.txt            # create empty file
    $ echo "Hello" > new.txt   # create file with a single line (overwrites an existing file)
    $ echo "World" >> new.txt  # append a line
    $ gedit new.txt            # open text editor, file is created when saved
  • create a new directory: 'mkdir'
    $ mkdir NEW
    $ mkdir -p NEW/SUB/ONE_MORE/ANOTHER    # '-p': also create missing parent directories
  • remove a file: 'rm'
  • remove a directory: 'rm -r'
    $ rm rubbish.txt
    $ rm -r dir/               # removes the directory and everything in it, no undo

View & modify files

  • Read and search: 'less FILE'
  • Print only parts:
    $ head -n 12 myfile.txt                    # first 12 lines
    $ tail -n 33 myfile.txt | grep ATOM        # of the last 33 lines, print only those containing the string 'ATOM'
  • Text editor: gedit
  • Programming & text: emacs
  • Many powerful IDEs for each programming language

Bash scripting

  • full programming language
  • all Linux commands available

    for f in *.pdb
    do
        ls -lh "$f"                   # quotes keep file names with spaces intact
        grep ATOM "$f" > "atoms_$f"   # write matching lines to a new file
    done

Nearly the same as a one-liner:

    for f in *.pdb; do grep ATOM "$f" > "atoms_$f"; done

Exercise 1/2

  • Login, open terminal
  • Determine current location
  • Go to directory course_MD/day_1/linux/
  • Check the content of that directory
  • Type 'sleep 10h' and interrupt it with Ctrl-c

Exercise 2/2

  • Create a new directory: 'backup'
  • Search in browser how to copy and rename data in Linux
  • Copy all PDB files to backup folder, rename to have ending .bak
  • Get rid of the backup
  • Filter atom lines from PDB files and pipe into new files
  • Open files in text editor. What do you see?

Moving stuff

  • Copy files or directories
    $ cp PATH_OLD/FILE_OLD  PATH_NEW/FILE_NEW   # copy and rename
    $ cp PATH_OLD/FILE_OLD  PATH_NEW/.          # copy, keep name
    $ cp -r DIRECTORY  PATH/.                   # copy directory recursively with all its content
  • Move file or directory
    $ mv PATH_OLD/OLD  PATH_NEW/NEW             # move and rename file or directory
    $ mv PATH_OLD/OLD  PATH_NEW/.               # move to new location, keep name

Python

Running python

  • From the command line (Ctrl-d to exit):
    $ python
    >>> print( 2 * 3 + 0.3 )
  • From file:
    $ cat script.py            # show the content of the script
    print( 2 * 3 + 0.3 )
    $ python script.py         # run the script

Libraries

  • plain Python provides only basic, general-purpose functionality
  • all specific functionality needs to be imported
    import os                         # e.g. for os.path.isdir
    import sys                        # e.g. for reading command-line arguments
    import numpy as np                # numerical Python
    import matplotlib.pyplot as plt   # submodule for nice plots
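
A minimal sketch of how the first two imports might be used together. The function name describe_path is hypothetical and only serves as illustration; os.path.isdir, os.path.isfile and sys.argv are part of the standard library.

```python
import os
import sys


def describe_path(path):
    """Return a short description of what 'path' points to."""
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "missing"


if __name__ == "__main__":
    # sys.argv[0] is the script name, the remaining entries are the arguments
    for arg in sys.argv[1:]:
        print(arg, "->", describe_path(arg))
```

Saved as a script, this could be called with any number of paths as arguments.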
    				      

Data types 1/3

  • Variables: strings and numbers
    x = 2.0               # variable is created by assigning a value
    y = 3.0
    print( x + y )        # sum

    a = "hello "
    b = "world"
    print( a + b )        # concatenation
    print( a[1:5] )       # substring "ello"

  • Variables need a value assigned before they can be used or modified
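
A small sketch of two points that tend to come up right away: numbers and strings have to be converted explicitly before they can be combined, and using a name that never received a value fails. The variable names are chosen for illustration only.

```python
x = 2.0
label = "x: "

# str() turns a number into text so it can be concatenated
text = label + str(x)
print(text)

# float() turns text into a number
value = float("3.5")
print(x + value)

# A name that was never assigned raises a NameError
try:
    print(undefined_name)
except NameError as err:
    print("error:", err)
```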

Data types 2/3

  • Lists
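    l = [ 1, 3.4, "alpha" ]       # elements may have different types
    print( l )
    print( l[2] )                 # third element (counting starts at 0): alpha
    l.append( 3.2 )               # add an element at the end
    print( l[-1] )                # last element: 3.2
    print( l[0:2] )               # first two elements: [1, 3.4]

    l = [ 0, "day", [ "x", "@" ] ]   # lists can be nested
    print( l[2][1] )              # @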

Data types 3/3

  • Dictionary
    d = {}                  # empty dictionary
    d["beer"] = 3.50        # key "beer" maps to value 3.50
    d["hot dog"] = 2.99

    print( d )
    print( "total:", 2*d["beer"] + d["hot dog"] )
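
Two further dictionary operations are often needed: testing whether a key exists and looping over all entries. A minimal sketch, with the dictionary name prices chosen for illustration:

```python
prices = {"beer": 3.50, "hot dog": 2.99}

# 'in' tests whether a key exists
has_wine = "wine" in prices
print("wine on the menu:", has_wine)

# .items() yields (key, value) pairs
total = 0.0
for item, price in prices.items():
    print(item, price)
    total += price
```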
    				      

Read and write

  • Write to console
    print( "how extraordinary!", x )
  • Read from file
    with open( 'file.txt', 'r' ) as f:      # 'r': read
        for l in f:                         # one line per iteration
            words = l.split()               # list of whitespace-separated words
  • Write to file
    with open( 'file.txt', 'w' ) as w:      # 'w': write, overwrites an existing file
        w.write( "x: " + str(x) + "\n" )
        print( x , file=w )
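
To see both directions at once, the following sketch writes a file and reads it back. It assumes a temporary directory is acceptable as the location; the use of tempfile is an addition for illustration so that no existing file.txt is overwritten.

```python
import os
import tempfile

x = 1.5
path = os.path.join(tempfile.mkdtemp(), "file.txt")

# write two lines
with open(path, "w") as w:
    w.write("x: " + str(x) + "\n")
    print(x, file=w)

# read them back and split each line into words
lines = []
with open(path, "r") as f:
    for line in f:
        lines.append(line.split())

print(lines)
```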
    				    

Grouping by indentation

  • Code is grouped by indentation
  • Same context has same (uninterrupted) indentation
  • Indentation: 'tab' key (most editors insert 4 spaces; never mix tabs and spaces)
    with open( 'file.txt', 'w' ) as w:
        w.write( "x: " + str(x) + "\n" )
        print( x , file=w )

Indentation and declaration

  • A variable assigned only inside an indented block does not exist if that block is skipped:
    if False:
        a = "cheers"
    print( a )             # a was never assigned, ERROR (NameError)
  • Assign it on the indentation level where you want to use it:
    a = ""
    if True:
        a = "cheers"
    print( a )             # a is always known, here it has value "cheers"
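
A small sketch of what happens when a block is skipped, and why assigning a default first is the safe pattern. The names thirsty, greeting and outcome are made up for this illustration.

```python
thirsty = False

if thirsty:
    greeting = "cheers"      # skipped, so 'greeting' is never assigned

try:
    print(greeting)
    outcome = "printed"
except NameError:
    outcome = "NameError"
print(outcome)

# Assigning a default first makes the name usable in every case
greeting = ""
if thirsty:
    greeting = "cheers"
print(repr(greeting))
```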
    					

Loops

  • for loop through list
    total = 0                       # 'sum' would hide Python's built-in sum()
    for x in [ 1 , 2 , 3 ]:
        total += x
    print( "total:", total )
  • conditional while loop
    count = 0
    while count < 100:
        print( "trallalla" )
        count += 1

Conditions

  • if a string contains another one
    with open( 'protein.pdb' ) as f:
        for l in f:
            if "ATOM" in l:
                x = float( l[30:38] ) # convert string to number
                y = float( l[38:46] )
                z = float( l[46:54] )
  • if a list contains a certain element
    types = [ 'CA', 'N', 'C' , 'O' ]  # protein backbone
    with open( 'protein.pdb' ) as f:
        for l in f:
            atom_type = l[12:16].strip()
            if "ATOM" in l and atom_type in types:
                print( l.rstrip() )   # rstrip() avoids printing an extra empty line
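
The difference between "contains" and "starts with" can be tried without a file. The sketch below uses a few made-up lines in PDB column layout (hypothetical sample data, not taken from the course files):

```python
backbone = ["CA", "N", "C", "O"]

# Hypothetical sample lines standing in for the content of a PDB file
lines = [
    "REMARK   1 ATOM RECORDS FOLLOW",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "ATOM      5  CB  MET A   1      25.930  26.577   3.819",
]

# The REMARK line contains the string but does not start with it
print("ATOM" in lines[0], lines[0].startswith("ATOM"))

selected = []
for l in lines:
    atom_type = l[12:16].strip()
    if l.startswith("ATOM") and atom_type in backbone:
        selected.append(atom_type)

print(selected)
```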
    				      

Exercise

    Write a script that calculates the geometric center:
  • open the PDB file
  • iterate through the file
  • extract atom positions
  • count atoms
  • sum elementwise (sum over x, sum over y, ..)
  • divide each sum by the number of atoms
  • print center to console
  • BUG-FIX: the string ATOM may also appear elsewhere in a line, not only at the beginning (use startswith() or a slice such as l[:4])
  • extra: filter chain A (column 22 in ATOM lines, i.e. index 21 in Python)
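
One possible sketch of the steps above, to compare with your own attempt. It assumes a list of made-up lines in PDB column layout instead of a real file; a file opened with open() can be iterated in the same way.

```python
# Hypothetical sample data standing in for the lines of a PDB file
pdb_lines = [
    "REMARK   1 ATOM RECORDS FOLLOW",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "ATOM      3  C   MET B   1      25.112  24.880   3.649",
]

count = 0
sum_x = sum_y = sum_z = 0.0

for l in pdb_lines:
    if not l.startswith("ATOM"):    # skip lines that merely mention ATOM
        continue
    if l[21] != "A":                # column 22 holds the chain identifier
        continue
    sum_x += float(l[30:38])
    sum_y += float(l[38:46])
    sum_z += float(l[46:54])
    count += 1

# geometric center: mean of each coordinate
center = (sum_x / count, sum_y / count, sum_z / count)
print("atoms:", count)
print("center: %.3f %.3f %.3f" % center)
```

A real script should also handle the case that no atom matches, since dividing by a count of zero fails.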

Congratulations!



Your first script.

Tips

  • Use version control (e.g. git) to track and back up your code!
  • Keep it simple!
  • For most tasks you will find libraries
  • Search engines are great helpers when coding

Cheers :-)