Bits of Learning

Learning sometimes happens in big jumps, but mostly in little tiny steps. I share my baby steps of learning here, mostly on topics around programming, programming languages, software engineering, and computing in general. But occasionally, even on other disciplines of engineering or even science. I mostly learn through examples and doing. And this place is a logbook of my experiences in learning something. You may find several things interesting here: little cute snippets of (hopefully useful) code, a bit of backing theory, and a lot of gyan on how learning can be so much fun.

Friday, March 29, 2013

Counter in Python

Presenting here two implementations of a counter.

What does the counter do?

Given a base n, this counter will allow you to do base n counting. What's base-n counting? There are n digits: 0, 1, 2, ..., n - 1. At each place, the digit changes from i to i + 1 if i < n - 1, and to 0 if i = n - 1. This change happens when the change at that place is triggered. And when does the trigger happen. For the unit's place, the trigger happens whenever the function is called. For all other places, the trigger happens when the previous place's digit rolls over from n - 1 to 0. That's all. That's a bit technical description for the simple act of counting we learn in our KG. But if we want to implement the counter, it needs to be written in this algorithmic language.

A 3-bit binary (base-2) counter goes like:
000, 001, 010, 011, 100, 101, 110, 111, 000, ...

A 2-digit base-4 counter would count as:
00, 01, 02, 03, 10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33, 00, ...


What do we usually do in day-to-day counting? decimal (base-10) counting.


How do we Use the Counter Program?


Below is a piece of code that could be used to implement a binary counter on a 3-bit number.


base = 2
inc = makeCounter_iter(base)
n = [0, 0, 0]
print n
for i in range(base ** len(n)):
    n = inc(n)
    print n

 The inc(n) gives n + 1 in the base n number system which is pre-decided while instantiating the counter:
base = 2

#!/usr/bin/python
def makeCounter_iter(base):
    def isZero(lst): return lst == [0] * len(lst)

    def inc(num):
        if(len(num) > 1):
            rem = inc(num[:-1])
            if(isZero(rem)):
                if(num[-1] == base - 1):    rem.extend([0]) 
                else:                rem.extend([num[-1] + 1])
            else:    rem.extend([num[-1]])
            return rem
        else:
            new = [0]
            if(not num[0] == base - 1):    new[0] = num[0] + 1
            return new
    return inc

Now follows the recursive version.

#!/usr/bin/python
def makeCounter_rec(base):
    def incDigit(num, pos):
        new = num[:]
        if(num[pos] == base - 1):
            new[pos] = 0
            if(pos < len(num) - 1):    return incDigit(new, pos + 1)
        else:
            new[pos] = new[pos] + 1
        return new

    def inc(num): return incDigit(num, 0)
    return inc

Any day, the recursive implementation trumps over the iterative version. The recursive implementation is smaller and more readable because it's very like the way the process of counting is formally defined in the beginning of the article.

Functional Programming Flavour

One thing to notice is that we use functional language aspect of python. The function makeCounter (makeCounter_iter and makeCounter_rec) is a higher order function that returns another function inc, the increment function. Also, the value of base is fixed in the creation context of inc inside the makeCounter function. The inc function uses the value of base at a place in the code which is typically outside the lexical scope where base is defined. This is known as the closure feature of functional programming. In a typical object-oriented scenario, this effect is achieved using the builder design pattern.

Wednesday, March 20, 2013

Backup Script in Python

Sometime back I put out a backup script written in bash shell script. After having used it successfully for a while, I decided to re-write it in Python. Several reasons:
It had bugs. The bugs were known (it's not as if it's a million lines of code). But I couldn't get rid of them. Simply because I could never understand the semantics of the language well enough to write what I wanted to, exactly.
On the other hand, Python has this elegant, readable, syntax. The things do as you would except them to. Compilation is not minimal as in the case of shell. So, it throws up errors when there are any, and you can catch them. Consequently, the python code that I came up with is smaller and more readable.
My initial experience in using it is that it seems to run quicker too. But I can't vouch for that.

I managed the re-implementation in a couple of hours, as opposed to probably several days worth of effort (with gaps) that went into the shell-script. Part of the reason was of-course that the design was absolutely clear in my mind as I had already done the implementation earlier. So, I don't repent having spent my effort building my first prototype and throwing it away. But another part is (please don't read this sentence if you are a shell-script fan): Python is more modern and better than shell-script.

Here are the main features:

  1. It will make the contents under sourceRoot identical to destinationRoot.
  2. It will do selective copying. Means, it will copy a file anywhere inside sourceRoot to the right place inside destinationRoot if and only if there's no corresponding copy in destinationRoot, or destination copy is found to be older than the source copy.
  3. It allows you to specify any subdirectory of sourceRoot that you want to backup. This is useful when you know that changes since the last backup are localised to a particular location, and you don't want to waste time scanning other locations. For large data, this saves a lot of time.


So, I invite you to prefer this new one to the old one. It does everything that the shell-script did. And it doesn't have the old bugs. So here goes the code:


#!/usr/bin/python

import os
import sys
import shutil

from os import *
from string import *

sourceRoot = ""; # fill in appropriate value
destinationRoot = ""; # fill in appropriate value
backupDirectoryName = ""; # fill in appropriate value

def isSubdirectory(dir1, dir2):
 if(len(dir1) < len(dir2)):
  return False
 if(not dir1[:len(dir2)] == dir2):
  return False
 return True

def getRelativeName(dirName, root):
 if(not isSubdirectory(dirName, root)):
  raise Exception(dirName + ' is not a subdirectory of ' + root)
 if(len(dirName) < len(root)):
  return dirName
 if(root[-1] == '/'):
  return dirName[len(root):]
 else:
  return dirName[len(root) + 1:]

# always provide the full path name
def backupDir(dirName):
 global sourceRoot

# print 'going into directory: ' + dirName
 relativePathName = getRelativeName(dirName, sourceRoot)
 currentDestinationDir = destinationRoot + '/' + relativePathName
 if(not os.path.exists(currentDestinationDir)):
  os.mkdir(currentDestinationDir)
 allnames = listdir(dirName)
 for name in allnames:
  sourceName = dirName +'/' + name
  if(os.path.isfile(sourceName)):
   destinationFileName = currentDestinationDir + '/' + name
   if((not os.path.exists(destinationFileName)) or (os.path.getmtime(sourceName) > os.path.getmtime(destinationFileName))):
    print 'copying ' + sourceName + ' to ' + currentDestinationDir + ' ...'
    shutil.copyfile(sourceName, destinationFileName)
#   else:
#    print 'not copying ' + sourceName + ' to ' + currentDestinationDir + ' ...'
  elif(os.path.isdir(sourceName)):
   backupDir(sourceName)
  else:
   print 'Something wrong with ' + name
# print 'going out of directory: ' + dirName

# path is relative to root, not an absolute path
def createPath(path, root):
 def getPathName(path):
  if(len(path) > 1):
   return path[0] + '/' + getPathName(path[1:])
  return path[0]

 if(not os.path.exists(root + '/' + getPathName(path))):
  if(len(path) > 1):
   createPath(path[:-1], root)
  os.mkdir(root + '/' + getPathName(path))

# given a full/absolute pathname dirName, this returns the path relative to root.
def getRelativePath(dirName, root):
 if(not isSubdirectory(dirName, root)):
  raise Exception(dirName + ' is not a subdirectory of ' + root)
 return split(getRelativeName(dirName, root), '/')

def backup(s, d):
 global backupDirectoryName
 global sourceRoot
 global destinationRoot

 sourceRoot = s
 destinationRoot = d

 if(len(sys.argv) != 2):
  print 'usage: backup.py dirname'
  sys.exit()

 backupDirectoryName = sys.argv[1]
 if(sourceRoot == ''):
  print "sourceRoot hasn't been set"
  sys.exit()
 if(destinationRoot == ''):
  print "destinationRoot hasn't been set"
  sys.exit()
 if(backupDirectoryName == ''):
  print "backupDirectoryName hasn't been set"
  sys.exit()
 if(not os.path.isdir(sourceRoot)):
  print "sourceRoot = " + sourceRoot + " doesn't exist."
  sys.exit()
 if(not os.path.isdir(destinationRoot)):
  print "destinationRoot = " + destinationRoot + " doesn't exist."
  sys.exit()
 if(not os.path.isdir(backupDirectoryName)):
  print "backupDirectory = " + backupDirectoryName + " doesn't exist."
  sys.exit()
 if(not isSubdirectory(backupDirectoryName, sourceRoot)):
  print "backupDirectoryName isn't a subdirectory of the sourceRoot."
  sys.exit()
 createPath(getRelativePath(backupDirectoryName, sourceRoot), destinationRoot)
 backupDir(backupDirectoryName)

The above code is a sort of library, which you use along with a driver script. An example follows:

#!/usr/bin/python

from backup import *

sourceRoot = '/home/sujitkc/my_work'
destinationRoot = '/home/sujitkc/work_backup'

backup(sourceRoot, destinationRoot)

We usually have multiple backup scenarios. For example, I have the following scenarios:

  1. I take daily (approximately) backup of my work folder at office.
  2. I take fortnightly or monthly backup of my personal data at home.
  3. I take monthly backup of my entire data.

In the above three cases, the only thing that changes is the sourceRoot and destinationRoot. Therefore, I have a separate driver for each of the above scenarios: backup_work.pybackup_personal.py and backup_all.py respectively. In each of these drivers, I have set the sourceRoot and destinationRoot variables to a different appropriate value. Depending on my backup scenario, I just have to run the corresponding driver script. That's all!
  
All you need to do to use the script are the following:
  • Save script 1 as backup.py somewhere and make it executable using:
chmod +x backup.py
  • Save script 2 as, say, backup_work.py, at the same location and make it executable using:
chmod +x backup_work.py
  • In script 2, change the values of the sourceRoot and destinationRoot variables to appropriate locations.
  • You are all set! Now run the script as follows:
./backup_work.py your_backup_directory
If you are just starting off with this script and are wondering what to put as your_backup_directory, here's a clue: often, the your_backup_directory parameter has the same value as sourceRoot.


Request

If you find the above script useful, it will be highly appreciated if you drop me a word of acknowledgement as a comment to this post. Feel free to communicate if you find any problem with the code.

Related Post:


A Script for Backing Up