This article presents an algorithm to compute set union and set minus in a specific context. Each set is a set of contiguous stretches on a real line.

### Problem Description

#### Union

Consider a scenario where two people are teamed up to contribute to a particular task, say attending to an infant child. As long as any one of them is available to look after the child, it's fine. For other times, they probably would have to look for an alternative arrangement. The time any one person is available can be represented as a set of durations. The time for which at least one of the two persons is available is nothing but the union of the sets of durations during which each one of them is available.

#### Set Minus

Consider a scenario when two people are looking for a slot wherein both are available for a discussion. This can be thought of a difference between the the available time of one person and the busy time of the other.

### Solution Approach

The solution approach makes use of a variable

*level*which keeps track of the starts and ends of durations. After placing all the important time instants on a time line, we scan the time line from left to right.
While computing union, we use the following strategy to set the

*level*during our scan of the timeline:- Each time any of the duration begins, we increment
*level*by 1. - Each time any of the durations ends, we decrement
*level*by 1.

As per the above, the stretches of the timeline with

*level*= 0 indicate stretches when none of the two sets is present.
The rest of the timeline will have positive values of

*level*indicating that at least one of the two sets is present there.
In another pass of the timeline, we now collect all the continuous stretches of the timeline where

*level*is not zero. This gives us the set of durations corresponding to the union of the two sets.
While computing set minus, we employ the following strategy for maintaining

*level*.- The starting point of a duration in the first set and end of a duration in the second set cause
*level*to be incremented by 1. - The end point of a duration in the first set and start of a duration in the second set cause
*level*to be decremented by 1.

As per the above, the stretches of the timeline with

*level = 1*indicates stretches in the difference between the first set and the second.
In another pass, we collect the set of contiguous portions of the timeline where

*level = 1*. This is our answer of set minus.### Extensions

The

**union**and**setminus**functions can be used to implement other useful set operations on the above kind of sets,*viz*.**complement**and**intersection**.### Implementation

**Representation of the sets**

We represent the sets of durations as lists of pairs. s1 and s2 in the above figure (Timeline) have been represented as follows:

s1 = [(1, 5), (8, 10), (15, 20)] s2 = [(2, 4), (6, 12), (16, 21)]

#### Code Design

When we implement the above two algorithms as two completely separate functions**union**and

**setminus**, we realise that they have a very common structure. This gives us the motivation to try and capture this commonality between the two functions in a piece of re-usable code. We do so by designing a higher-order function

**make_function**, which is parameterised by another function which captures the specific elements of the two functions.

**f1**and

**f2**are the two functions capturing these specificities of union and set minus.

The

**make_function**function prepares a function and returns it as a result. the two functions**union**and**setminus**are created by calling**make_function**with**f1**and**f2**as arguments respectively. Note that this could have been implemented alternatively by embedding a call to**make_function**in**union**and**setminus**. This is a perfectly good option when**union**and setminus get used only once. However, considering a scenario where**union**and setminus may get called several times in a larger program, the strategy used here will lead to a slightly improved speed, not to mention improved readability (arguable).### Code

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | # utility function for the implementation of union def f1(s1, s2): START = 1 END = -1 l = [] for (s, e) in s1: l.append((s, START)) l.append((e, END)) for (s, e) in s2: l.append((s, START)) l.append((e, END)) return sorted(l, key = lambda (x, y): x) #utility function for implementation of setminus def f2(s1, s2): START1 = 1 END1 = -1 START2 = -1 END2 = 1 l = [] for (s, e) in s1: l.append((s, START1)) l.append((e, END1)) for (s, e) in s2: l.append((s, START2)) l.append((e, END2)) return sorted(l, key = lambda (x, y): x) # Higher order function to create union and setminus functions def make_function(f): # boilerplate function to be returned from make_function after being plugged with f. def g(s1, s2): level = 0 result = [] l = f(s1, s2) for (t, SE) in l: print (t, level) last = level level += SE if level == 1 and last == 0: start = t if level == 0 and last == 1: end = t result.append((start, end)) return result return g union = make_function(f1) setminus = make_function(f2) def t1(): s1 = [(1, 5), (8, 10), (15, 20)] s2 = [(2, 4), (6, 12), (16, 21)] print union(s1, s2) def t2(): s1 = [(1, 5), (8, 10), (15, 20)] s2 = [(2, 4), (6, 12), (16, 21)] print setminus(s1, s2) if __name__ == "__main__": t1() t2() |