This is the first example I learned in dynamic programming. The problem can be described simply. Now here are several time intervals named $\{1,2,\cdots,n\}$, with each $i$ has a starting tiem $s_i$, a finish time $f_i$, and a weight $w_i$. Our goal is to select a subset of $\{1,\cdots,n\}$ to maximize the sum of their weights while they do not overlap.
Suppose the interval sequence $\{1,2,\cdots,n\}$ has been sorted by their finish time in an increasing order. Define \[p(i):=\max \{j\in \{1,2,\cdots i-1\}: f_j<s_i\},\] i.e. the largest index $j$ that is on the left of $i$ and does not overlap $i$. Let $\mathcal{O}_i$ be the optimal subset we select from $\{1,2,\cdots,i\}$. $OPT(i)$ is the sum of weights of intervals in $\mathcal{O}_i$, i.e. \[OPT(i):=\sum_{k\in \mathcal{O}_i}w_k.\] Then we have a dichotomy: if $i\in \mathcal{O}_i$, then \[\mathcal{O}_i = \mathcal{O}_{p(i)}\cup \{i\};\] if $i\notin \mathcal{O}_i$, then \[\mathcal{O}_i = \mathcal{O}_{i-1}.\] Therefore we have the recursive equation for $OPT(i)$, \[OPT(i)=\max \{OPT(i-1), w_i+OPT(p(i))\}\tag{1}\]
Then we can follow eq(1) to select interval $i$. Indeed, as long as we calculate $OPT(i), \,i=1,2,\cdots n$, we can find the optimal solution directly. Here is the implementation of this idea in C++ code.
#include<iostream> #include<algorithm> using namespace std; //The goal of this code is to solve interval scheduling problem. There is n intervals with different value, we want to select several intervals with largest value sum. int n;//total number of intervals struct Intervals { double s;//start time double f;//finish time double val;//value bool if_selected = false; bool operator < (const Intervals& A) const { return f < A.f;//sort intervals by finish time } }Itv[10000]; double OptVal[10000];// OptVal[i] means the best value sum among 1~i void GetIntervals() { Itv[0].s = 0; Itv[0].f = 0; Itv[0].val = 0; cout << "Please input the total number of intervals: "; cin >> n; //input intervals cout << "Please input these "<< n <<" intervals:" <> Itv[i].s; cin >> Itv[i].f; cin >> Itv[i].val; } return; } int p(int j)//calculate the largest interval on the left of i that does not overlap j { if (j == 0) return 0; for (int i = j - 1; i >= 0; i --) if (Itv[i].f <= Itv[j].s) { return i; } } void SelectItv(int j) { for (int i = 1; i <= j; i ++) { double left,right; left = Itv[i].val + OptVal[p(i)]; right = OptVal[i - 1]; if (left > right) { OptVal[i] = left; Itv[i].if_selected = true;//choose interval i for (int k = i - 1; k > p(i); k --) Itv[k].if_selected = false; SelectItv(p(i));//Shrink the size of problem to 1~p(i) } else { OptVal[i] = right;// do not choose interval i } } return; } int main() { OptVal[0] = 0; GetIntervals(); sort(Itv+1, Itv+1+n); //sort in the structure [1,n) SelectItv(n); cout << "The selected intervals are: " << endl; for (int i = 1; i <= n; i ++ ) { if (Itv[i].if_selected) cout << "[" << Itv[i].s << "," << Itv[i].f << "]" << "value " << Itv[i].val<< endl; } cout <<"The optimal value sum is " << OptVal[n] << " ." << endl; return 0; }
Another famous example of dynamic programming is longest common subsequence (LCS) problem and sequence alignment problem. Let's articulate this problem first. Suppose we have two strings $A=a_1a_2\cdots a_m$ and $B=b_1b_2\cdots b_n$, where $a_i,b_i$ are characters from a fixed list, such as 26 letters in alphabet list. Sequence $S=s_1s_2\cdots s_l$ is called the subsequence of $A$ and $B$, if
Our goal is to find a subsequence $S$ of $A$ and $B$ with the longest length. In the optimal solution $S$, either $s_l=a_m=b_n$, or $a_m\neq b_n$. And if $a_m\neq b_n$, $a_m$ and $b_n$ cannot both appear in $S$, otherwise if there is $a_i(i< m)$ and $b_j(j< n)$ such that $a_m = b_j$ and $a_i = b_n$ are both in $S$, then it will contradict the second requirement in the definition of subsequence. In other word, if $a_m \neq b_n$, either $a_m$ or $b_n$ is not in $S$.
Define $OPT(i,j)$ be the length of $LCS$ of substring $A_i=a_1a_2\cdots a_i$ and $B_j=b_1b_2\cdots b_j$. Based on above analysis, we have the following recursive equation, \[OPT(i,j)=\begin{cases}OPT(i-1,j-1)+1&a_i=b_j\\ \max\{OPT(i-1,j),OPT(i,j-1)\}&a_i\neq b_j\end{cases}\tag{2}\] And define $OPT(i,0)=OPT(0,j)=0$.
Thus we can first use eq(2) to calculate all $OPT(i,j)$ from $OPT(1,1)$ to $OPT(m,n)$ recursively, and trace back the $OPT$ matrix to find the optimal solution. Below is the realization of the algorithm by Python.
def FindLCS(str1, str2): m = len(str1) n = len(str2) M = [[0 for x in range(n+1)] for x in range(m+1)] L = [[0 for x in range(n)] for x in range(m)] for i in range(m+1): M[i][0] = 0 for j in range(n+1): M[0][j] = 0 for i in range(m): #0,1,...,m-1 for j in range(n): #0,1,...,n-1 if str1[i] == str2[j]: # str1[0], str2[0] are corresponding to M[1][1] M[i+1][j+1]= M[i][j] + 1 L[i][j] = 'Hit' elif M[i+1][j] > M[i][j+1]: M[i+1][j+1] = M[i+1][j] L[i][j] = 'J-1' else: M[i+1][j+1] = M[i][j+1] L[i][j] = 'I-1' #output LCS = '' i = m -1 j = n -1 while i > -1 and j > -1 : if L[i][j] == 'Hit': LCS = str1[i] + LCS i = i-1 j = j-1 elif L[i][j] == 'I-1' : i = i - 1 else: j = j - 1 print('String 1 is:', str1) print('String 2 is:', str2) print('Their longest common subsequence is:', LCS) return #example A = 'GHJKINKK' B = 'GFHJIK' FindLCS(A,B)
[Output] String 1 is: GHJKINKK String 2 is: GFHJIK Their longest common subsequence is: GHJIK
Alignment problem is similar to LCS problem, but needs more steps to calculate mismatch and gap penalty. We can also define the score OPT(i,j) \[OPT(i,j) = \max \{\alpha_{ij} + OPT(i-1,j-1), OPT(i-1,j) +\delta, OPT(i,j-1)+\delta\}\] where $\alpha_{ij}$ is the similarity score between character $a_i$ and $b_j$, $\delta$ is the gap penalty. For the boundary, we can define $OPT(i,0)=i\delta$ and $OPT(0,j)=j\delta$. For example, we set $\alpha_{ij}=5$ if they are the same, while $\alpha_{ij}=-4$ if they are different, and set $\delta = -2$. Below is a realization of DNA alignment algorithm.
def DNA_align(str1, str2): m = len(str1) n = len(str2) GP = 2 #GAP PENALTY M = [[0 for x in range(n+1)] for x in range(m+1)] L = [[0 for x in range(n+1)] for x in range(m+1)] for i in range(m+1): M[i][0] = -i*GP L[i][0] = 'I-1' for j in range(n+1): M[0][j] = -j*GP L[0][j] = 'J-1' for i in range(m): #0,1,...,m-1 for j in range(n): #0,1,...,n-1 if str1[i] == str2[j]: val1 = M[i][j] + 5 elif str1[i] != str2[j]: val1 = M[i][j] - 4 val2 = M[i+1][j] - GP val3 = M[i][j+1] - GP val_max = max(val1,val2,val3) M[i+1][j+1] = val_max if val1 == val_max: if str1[i] == str2[j]: L[i+1][j+1] = 'Hit' elif str1[i] != str2[j]: L[i+1][j+1] = 'Unmatched' elif val2 == val_max: L[i+1][j+1] = 'J-1' else: L[i+1][j+1] = 'I-1' LCS = '' S1 ='' S2 ='' i = m j = n while i > -1 and j > -1: if i == 0 and j == 0: break if L[i][j] == 'Hit': LCS = '|' + LCS S1 = str1[i-1] + S1 S2 = str2[j-1] + S2 i = i-1 j = j-1 elif L[i][j] == 'Unmatched': LCS = ' ' + LCS S1 = str1[i-1] + S1 S2 = str2[j-1] + S2 i = i-1 j = j-1 elif L[i][j] == 'I-1' : LCS = ' ' + LCS S1 = str1[i-1] + S1 S2 = '-' + S2 i = i - 1 else: LCS = ' ' + LCS S1 = '-' + S1 S2 = str2[j-1] + S2 j = j - 1 print('string 1:',S1) print(' ',LCS) print('string 2:',S2) return #run an example A = 'CGCGTTATTTCGAAACGCGCGCGCGC' B = 'CGTAGGCTCGCAAAAAGCGCGCGCGCG' DNA_align(A,B)
[Output] string 1: CGCGTTA-TTTCG--AAACGCGCGCGCGC- || || ||| ||| |||||||||| string 2: --CG-TAGGCTCGCAAAAAGCGCGCGCGCG