Dynamic Programming

1. Weighted Interval Scheduling

This is the first example I learned in dynamic programming. The problem can be described simply. Now here are several time intervals named $\{1,2,\cdots,n\}$, with each $i$ has a starting tiem $s_i$, a finish time $f_i$, and a weight $w_i$. Our goal is to select a subset of $\{1,\cdots,n\}$ to maximize the sum of their weights while they do not overlap.

Suppose the interval sequence $\{1,2,\cdots,n\}$ has been sorted by their finish time in an increasing order. Define \[p(i):=\max \{j\in \{1,2,\cdots i-1\}: f_j<s_i\},\] i.e. the largest index $j$ that is on the left of $i$ and does not overlap $i$. Let $\mathcal{O}_i$ be the optimal subset we select from $\{1,2,\cdots,i\}$. $OPT(i)$ is the sum of weights of intervals in $\mathcal{O}_i$, i.e. \[OPT(i):=\sum_{k\in \mathcal{O}_i}w_k.\] Then we have a dichotomy: if $i\in \mathcal{O}_i$, then \[\mathcal{O}_i = \mathcal{O}_{p(i)}\cup \{i\};\] if $i\notin \mathcal{O}_i$, then \[\mathcal{O}_i = \mathcal{O}_{i-1}.\] Therefore we have the recursive equation for $OPT(i)$, \[OPT(i)=\max \{OPT(i-1), w_i+OPT(p(i))\}\tag{1}\]

Then we can follow eq(1) to select interval $i$. Indeed, as long as we calculate $OPT(i), \,i=1,2,\cdots n$, we can find the optimal solution directly. Here is the implementation of this idea in C++ code.

#include<iostream>
#include<algorithm>

using namespace std;
//The goal of this code is to solve interval scheduling problem. There is n intervals with different value, we want to select several intervals with largest value sum.

int n;//total number of intervals

struct Intervals
{
    double s;//start time
    double f;//finish time
    double val;//value
    bool if_selected = false;

    bool operator < (const Intervals& A) const {
        return f < A.f;//sort intervals by finish time
    }
}Itv[10000];

double OptVal[10000];// OptVal[i] means the best value sum among 1~i

void GetIntervals()
{
    Itv[0].s = 0;
    Itv[0].f = 0;
    Itv[0].val = 0;
    cout << "Please input the total number of intervals: ";
    cin >> n;
    //input intervals
    cout << "Please input these "<< n <<" intervals:" <> Itv[i].s;
       cin >> Itv[i].f;
       cin >> Itv[i].val;
    }
    return;
}

int p(int j)//calculate the largest interval on the left of i that does not overlap j
{
    if (j == 0)
        return 0;
    for (int i = j - 1; i >= 0; i --)
        if (Itv[i].f <= Itv[j].s)
        {
            return i;
        }
}


void SelectItv(int j)
{
    for (int i = 1; i <= j; i ++)
    {
        double left,right;
        left = Itv[i].val + OptVal[p(i)];
        right = OptVal[i - 1];
        if (left > right)
        {
            OptVal[i] = left;
            Itv[i].if_selected = true;//choose interval i
            for (int k = i - 1; k > p(i); k --)
                Itv[k].if_selected = false;
            SelectItv(p(i));//Shrink the size of problem to 1~p(i)
        }
        else
        {
            OptVal[i] = right;// do not choose interval i
        }
    }
    return;
}

int main()
{
    OptVal[0] = 0;
    GetIntervals();
    sort(Itv+1, Itv+1+n); //sort in the structure [1,n)
    SelectItv(n);
    cout << "The selected intervals are: " << endl;
    for (int i = 1; i <= n; i ++ )
    {
        if (Itv[i].if_selected)
            cout << "[" << Itv[i].s << "," << Itv[i].f << "]" << "value " << Itv[i].val<< endl;
    }
    cout <<"The optimal value sum is " << OptVal[n] << " ." << endl;
    return 0;
}

2. Find the longest common subsequence of two sequences

Another famous example of dynamic programming is longest common subsequence (LCS) problem and sequence alignment problem. Let's articulate this problem first. Suppose we have two strings $A=a_1a_2\cdots a_m$ and $B=b_1b_2\cdots b_n$, where $a_i,b_i$ are characters from a fixed list, such as 26 letters in alphabet list. Sequence $S=s_1s_2\cdots s_l$ is called the subsequence of $A$ and $B$, if

Our goal is to find a subsequence $S$ of $A$ and $B$ with the longest length. In the optimal solution $S$, either $s_l=a_m=b_n$, or $a_m\neq b_n$. And if $a_m\neq b_n$, $a_m$ and $b_n$ cannot both appear in $S$, otherwise if there is $a_i(i< m)$ and $b_j(j< n)$ such that $a_m = b_j$ and $a_i = b_n$ are both in $S$, then it will contradict the second requirement in the definition of subsequence. In other word, if $a_m \neq b_n$, either $a_m$ or $b_n$ is not in $S$.

Define $OPT(i,j)$ be the length of $LCS$ of substring $A_i=a_1a_2\cdots a_i$ and $B_j=b_1b_2\cdots b_j$. Based on above analysis, we have the following recursive equation, \[OPT(i,j)=\begin{cases}OPT(i-1,j-1)+1&a_i=b_j\\ \max\{OPT(i-1,j),OPT(i,j-1)\}&a_i\neq b_j\end{cases}\tag{2}\] And define $OPT(i,0)=OPT(0,j)=0$.

Thus we can first use eq(2) to calculate all $OPT(i,j)$ from $OPT(1,1)$ to $OPT(m,n)$ recursively, and trace back the $OPT$ matrix to find the optimal solution. Below is the realization of the algorithm by Python.

def FindLCS(str1, str2):
    m = len(str1)
    n = len(str2)
    M = [[0 for x in range(n+1)] for x in range(m+1)]
    L = [[0 for x in range(n)] for x in range(m)]
    
    for i in range(m+1):
        M[i][0] = 0
    for j in range(n+1):
        M[0][j] = 0
        
    for i in range(m): #0,1,...,m-1
        for j in range(n): #0,1,...,n-1
            if str1[i] == str2[j]: # str1[0], str2[0] are corresponding to M[1][1]
                M[i+1][j+1]= M[i][j] + 1
                L[i][j] = 'Hit'            
            elif M[i+1][j] > M[i][j+1]:
                M[i+1][j+1] = M[i+1][j]
                L[i][j] = 'J-1'
            else:
                M[i+1][j+1] = M[i][j+1]
                L[i][j] = 'I-1'
    
    #output 
    LCS = ''
    i = m -1
    j = n -1
    while i > -1 and j > -1 :
        if L[i][j] == 'Hit':
            LCS = str1[i] + LCS
            i = i-1
            j = j-1
        elif L[i][j] == 'I-1' :
            i = i - 1
        else:
            j = j - 1
    print('String 1 is:', str1)
    print('String 2 is:', str2)
    print('Their longest common subsequence is:', LCS)
    return

#example
A = 'GHJKINKK'
B = 'GFHJIK'
FindLCS(A,B)
[Output]
String 1 is: GHJKINKK
String 2 is: GFHJIK
Their longest common subsequence is: GHJIK

Alignment problem is similar to LCS problem, but needs more steps to calculate mismatch and gap penalty. We can also define the score OPT(i,j) \[OPT(i,j) = \max \{\alpha_{ij} + OPT(i-1,j-1), OPT(i-1,j) +\delta, OPT(i,j-1)+\delta\}\] where $\alpha_{ij}$ is the similarity score between character $a_i$ and $b_j$, $\delta$ is the gap penalty. For the boundary, we can define $OPT(i,0)=i\delta$ and $OPT(0,j)=j\delta$. For example, we set $\alpha_{ij}=5$ if they are the same, while $\alpha_{ij}=-4$ if they are different, and set $\delta = -2$. Below is a realization of DNA alignment algorithm.

def DNA_align(str1, str2):
    m = len(str1)
    n = len(str2)
    GP = 2 #GAP PENALTY
    M = [[0 for x in range(n+1)] for x in range(m+1)]
    L = [[0 for x in range(n+1)] for x in range(m+1)]
    
    for i in range(m+1):
        M[i][0] = -i*GP
        L[i][0] = 'I-1'
    for j in range(n+1):
        M[0][j] = -j*GP
        L[0][j] = 'J-1'
        
    for i in range(m): #0,1,...,m-1
        for j in range(n): #0,1,...,n-1
            if str1[i] == str2[j]:
                val1 = M[i][j] + 5
            elif str1[i] != str2[j]:
                val1 = M[i][j] - 4
            val2 = M[i+1][j] - GP
            val3 = M[i][j+1] - GP
            val_max =  max(val1,val2,val3)
            M[i+1][j+1] = val_max
            if val1 == val_max:
                if str1[i] == str2[j]:
                    L[i+1][j+1] = 'Hit'
                elif str1[i] != str2[j]:
                    L[i+1][j+1] = 'Unmatched'
            elif val2 == val_max:
                L[i+1][j+1] = 'J-1'
            else:
                L[i+1][j+1] = 'I-1'
            

    LCS = ''
    S1 =''
    S2 =''
    i = m 
    j = n 
    while i > -1 and j > -1:
        if i == 0 and j == 0:
            break
        if L[i][j] == 'Hit':
            LCS = '|' + LCS
            S1 = str1[i-1] + S1
            S2 = str2[j-1] + S2
            i = i-1
            j = j-1

        elif L[i][j] == 'Unmatched':
            LCS = ' ' + LCS
            S1 = str1[i-1] + S1
            S2 = str2[j-1] + S2
            i = i-1
            j = j-1
            
        elif L[i][j] == 'I-1' :
            LCS = ' ' + LCS
            S1 = str1[i-1] + S1
            S2 = '-' + S2
            i = i - 1
            
        else:
            LCS = ' ' + LCS
            S1 = '-' + S1
            S2 = str2[j-1] + S2
            j = j - 1
            
    print('string 1:',S1)
    print('         ',LCS)
    print('string 2:',S2)
    return

#run an example
A = 'CGCGTTATTTCGAAACGCGCGCGCGC'
B = 'CGTAGGCTCGCAAAAAGCGCGCGCGCG'
DNA_align(A,B)
[Output]
string 1: CGCGTTA-TTTCG--AAACGCGCGCGCGC-
            || ||   |||  ||| |||||||||| 
string 2: --CG-TAGGCTCGCAAAAAGCGCGCGCGCG