CS 3100

Data Structures and Algorithms 2

Reductions

Aaron Bloomfield (aaron@virginia.edu)
Raymond Pettit (raymond.pettit@virginia.edu)

 

Network Flow

Readings in CLRS 4th edition: chapter 24

Network Flow

Question: What is the maximum throughput of the railroad network from Omaha (OMA, far left) to Boston (BBY, far right)?

Flow Networks

 

  • Graph \(G=(V,E)\)
  • Start node \({\color{purple}s}\in V\)
  • Sink node \({\color{Blue}t}\in V\)
  • Edge capacities \({\color{green}c(e)} \in \mathbb{R}^+\)

Max flow intuition: If \(\color{purple}s\) is a faucet, \(\color{Blue}t\) is a drain, and \(\color{purple}s\) connects to \(\color{Blue}t\) through a network of pipes \(E\) with capacities \(\color{green}c(e)\), what is the maximum amount of water which can flow from the faucet to the drain?

Network Flow

Edges are labeled flow / capacity

  • Assignment of values \(\color{red}f(e)\) to edges
    • “Amount of water going through that pipe”
  • Capacity constraint
    • \({\color{red}f(e)} \le {\color{green}c(e)}\)
    • “Flow cannot exceed capacity”
  • Flow constraint
    • \(\forall v \in V - \{ {\color{purple}s},{\color{Blue}t}\}\), \(\text{inflow}(v)=\text{outflow}(v)\)
    • \(\text{inflow}(v)=\sum_{x \in V}f(x,v)\)
    • \(\text{outflow}(v)=\sum_{x \in V}f(v,x)\)
    • Water going in must match water coming out
  • Flow of \(G\): \(|f|=\text{outflow}({\color{purple}s})-\text{inflow}({\color{purple}s})\)
  • Net outflow of \(\color{purple}s\)
    • 3 in this example

Maximum Flow Problem

Of all valid flows through the graph, find the one that maximizes:

\[|f|=\text{outflow}({\color{purple}s})-\text{inflow}({\color{purple}s})\]

Greedy Approach

Greedy choice: saturate highest capacity path first

Flow: 20

Maximum flow: 30

Observe: highest capacity path is not saturated in optimal solution

Residual Graphs

Given a flow \(f\) in graph \(G\), the residual graph \(G_{\color{red}f}\) models additional flow that is possible

  • Forward edge for each edge in \(G\) with weight set to remaining capacity \({\color{green}c(e)}-{\color{red}f(e)}\)
    • Models additional flow that can be sent along the edge: flow to add
  • Backward edge by flipping each edge \(e\) in \(G\) with weight set to flow \({\color{red}f(e)}\)
    • Models amount of flow that can be removed from the edge: flow to remove

Flow \(\color{red}f\) in \(G\)

Residual graph \(G_{\color{red}f}\)


Residual Graphs Example

Flow Graph | Residual Graph

Residual Graphs

Consider a path from \({\color{purple}s} \rightarrow {\color{Blue}t}\) in \(G_{\color{red}f}\) using only edges with positive (non-zero) weight. Consider the minimum-weight edge \(e\) along the path: we can increase the flow by \(w(e)\)

  • Send \(w(e)\) flow along all forward edges (these have at least \(w(e)\) capacity)
  • Remove \(w(e)\) flow along all backward edges (these contain at least \(w(e)\) units of flow)

Observe: Flow has increased by \(w(e)\)


Ford-Fulkerson Algorithm

Define an augmenting path to be an \({\color{purple}s} \rightarrow {\color{Blue}t}\) path in the residual graph \(G_{\color{red}f}\) (using edges of non-zero weight)

Ford-Fulkerson max-flow algorithm (a runnable sketch follows):

  • Initialize \({\color{red}f(e)}=0\) for all \(e \in E\)
  • Construct the residual network \(G_{\color{red}f}\)
  • While there is an augmenting path \(p\) in \(G_{\color{red}f}\):
    • Let \(c=\min_{e \in p}c_f(e)\) where \(c_f(e)\) is the weight of edge \(e\) in the residual network \(G_{\color{red}f}\)
    • Add \(c\) units of flow to \(G\) based on the augmenting path \(p\)
    • Update the residual network \(G_{\color{red}f}\) for the updated flow
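
Below is a minimal runnable sketch of this loop in Python; it finds augmenting paths with BFS (the fewest-hops rule previewed later as Edmonds-Karp). The dict-of-dicts graph representation and the name max_flow are our choices for illustration, not from the slides.

from collections import deque

def max_flow(capacity, s, t):
    # capacity: dict of dicts, capacity[u][v] = c(u, v); returns |f*|
    # Residual weights: forward edges start at c(e), back edges at 0
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u in list(residual):
        for v in list(residual[u]):
            residual.setdefault(v, {}).setdefault(u, 0)

    def augmenting_path():
        # BFS from s toward t over positive-weight residual edges
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, w in residual[u].items():
                if w > 0 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    queue.append(v)
        return None  # no augmenting path remains

    flow = 0
    while (parent := augmenting_path()) is not None:
        # recover the path's edges by walking back from t
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        c = min(residual[u][v] for u, v in path)  # bottleneck weight
        for u, v in path:
            residual[u][v] -= c  # use up forward capacity
            residual[v][u] += c  # back edge: this flow can be removed later
        flow += c
    return flow

# Example: paths s-a-t (2 units) and s-b-t (1 unit) give max flow 3
caps = {"s": {"a": 2, "b": 1}, "a": {"t": 2}, "b": {"t": 2}}
print(max_flow(caps, "s", "t"))  # 3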

 

Ford-Fulkerson Example

Initially: \({\color{red}f(e)}=0\) for all \(e \in E\)

Each iteration: find an augmenting path, increase the flow by 1 unit, and update the residual graph \(G_{\color{red}f}\)

No more augmenting paths

Maximum flow: 4

Ford-Fulkerson Running Time

Define an augmenting path to be an \({\color{purple}s} \rightarrow {\color{Blue}t}\) path in the residual graph \(G_{\color{red}f}\) (using edges of non-zero weight)

Ford-Fulkerson max-flow algorithm:

  • Initialize \({\color{red}f(e)}=0\) for all \(e \in E\)
  • Construct the residual network \(G_{\color{red}f}\)
  • While there is an augmenting path \(p\) in \(G_{\color{red}f}\):
    • Let \(c=\min_{e \in p}c_f(e)\) where \(c_f(e)\) is the weight of edge \(e\) in the residual network \(G_{\color{red}f}\)
    • Add \(c\) units of flow to \(G\) based on the augmenting path \(p\)
    • Update the residual network \(G_{\color{red}f}\) for the updated flow

Initialization: \(O(|E|)\)

Construct residual network: \(O(|E|)\)

Finding augmenting path in residual network: \(O(|E|)\) using BFS/DFS

How many iterations of the while loop might we need?

Worst Case Ford-Fulkerson

Increase flow by 1 unit (repeatedly)

Observation: each iteration increases flow by 1 unit

Total number of iterations: \(|f^*|=200\)

Ford-Fulkerson Running Time

Each iteration costs \(O(|E|)\): update the residual network and find an augmenting path with BFS/DFS

Number of iterations: up to \(|f^*|\), since (with integer capacities) each augmenting path may add as little as 1 unit of flow

Total running time: \(O(|E| \cdot |f^*|)\)

Can We Avoid This?

Edmonds-Karp Algorithm: choose the augmenting path with the fewest hops

Running time: \(O\left(\min\left(|E| \cdot |f^*|,\ |V| \cdot |E|^2\right)\right) = O\left(|V| \cdot |E|^2 \right)\)

Ford-Fulkerson max-flow algorithm:

  • Initialize \({\color{red}f(e)}=0\) for all \(e \in E\)
  • Construct the residual network \(G_{\color{red}f}\)
  • While there is an augmenting path in \(G_{\color{red}f}\), let \(p\) be the path with fewest hops:
    • Let \(c=\min_{e \in p}c_f(e)\) where \(c_f(e)\) is the weight of edge \(e\) in the residual network \(G_{\color{red}f}\)
    • Add \(c\) units of flow to \(G\) based on the augmenting path \(p\)
    • Update the residual network \(G_{\color{red}f}\) for the updated flow

See CLRS, chapter 24

Max flow / Min cut

Readings in CLRS 4th edition: chapter 24

Reminder: Graph Cuts

A cut of a graph \(G=(V,E)\) is a partition of the nodes into two sets, \(\color{brown}S\) and \(\color{cornflowerblue}V-S\)

An edge \(\color{purple}(v_1,v_2) \in E\) crosses a cut if \(v_1 \in S\) and \(v_2 \in V-S\)

An edge \(\color{green}(v_1,v_2) \in E\) respects a cut if \(v_1,v_2 \in S\) or \(v_1,v_2 \in V-S\)

Showing Correctness of Ford-Fulkerson

  • Consider cuts which separate nodes \(\color{purple}s\) and \(\color{Blue}t\)
    • Let \({\color{purple}s} \in {\color{magenta}S}, {\color{Blue}t} \in {\color{skyblue}T}\) such that \(V={\color{magenta}S} \bigcup {\color{skyblue}T}\)
  • Cost of cut \(({\color{magenta}S}, {\color{skyblue}T}) = ||{\color{magenta}S}, {\color{skyblue}T} ||\)
    • Sum capacities of edges which go from \({\color{magenta}S}\) to \({\color{skyblue}T}\)
    • This example: 5

Maxflow \(\le\) MinCut

  • Max flow upper bounded by any cut separating \(\color{purple}s\) and \(\color{Blue}t\)
  • Why? “Conservation of flow”
    • All flow exiting \(\color{purple}s\) must eventually get to \(\color{Blue}t\)
    • To get from \(\color{purple}s\) to \(\color{Blue}t\), all “pipes” must cross the cut
  • Conclusion: the max flow can be no larger than the cost of the minimum cut
    • \(\max_f|f| \le \min_{ {\color{magenta}S}, {\color{skyblue}T}}||{\color{magenta}S}, {\color{skyblue}T}||\)

Maxflow/Mincut Theorem

  • To show Ford-Fulkerson is correct:
    • Show that when there are no more augmenting paths, there is a cut with cost equal to the flow
  • Conclusion: the maximum flow through a network matches the minimum-cost cut
    • \(\max_f|f| = \min_{ {\color{magenta}S}, {\color{skyblue}T}}||{\color{magenta}S}, {\color{skyblue}T}||\)
  • Duality
    • When we’ve maximized max flow, we’ve minimized min cut (and vice-versa), so we can check when we’ve found one by finding the other

Example: Maxflow/Mincut

Flow Graph \(G\) | Residual graph \(G_{\color{red}f}\)

\(|f|=4\) No more augmenting paths
\(||{\color{magenta}S}, {\color{skyblue}T} ||=4\)

Idea: When there are no more augmenting paths, there exists a cut in the graph with cost matching the flow

Proof: Maxflow/Mincut Theorem

  • If \(|f|\) is a max flow, then \(G_{\color{red}f}\) has no augmenting path
    • Otherwise, use that augmenting path to “push” more flow
  • Define \({\color{magenta}S}=\) nodes reachable from source node \(\color{purple}s\) by positive-weight edges in the residual graph
    • \({\color{skyblue}T}=V-{\color{magenta}S}\)
    • \({\color{magenta}S}\) separates \({\color{purple}s},{\color{Blue}t}\) (otherwise there’s an augmenting path)
Flow Graph \(G\) | Residual graph \(G_{\color{red}f}\)

Proof: Maxflow/Mincut Theorem

  • To show: \(||{\color{magenta}S},{\color{skyblue}T}||=|f|\)
    • Weight of the cut matches the flow across the cut
  • Consider edge \(\color{forestgreen}(u,v)\) with \(u \in {\color{magenta}S},v \in {\color{skyblue}T}\)
    • \({\color{red}f(u,v)}={\color{forestgreen}c(u,v)}\) because otherwise \({\color{forestgreen}w(u,v)}>0\) in \(G_{\color{red}f}\), which would mean \(v \in {\color{magenta}S}\)
  • Consider edge \(\color{brown}(y,x)\) with \(y \in {\color{skyblue}T}, x \in {\color{magenta}S}\)
    • \({\color{red}f(y,x)}=0\) because otherwise the back edge \({\color{brown}w(y,x)}>0\) in \(G_{\color{red}f}\), which would mean \(y \in {\color{magenta}S}\)
Flow Graph \(G\) | Residual graph \(G_{\color{red}f}\)

Proof Summary

  1. The flow \(|f|\) of \(G\) is upper-bounded by the sum of capacities of edges crossing any cut separating source \({\color{purple}s}\) and sink \({\color{Blue}t}\)
  2. When Ford-Fulkerson terminates, there are no more augmenting paths in \(G_{\color{red}f}\)
  3. When there are no more augmenting paths in \(G_{\color{red}f}\) then we can define a cut \({\color{magenta}S}=\) nodes reachable from source node \({\color{purple}s}\) by positive-weight edges in the residual graph
  4. The sum of edge capacities crossing this cut must match the flow of the graph
  5. Therefore this flow is maximal

Bipartite Matching

Readings in CLRS 4th edition: chapter 25

Edge-Disjoint Paths

Given a graph \(G=(V,E)\), a start node \({\color{purple}s}\) and a destination node \({\color{Blue}t}\), give the maximum number of paths from \({\color{purple}s}\) to \({\color{Blue}t}\) which share no edges

Edge-Disjoint Paths

Given a graph \(G=(V,E)\), a start node \({\color{purple}s}\) and a destination node \({\color{Blue}t}\), give the maximum number of paths from \({\color{purple}s}\) to \({\color{Blue}t}\) which share no edges

  • Set of edge-disjoint paths of size 3:

Edge-Disjoint Paths

Given a graph \(G=(V,E)\), a start node \({\color{purple}s}\) and a destination node \({\color{Blue}t}\), give the maximum number of paths from \({\color{purple}s}\) to \({\color{Blue}t}\) which share no edges

  • Set of edge-disjoint paths of size 4:
  • How could we solve this?

Edge-Disjoint Paths

Make \({\color{purple}s}\) and \({\color{Blue}t}\) the source and sink, give each edge capacity 1, and find the max flow (sketched below).

  • Set of edge-disjoint paths of size 4
  • Max flow = 4
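
As a concrete sketch of this reduction, the helper below (our name, not from the slides) reuses the max_flow sketch from the Ford-Fulkerson slide:

def edge_disjoint_paths(adj, s, t):
    # Reduction: give every edge capacity 1, so no two unit-flow
    # paths can share an edge; the max flow value is the answer.
    # adj: dict mapping each node to an iterable of (directed) neighbors
    capacity = {u: {v: 1 for v in nbrs} for u, nbrs in adj.items()}
    return max_flow(capacity, s, t)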


Vertex-Disjoint Paths

Given a graph \(G=(V,E)\), a start node \({\color{purple}s}\) and a destination node \({\color{Blue}t}\), give the maximum number of paths from \({\color{purple}s}\) to \({\color{Blue}t}\) which share no vertices

  • Not a vertex-disjoint path!
  • How could we solve this?

Vertex-Disjoint Paths

Idea: Convert an instance of the vertex-disjoint paths problem into an instance of edge-disjoint paths

Make two copies of each node, one connected to incoming edges, the other to outgoing edges

Compute Edge-Disjoint Paths on the new graph (the transform is sketched below)
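
A sketch of the node-splitting transform, under the same assumptions as the earlier sketches (the names and representation are ours):

def split_nodes(adj, s, t):
    # Split each node v into (v, 'in') and (v, 'out'), joined by a
    # single internal edge: two paths that shared v would now have
    # to share that edge, so edge-disjoint implies vertex-disjoint
    new_adj = {}
    for v in adj:
        new_adj[(v, "in")] = {(v, "out")}      # the internal edge
        new_adj.setdefault((v, "out"), set())
    for u, nbrs in adj.items():
        for v in nbrs:
            new_adj[(u, "out")].add((v, "in"))  # u -> v becomes u_out -> v_in
            new_adj.setdefault((v, "in"), {(v, "out")})
            new_adj.setdefault((v, "out"), set())
    # s and t themselves may be shared by the paths, so the new
    # source is s_out and the new sink is t_in
    return new_adj, (s, "out"), (t, "in")

Running edge_disjoint_paths on the transformed graph then counts vertex-disjoint paths in the original graph.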

Maximum Bipartite Matching

Professors
Dogs

Maximum Bipartite Matching

 

  • Given a graph \(G=(L,R,E)\)
    • A set of left nodes, right nodes, and edges between left and right
  • Find the largest set of edges \(M \subseteq E\) such that each node \(u \in L\) or \(v \in R\) is incident to at most one edge of \(M\)

Maximum Bipartite Matching

Professors
Dogs

How could we solve this?

Maximum Bipartite Matching Using Max Flow

Make \(G=(L,R,E)\) into a flow network \(G'=(V',E')\) by:

Professors
Dogs

  • Adding a source and sink to the set of nodes:
    • \(V'=L \bigcup R \bigcup \{ {\color{purple}s},{\color{Blue}t}\}\)
  • Adding an edge from the source to each node in \(L\) and from each node in \(R\) to the sink:
    • \(E'=E \bigcup \{({\color{purple}s},u) \mid u \in L\} \bigcup \{(v,{\color{Blue}t}) \mid v \in R\}\)
  • Making each edge capacity 1:
    • \(\forall e \in E', c(e)=1\)
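
A sketch of this construction (the helper name and the "s"/"t" labels are ours):

def matching_network(left, right, edges):
    # All capacities are 1: s -> each u in L, each original L-R edge,
    # and each v in R -> t
    cap = {"s": {}, "t": {}}
    for u in left:
        cap["s"][u] = 1
        cap[u] = {}
    for v in right:
        cap[v] = {"t": 1}
    for u, v in edges:
        cap[u][v] = 1
    return cap

The size of the maximum matching is then max_flow(matching_network(L, R, E), "s", "t"); the matching itself is read off the “middle” \(L\)-\(R\) edges that carry flow 1.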

Maximum Bipartite Matching Using Max Flow

Make \(G=(L,R,E)\) into a flow network \(G'=(V',E')\) by:

 
Professors
 
Dogs
 

  • Make \(G\) into \(G'\)
    • \(\color{red}\Theta(L+R)\)
  • Compute Max Flow on \(G'\)
    • \(\color{red}\Theta(E \cdot V)\) since \(|f| \le L\)
  • Return \(M\) as “middle” edges with flow 1
    • \(\color{red}\Theta(L+R)\)

Total: \(\Theta(E \cdot V)\)

Reductions

Readings in CLRS 4th edition: N/A (reductions are covered in CLRS, but not in a context we are studying in CS 3100)

Reductions

 

  • Algorithm technique of supreme ultimate power
  • Convert an instance of problem \(A\) to an instance of problem \(B\)
  • Convert the solution of problem \(B\) back to a solution of problem \(A\)

Reductions

Shows how two different problems relate to each other

MOVIE TIME!

MacGyver’s Reduction

Problem we don’t know how to solve (\(A\)): opening a door

Problem we do know how to solve (\(B\)): lighting a fire

Reduction: how?

Solution for \(B\): alcohol, wood, matches

Solution for \(A\): keg cannon battering ram

Bipartite Matching Reduction

Problem we don’t know how to solve (\(A\)): Bipartite Matching (professors and dogs)

Reduction: construct the flow network \(G'\) (as above)

Problem we do know how to solve (\(B\)): Max Flow, via Ford-Fulkerson

Solution for \(B\): the maximum flow

Solution for \(A\): the “middle” edges that carry flow 1

Edge Disjoint Paths Reduction

Problem we don’t know how to solve (\(A\)): Edge Disjoint Paths

Reduction: give every edge capacity 1

Problem we do know how to solve (\(B\)): Max Flow, via Ford-Fulkerson

Solution for \(B\): the maximum flow

Solution for \(A\): the paths formed by edges that carry flow

Vertex Disjoint Paths Reduction

Problem we don’t know how to solve (\(A\)): Vertex Disjoint Paths

Reduction: split each node into an “in” and an “out” copy (another reduction)

Problem we do know how to solve (\(B\)): Edge Disjoint Paths

Solution for \(B\): the edge-disjoint paths

Solution for \(A\): merge the split nodes back

Vertex Disjoint Paths Big Picture

Problem we don’t know how to solve (\(A\)): Vertex Disjoint Paths (solution: merge the split nodes back)

Problem we still don’t know how to solve (\(B\)): Edge Disjoint Paths (solution: use the edges with flow)

Problem we do know how to solve (\(C\)): Max Flow, via Ford-Fulkerson

Reductions for Algorithms

 

  • Create an algorithm for a new problem by using one you already know!
    • More algorithms = More opportunities!
  • The problem you reduced to could itself be solved using a reduction!

In General: Reduction

Problem we don’t know how to solve: \(A\)

Map instances of problem \(A\) to instances of problem \(B\)

Any instance of \(A\) can be mapped to some instance of \(B\)

Problem we do know how to solve: \(B\), using any algorithm for \(B\)

Map solutions of problem \(B\) to solutions of problem \(A\)

Another use of Reductions

Problem \(A\): suppose I know a worst-case lower bound of \(\Omega(f(n))\) for \(A\)

Map instances of problem \(A\) to instances of problem \(B\)

Some algorithm for \(B\) produces a solution for \(B\)

Map solutions of problem \(B\) to solutions of problem \(A\)

Then this entire path must be \(\Omega(f(n))\)

Worst Case Lower Bound

  • Definition:
    • A worst case lower bound on a problem is an asymptotic lower bound on the worst case running time of any algorithm which solves it
    • If \(f(n)\) is a worst case lower bound for problem \(A\), then the worst-case running time of any algorithm which solves \(A\) must be \(\Omega(f(n))\)
      • i.e. for sufficiently large values of \(n\), for every algorithm which solves \(A\), there is at least one input of size \(n\) which causes the algorithm to do \(\Omega(f(n))\) steps
  • Examples:
    • \(n\) is a worst-case lower bound on finding the minimum in a list
    • \(n^2\) is a worst-case lower bound on matrix multiplication

Worst case lower bound Proofs

Problem \(A\): opening a door

Problem \(B\): lighting a fire

Algorithm for \(B\): alcohol, wood, matches

Algorithm for \(A\): keg cannon battering ram

\(A\) is not harder than problem \(B\): \(A \le B\)

The name “reduces” can be confusing: the reduction maps \(A\) to \(B\), but it makes an algorithm for \(A\) out of an algorithm for \(B\)

Proof of Lower Bound by Reduction

To show: \(Y\) is slow

  1. We know \(X\) is slow (by a proof)
    (e.g., \(X =\) some way to open the door)
  2. Assume \(Y\) is quick [toward contradiction]
    (\(Y =\) some way to light a fire)
  3. Show how to use \(Y\) to perform \(X\) quickly
  4. \(X\) is slow, but \(Y\) could be used to perform \(X\) quickly;
    conclusion: \(Y\) must not actually be quick

Reduction Proof Notation

Problem \(A\) reduces to Problem \(B\): an algorithm for \(B\), with \(O(f(n))\) overhead, can be used to make an algorithm for \(A\)

\(A\) is not harder than problem \(B\)

\(A \le B\)

If \(A\) requires \(\Omega(f(n))\) time, then \(B\) also requires \(\Omega(f(n))\) time

\(A \le_{f(n)}B\)

Two Ways to Use Reductions

Suppose we have a “fast” reduction from \(A\) to \(B\)

A “fast” algorithm for \(B\) gives a fast algorithm for \(A\): if \(\color{orange}B\) is fast, then \(\color{red}A\) is fast

If we have a worst-case lower bound for \(A\), we also have one for \(B\): if \(\color{red}A\) is slow, then \(\color{orange}B\) is slow

Bipartite Matching Reduction

Problem we don’t know how to solve (\(A\)): Bipartite Matching (professors and dogs)

Reduction: construct the flow network \(G'\)

Problem we do know how to solve (\(B\)): Max Flow, via Ford-Fulkerson

If this (Max Flow) is fast, then this (Bipartite Matching) is fast; if this (Bipartite Matching) is slow, then this (Max Flow) is slow

Worst-case Lower Bound Using Reductions

 

  • Closest pair of points
    • D&C algorithm: \(\Theta(n \log n)\)
    • Can we do better?

 

Idea: Show that doing closest pair in \(o(n \log n)\) enables an impossibly fast algorithm for another problem

Reductions for Lower-Bound Proofs

Problem we know is “hard”: Problem \(A\) (“hard” means any algorithm for it must be slow)

Quickly map instances of problem \(A\) to instances of problem \(B\)

Problem we want to show is “hard”: \(B\), solved by some algorithm for \(B\)

Map solutions of problem \(B\) to solutions of problem \(A\)

If both mappings are quick and the algorithm for \(B\) is quick, then \(A\) can be solved quickly; but \(A\) must be slow, so the algorithm for \(B\) can’t be fast!

Reductions for Lower-Bound Proofs

Problem we know is \(\Omega(n \log n)\): Problem \(A\)

Quickly map instances of problem \(A\) to instances of Closest Pair of Points (\(CPP\))

Problem we want to show is \(\Omega(n \log n)\): \(CPP\), solved by some algorithm for \(CPP\)

Map solutions of \(CPP\) to solutions of problem \(A\)

If \(CPP\) could be done in \(o(n \log n)\), then \(A\) could be done in \(o(n \log n)\): a contradiction

A “Hard” Problem: Element Uniqueness

113 901 555 512 245 800 018 121 → True

103 801 401 323 255 323 999 101 → False
  • Input:
    • A list of integers
  • Output:
    • True if all values
      are unique, False otherwise
  • Can this be solved in \(O(n \log n)\) time?
    • Yes! Sort, then check if any adjacent elements match (sketched below)
  • Can this be solved in \(o(n \log n)\) time?
    • No! (we’re going to skip this proof)
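
A sketch of that \(O(n \log n)\) approach:

def all_unique(values):
    # After sorting, any two equal values must be adjacent
    ordered = sorted(values)                                  # O(n log n)
    return all(a != b for a, b in zip(ordered, ordered[1:]))  # O(n)

print(all_unique([113, 901, 555, 512, 245, 800, 18, 121]))   # True
print(all_unique([103, 801, 401, 323, 255, 323, 999, 101]))  # False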

Reductions for Lower Bound on CPP

Problem we know is \(\Omega(n \log n)\): Problem \(A\), Element Uniqueness (e.g., the list 5 7 9 8 6 3 6 9)

Map instances of problem \(A\) to instances of \(CPP\) in \(o(n \log n)\)

Problem we want to show is \(\Omega(n \log n)\): Closest Pair of Points, solved by some algorithm for \(CPP\)

Map solutions of \(CPP\) to solutions of problem \(A\) in \(o(n \log n)\)

If \(CPP\) could be done in \(o(n \log n)\), then Element Uniqueness could be done in \(o(n \log n)\); the mapping is sketched below
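
A sketch of the two mappings; closest_pair stands for any hypothetical CPP solver that returns the minimum pairwise distance:

def element_uniqueness_via_cpp(values, closest_pair):
    # Map each number x to the point (x, 0): the list has a repeated
    # value exactly when the closest pair is at distance 0.
    # Both mappings are O(n), i.e., o(n log n).
    points = [(x, 0) for x in values]   # instance of A -> instance of CPP
    return closest_pair(points) > 0     # CPP solution -> solution of A

So an \(o(n \log n)\) algorithm for \(CPP\) would yield an \(o(n \log n)\) algorithm for Element Uniqueness, which cannot exist.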

Two Ways to Use Reductions

Suppose we have a “fast” reduction from \(A\) to \(B\)

A “fast” algorithm for \(B\) gives a fast algorithm for \(A\): if \(\color{orange}B\) is fast, then \(\color{red}A\) is fast

If we have a worst-case lower bound for \(A\), we also have one for \(B\): if \(\color{red}A\) is slow, then \(\color{orange}B\) is slow

Party Problem

 
Draw edges between people who don’t get along
Find the maximum number of people who get along

Maximum Independent Set

 

  • Independent set: \(S \subseteq V\) is an independent set if no two nodes in \(S\) share an edge
  • Maximum Independent Set problem: Given a graph \(G=(V,E)\), find the maximum independent set \(S\)

Example

  Independent set of size 6

Generalized Baseball

Need to place defenders on bases such that every edge is defended
 
What’s the fewest number of defenders needed?

Vertex Cover

 

  • Vertex Cover: \(C \subseteq V\) is a vertex cover if every edge in \(E\) has one of its endpoints in \(C\)
  • Minimum Vertex Cover: Given a graph \(G=(V,E)\) find the minimum vertex cover \(C\)

Generalized Baseball

Vertex cover of size 5

MaxIndSet \(\le_V\) MinVertCover


Problem \(A\) \(O(V)\)-reduces to Problem \(B\): an algorithm for \(B\) can be used to make an algorithm for \(A\)

If \(A\) requires \(\Omega(f(n))\) time, then \(B\) also requires \(\Omega(f(n))\) time

\[A \le_V B\]

Reduction Idea

\(S\) is an independent set of \(G\) iff \(V-S\) is a vertex cover of \(G\)

 

Independent Set

Vertex Cover

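A sketch of both directions of this idea; max_ind_set and min_vert_cov stand for any hypothetical solvers:

def min_vert_cov_from(max_ind_set, nodes, edges):
    # MinVertCov via MaxIndSet: complement the answer (an O(V) mapping)
    return set(nodes) - max_ind_set(nodes, edges)

def max_ind_set_from(min_vert_cov, nodes, edges):
    # MaxIndSet via MinVertCov: complement the answer (an O(V) mapping)
    return set(nodes) - min_vert_cov(nodes, edges)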

MinVertCov \(V\)-time reducible to MaxIndSet

Problem we don’t know how to solve (\(A\)): MinVertCov

Reduction: do nothing (the instance maps to itself)

Problem we do know how to solve (\(B\)): MaxIndSet (any MaxIndSet alg)

Solution for MinVertCov: take the complement of the MaxIndSet solution

We must show (prove): 1) how to make the construction, and 2) why it works

Proof: ⇐

\(\color{red}S\) is an independent set of \(G\) iff \(V-S\) is a vertex cover of \(G\)

Let \(\color{red}S\) be an independent set

Consider any edge \({\color{magenta}(x,y)} \in E\)

If \(\color{red}x\in S\), then \(y \notin S\) because otherwise \(\color{red}S\) would not be an independent set

Therefore \(y \in V-S\), so edge \({\color{magenta}(x,y)}\) is covered by \(V-S\)

Proof: ⇒

\(S\) is an independent set of \(G\) iff \(V-S\) is a vertex cover of \(G\)

Let \(\color{red}V-S\) be a vertex cover

Consider any edge \({\color{magenta}(x,y)} \in E\)

At least one of \(\color{red}x\) and \(\color{cornflowerblue}y\) belongs to \(\color{red}V-S\) because \(\color{red}V-S\) is a vertex cover of \(G\)

Therefore \(\color{red}x\) and \(\color{cornflowerblue}y\) are not both in \(\color{cornflowerblue}S\)

No edge has both end nodes in \(\color{cornflowerblue}S\), thus \(\color{cornflowerblue}S\) is an independent set

MaxIndSet \(V\)-time reducible to MinVertCov

Problem we don’t know how to solve (\(A\)): MaxIndSet

Reduction: do nothing (the instance maps to itself)

Problem we do know how to solve (\(B\)): MinVertCov (any MinVertCov alg)

Solution for MaxIndSet: take the complement of the MinVertCov solution

We must show (prove): 1) how to make the construction, and 2) why it works

If solving \(A\) was always slow, then this shows solving \(B\) is also slow

MinVertCov \(V\)-time reducible to MaxIndSet

Problem we don’t know how to solve (\(A\)): MinVertCov

Reduction: do nothing (the instance maps to itself)

Problem we do know how to solve (\(B\)): MaxIndSet (any MaxIndSet alg)

Solution for MinVertCov: take the complement of the MaxIndSet solution

If solving \(A\) was always slow, then this shows solving \(B\) is also slow

Conclusion

 

  • MaxIndSet and MinVertCov are either both fast, or both slow
  • Spoiler alert: We don’t know which!
    • (But we think they’re both slow)
  • Both problems are NP-Complete

NP-Completeness

Readings in CLRS 4th edition: chapter 34

Introduction

  • What if we could prove that some problems are (most likely) hard?
    • Here, “hard” means that solving it requires exponential time
    • “easy” means that it runs in polynomial time
  • We want there to be “hard” problems
    • Cracking encryption should be “hard”
    • But regular encryption and decryption should be “easy”
  • Given a new encryption algorithm, how do we show that it’s really “hard”?
    • And not that we just haven’t found an “easy” solution?

DMT 1 review

 

  • An AND \((\wedge)\) operation is called a conjunction
  • An OR \((\vee)\) operation is called a disjunction
  • DeMorgan’s law:
    • \(\neg(a \vee b) \equiv \neg a \wedge \neg b\)
    • \(\neg(a \wedge b) \equiv \neg a \vee \neg b\)

Background: Satisfiability (SAT)

  • Given a boolean expression, can we find an assignment of true/false values to the variables such that the entire expression is true?
  • Examples:
    • \(x_1 \wedge (x_2 \vee \neg x_3) \wedge (\neg x_1 \vee x_3) \wedge (x_2 \vee \neg x_3)\)
      • Yes: \(x_1\) is true, \(x_2\) is true, \(x_3\) is true
    • \(x_1 \wedge (x_2 \vee \neg x_3) \wedge (\neg x_1 \vee x_3) \wedge ({\color{red}\neg} x_2 \vee \neg x_3)\)
      • No, there is no assignment of true/false values that will satisfy this equation

Background: Satisfiability (SAT)

  • The equations presented so far:
    • \(x_1 \wedge (x_2 \vee \neg x_3) \wedge (\neg x_1 \vee x_3) \wedge (x_2 \vee \neg x_3)\)
    • \(x_1 \wedge (x_2 \vee \neg x_3) \wedge (\neg x_1 \vee x_3) \wedge ({\color{red}\neg} x_2 \vee \neg x_3)\)
  • … are in conjunctive normal form
    • Meaning it’s a conjunction (AND’ing) of disjunctions (OR’ing)
    • We’ll keep this as a standard format
  • Nobody has yet found an efficient (polynomial-time) algorithm to solve it, only exponential-time algorithms
  • Note that this problem is hard to solve, but easy to verify when given an answer (both are sketched below)
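
A sketch of both claims, using a common CNF encoding (a clause is a list of literals; +i means \(x_i\), -i means \(\neg x_i\)):

from itertools import product

# The first formula above: x1 AND (x2 OR NOT x3) AND (NOT x1 OR x3) AND (x2 OR NOT x3)
EXAMPLE = [[1], [2, -3], [-1, 3], [2, -3]]

def verify(cnf, assignment):
    # Easy to verify: check every clause in polynomial time
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf)

def brute_force_sat(cnf):
    # Hard to solve: the only known general approaches take
    # exponential time, like trying all 2^n assignments here
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if verify(cnf, assignment):
            return assignment
    return None

print(brute_force_sat(EXAMPLE))  # {1: True, 2: True, 3: True}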

How would we check for a valid UVA userid?

  • Format:
    • Starts with 2 or 3 letters
    • Optional digit
    • 1-3 more letters if there was a digit
  • Valid forms: ll, lll, lldl, llldl, lldll, llldll, lldlll, llldlll

Programmatically

def check_uva_userid1(what):
  chars = list(what.lower())
  
  # check first character (must be a letter)
  if len(chars) == 0: return False
  if not chars[0].isalpha(): return False
  chars.pop(0)
  
  # check second character (must be a letter)
  if len(chars) == 0: return False
  if not chars[0].isalpha(): return False
  chars.pop(0)
  
  # return true if of the form ll
  if len(chars) == 0: return True
  
  # check optional 3rd letter
  if chars[0].isalpha():
    chars.pop(0)
    # return true if of the form lll
    if len(chars) == 0: return True
  
  # check digit
  if len(chars) == 0: return False
  if not chars[0].isdigit(): return False
  chars.pop(0)
  
  # check first letter after the digit
  if len(chars) == 0: return False
  if not chars[0].isalpha(): return False
  chars.pop(0)
  
  # return true if of the form lldl or llldl
  if len(chars) == 0: return True
  
  # check second letter after the digit
  if not chars[0].isalpha(): return False
  chars.pop(0)
  
  # return true if of the form lldll or llldll
  if len(chars) == 0: return True
  
  # check third letter after the digit
  if not chars[0].isalpha(): return False
  chars.pop(0)
  
  # return true if of the form lldlll or llldlll
  if len(chars) == 0: return True
  
  # if there is more input, then it's not a valid userid
  return False

Source: uvauserid1.py

Automata

Let’s phrase this using a finite state machine

  • Each arrow assumes that it pops (consumes) one character
  • Double circled states are final states
    • If the end of input is in a final state, the string is accepted

As code

def check_uva_userid2(what):
  # returns True if the passed parameter is a valid UVA userid, else False

  state_table = [ 
    # from each state, where to go on an 'l', 'd', and 'o', respectively
    [], # we numbered our states from 1, so we burn spot 0
    [2,    None, None], # state 1 goes to 2 on a letter
    [3,    None, None], # state 2 goes to 3 on a letter
    [4,    5,    None], # state 3 goes to 4 on a letter and 5 on a digit
    [None, 5,    None], # state 4 goes to 5 on a digit
    [6,    None, None], # state 5 goes to 6 on a letter
    [7,    None, None], # state 6 goes to 7 on a letter
    [8,    None, None], # state 7 goes to 8 on a letter
    [None, None, None], # no transitions out of state 8
  ]
  final_states = [3, 4, 6, 7, 8]

  chars = list(what.lower())
  state = 1
  while len(chars) > 0:

    # which is whether it's a letter (0), digit (1), or other (2)
    which = 0 if chars[0].isalpha() else 1 if chars[0].isdigit() else 2

    # get the state transition, and verify it's not None
    next_state = state_table[state][which]
    if next_state is None: return False

    # transition to that state and pop the input character
    state = next_state
    chars.pop(0)

  # we should have ended in a final state
  return state in final_states

Source: uvauserid2.py

Automata

What if we only have one accepting (aka final) state?

  • But which way to go?
    • In state 2, on an input of a letter, we could go to two different states: 3 or 8 (likewise for states 3, 5, and 6)
  • This is called non-determinism

Automata

The same automaton as on the last slide, but laid out differently

Why non-determinism

  • Easier to model many “things”
    • Regular expressions, for example
  • All non-deterministic finite automata (NFAs) can be converted into deterministic finite automata (DFAs)
    • And they have to be for our (deterministic) computers to use them
  • Sometimes converting a polynomial-time NFA yields an exponential-time DFA

Automata

This has empty (\(\epsilon\)) transitions; only allowed on NFAs

Automata definitions

  • DFA: Deterministic Finite Automata
    • From any state, on any input, there is at most one destination state
  • NFA: Non-deterministic Finite Automata
    • There is at least one state, which on some input, has more than one possible destination state
    • and/or there is at least one empty transition
  • The range of languages (strings) that these can produce (aka accept) are called regular languages

Automata and languages

  • How would you write an automaton that can accept strings with some positive number of a’s followed by some positive number of b’s?
    • Expressed as \(a^+b^+\)

More powerful automata

  • How would you write an automaton that can accept strings with some positive number of a’s followed by the same number of b’s?
    • Expressed as \(a^nb^n\)

 

  • Answer: you can’t. Any finite automaton with \(n\) states cannot accept a string of \(n+1\) a’s (and \(n+1\) b’s)

Accepting \(a^nb^n\)

  • To accept these strings, you need memory: how many a’s have been accepted
  • This is done via a stack
    • On each a, push a value onto the stack
    • On each b, pop a value from the stack
  • Ensure the stack is empty at the end
  • This is called a push-down automaton (sketched below)
  • The range of languages (strings) that these can produce (aka accept) are called context-free languages
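
A sketch of this stack idea (a minimal push-down-automaton-style checker for \(a^nb^n\), \(n \ge 1\)):

def accepts_anbn(s):
    # Push a marker per 'a', pop one per 'b'; accept iff every 'a'
    # is matched and at least one 'b' was seen
    stack, seen_b = [], False
    for ch in s:
        if ch == "a":
            if seen_b:            # an 'a' after a 'b': reject
                return False
            stack.append(ch)
        elif ch == "b":
            seen_b = True
            if not stack:         # more b's than a's
                return False
            stack.pop()
        else:
            return False
    return seen_b and not stack

print(accepts_anbn("aaabbb"))  # True
print(accepts_anbn("aabbb"))   # False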

Even more powerful automata

  • What about \(a^nb^nc^n\)?
  • A push-down automaton (from the last slide) can’t handle this
  • You need something called a linear-bounded automaton
    • An automata with a limited form of memory
  • The range of languages (strings) that these can produce (aka accept) are called context-sensitive languages

Yet even more powerful automata

  • What about a programming language?
  • This needs a Turing machine, which can compute anything a computer can compute
    • Such languages are called recursively enumerable languages

Turing Machines

  • A Turing machine has an NFA (or DFA) as its “control”
    • A computer’s current memory state is, in effect, a DFA state
  • And an infinite tape that it can read and write values to
    • Analogous to a computer’s memory

NFA Animation

  • Back to our NFA to accept UVA userids
  • Imagine that this is the NFA in a Turing machine
    • Although this one does not use memory
  • We’ll trace through accepting any valid UVA userid
  • Goal: what states can the NFA be in at any step? (a set-of-states simulation is sketched below)
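
A sketch of this set-of-states bookkeeping (what the tables below record row by row). Since the full userid NFA transition table isn't reproduced here, a small stand-in NFA for \(a^+b^+\) is used as the example:

def nfa_accepts(transitions, start, finals, string):
    # Track every state the NFA could be in after each character
    # transitions: dict mapping (state, symbol) -> set of next states
    current = {start}
    for ch in string:
        current = {nxt
                   for state in current
                   for nxt in transitions.get((state, ch), set())}
        if not current:        # no possible state left: reject early
            return False
    return bool(current & finals)

# Stand-in NFA for a+b+: from state 1 on 'a', guess whether this was
# the last 'a' (go to 2) or not (stay in 1)
AB = {(1, "a"): {1, 2}, (2, "b"): {3}, (3, "b"): {3}}
print(nfa_accepts(AB, 1, {3}, "aaabb"))  # True
print(nfa_accepts(AB, 1, {3}, "ba"))     # False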

NFA Animation

What state is the NFA in?

Step \(s_1\) \(s_2\) \(s_3\) \(s_4\) \(s_5\) \(s_6\) \(s_7\) \(s_8\)
1 true false false false false false false false
2 false true false false false false false false
3 false false true false false false false true
4 false false false true true false false true
5 false false false false true true false true
6 false false false false false true true true
7 false false false false false false true true
8 false false false false false false false true

What state is the NFA in?

Step \(s_1\) \(s_2\) \(s_3\) \(s_4\) \(s_5\) \(s_6\) \(s_7\) \(s_8\)
1 \(s_1\) \(\neg s_2\) \(\neg s_3\) \(\neg s_4\) \(\neg s_5\) \(\neg s_6\) \(\neg s_7\) \(\neg s_8\)
2 \(\neg s_1\) \(s_2\) \(\neg s_3\) \(\neg s_4\) \(\neg s_5\) \(\neg s_6\) \(\neg s_7\) \(\neg s_8\)
3 \(\neg s_1\) \(\neg s_2\) \(s_3\) \(\neg s_4\) \(\neg s_5\) \(\neg s_6\) \(\neg s_7\) \(s_8\)
4 \(\neg s_1\) \(\neg s_2\) \(\neg s_3\) \(s_4\) \(s_5\) \(\neg s_6\) \(\neg s_7\) \(s_8\)
5 \(\neg s_1\) \(\neg s_2\) \(\neg s_3\) \(\neg s_4\) \(s_5\) \(s_6\) \(\neg s_7\) \(s_8\)
6 \(\neg s_1\) \(\neg s_2\) \(\neg s_3\) \(\neg s_4\) \(\neg s_5\) \(s_6\) \(s_7\) \(s_8\)
7 \(\neg s_1\) \(\neg s_2\) \(\neg s_3\) \(\neg s_4\) \(\neg s_5\) \(\neg s_6\) \(s_7\) \(s_8\)
8 \(\neg s_1\) \(\neg s_2\) \(\neg s_3\) \(\neg s_4\) \(\neg s_5\) \(\neg s_6\) \(\neg s_7\) \(s_8\)

Adding in step numbers (the tableau)

Step \(s_1\) \(s_2\) \(s_3\) \(s_4\) \(s_5\) \(s_6\) \(s_7\) \(s_8\)
1 \(s_{11}\) \(\neg s_{12}\) \(\neg s_{13}\) \(\neg s_{14}\) \(\neg s_{15}\) \(\neg s_{16}\) \(\neg s_{17}\) \(\neg s_{18}\)
2 \(\neg s_{21}\) \(s_{22}\) \(\neg s_{23}\) \(\neg s_{24}\) \(\neg s_{25}\) \(\neg s_{26}\) \(\neg s_{27}\) \(\neg s_{28}\)
3 \(\neg s_{31}\) \(\neg s_{32}\) \(s_{33}\) \(\neg s_{34}\) \(\neg s_{35}\) \(\neg s_{36}\) \(\neg s_{37}\) \(s_{38}\)
4 \(\neg s_{41}\) \(\neg s_{42}\) \(\neg s_{43}\) \(s_{44}\) \(s_{45}\) \(\neg s_{46}\) \(\neg s_{47}\) \(s_{48}\)
5 \(\neg s_{51}\) \(\neg s_{52}\) \(\neg s_{53}\) \(\neg s_{54}\) \(s_{55}\) \(s_{56}\) \(\neg s_{57}\) \(s_{58}\)
6 \(\neg s_{61}\) \(\neg s_{62}\) \(\neg s_{63}\) \(\neg s_{64}\) \(\neg s_{65}\) \(s_{66}\) \(s_{67}\) \(s_{68}\)
7 \(\neg s_{71}\) \(\neg s_{72}\) \(\neg s_{73}\) \(\neg s_{74}\) \(\neg s_{75}\) \(\neg s_{76}\) \(s_{77}\) \(s_{78}\)
8 \(\neg s_{81}\) \(\neg s_{82}\) \(\neg s_{83}\) \(\neg s_{84}\) \(\neg s_{85}\) \(\neg s_{86}\) \(\neg s_{87}\) \(s_{88}\)

Re-arranging the states

Step \(s_1\) \(s_2\) \(s_3\) \(s_4\) \(s_5\) \(s_6\) \(s_7\) \(s_8\)
1 \(s_{11}\) \(\neg s_{12}\) \(\neg s_{13}\) \(\neg s_{14}\) \(\neg s_{15}\) \(\neg s_{16}\) \(\neg s_{17}\) \(\neg s_{18}\)
2 \(s_{22}\) \(\neg s_{21}\) \(\neg s_{23}\) \(\neg s_{24}\) \(\neg s_{25}\) \(\neg s_{26}\) \(\neg s_{27}\) \(\neg s_{28}\)
3 \(s_{33}\) \(s_{38}\) \(\neg s_{31}\) \(\neg s_{32}\) \(\neg s_{34}\) \(\neg s_{35}\) \(\neg s_{36}\) \(\neg s_{37}\)
4 \(s_{44}\) \(s_{45}\) \(s_{48}\) \(\neg s_{41}\) \(\neg s_{42}\) \(\neg s_{43}\) \(\neg s_{46}\) \(\neg s_{47}\)
5 \(s_{55}\) \(s_{56}\) \(s_{58}\) \(\neg s_{51}\) \(\neg s_{52}\) \(\neg s_{53}\) \(\neg s_{54}\) \(\neg s_{57}\)
6 \(s_{66}\) \(s_{67}\) \(s_{68}\) \(\neg s_{61}\) \(\neg s_{62}\) \(\neg s_{63}\) \(\neg s_{64}\) \(\neg s_{65}\)
7 \(s_{77}\) \(s_{78}\) \(\neg s_{71}\) \(\neg s_{72}\) \(\neg s_{73}\) \(\neg s_{74}\) \(\neg s_{75}\) \(\neg s_{76}\)
8 \(s_{88}\) \(\neg s_{81}\) \(\neg s_{82}\) \(\neg s_{83}\) \(\neg s_{84}\) \(\neg s_{85}\) \(\neg s_{86}\) \(\neg s_{87}\)

Rephrasing the last slide

  1. \({\color{blue}s_{11}} \wedge \neg s_{12} \wedge \neg s_{13} \wedge \neg s_{14} \wedge \neg s_{15} \wedge \neg s_{16} \wedge \neg s_{17} \wedge \neg s_{18}\)
  2. \({\color{blue}s_{22}} \wedge \neg s_{21} \wedge \neg s_{23} \wedge \neg s_{24} \wedge \neg s_{25} \wedge \neg s_{26} \wedge \neg s_{27} \wedge \neg s_{28}\)
  3. \(({\color{blue}s_{33}} \vee {\color{blue}s_{38}}) \wedge \neg s_{31} \wedge \neg s_{32} \wedge \neg s_{34} \wedge \neg s_{35} \wedge \neg s_{36} \wedge \neg s_{37}\)
  4. \(({\color{blue}s_{44}} \vee {\color{blue}s_{45}} \vee {\color{blue}s_{48}}) \wedge \neg s_{41} \wedge \neg s_{42} \wedge \neg s_{43} \wedge \neg s_{46} \wedge \neg s_{47}\)
  5. \(({\color{blue}s_{55}} \vee {\color{blue}s_{56}} \vee {\color{blue}s_{58}}) \wedge \neg s_{51} \wedge \neg s_{52} \wedge \neg s_{53} \wedge \neg s_{54} \wedge \neg s_{57}\)
  6. \(({\color{blue}s_{66}} \vee {\color{blue}s_{67}} \vee {\color{blue}s_{68}}) \wedge \neg s_{61} \wedge \neg s_{62} \wedge \neg s_{63} \wedge \neg s_{64} \wedge \neg s_{65}\)
  7. \(({\color{blue}s_{77}} \vee {\color{blue}s_{78}}) \wedge \neg s_{71} \wedge \neg s_{72} \wedge \neg s_{73} \wedge \neg s_{74} \wedge \neg s_{75} \wedge \neg s_{76}\)
  8. \({\color{blue}s_{88}} \wedge \neg s_{81} \wedge \neg s_{82} \wedge \neg s_{83} \wedge \neg s_{84} \wedge \neg s_{85} \wedge \neg s_{86} \wedge \neg s_{87}\)

A final formula

\[ \begin{array}{l} accepted = \\ {\color{blue}s_{11}} \wedge \neg s_{12} \wedge \neg s_{13} \wedge \neg s_{14} \wedge \neg s_{15} \wedge \neg s_{16} \wedge \neg s_{17} \wedge \neg s_{18} \wedge \\ {\color{blue}s_{22}} \wedge \neg s_{21} \wedge \neg s_{23} \wedge \neg s_{24} \wedge \neg s_{25} \wedge \neg s_{26} \wedge \neg s_{27} \wedge \neg s_{28} \wedge \\ ({\color{blue}s_{33}} \vee {\color{blue}s_{38}}) \wedge \neg s_{31} \wedge \neg s_{32} \wedge \neg s_{34} \wedge \neg s_{35} \wedge \neg s_{36} \wedge \neg s_{37} \wedge \\ ({\color{blue}s_{44}} \vee {\color{blue}s_{45}} \vee {\color{blue}s_{48}}) \wedge \neg s_{41} \wedge \neg s_{42} \wedge \neg s_{43} \wedge \neg s_{46} \wedge \neg s_{47} \wedge \\ ({\color{blue}s_{55}} \vee {\color{blue}s_{56}} \vee {\color{blue}s_{58}}) \wedge \neg s_{51} \wedge \neg s_{52} \wedge \neg s_{53} \wedge \neg s_{54} \wedge \neg s_{57} \wedge \\ ({\color{blue}s_{66}} \vee {\color{blue}s_{67}} \vee {\color{blue}s_{68}}) \wedge \neg s_{61} \wedge \neg s_{62} \wedge \neg s_{63} \wedge \neg s_{64} \wedge \neg s_{65} \wedge \\ ({\color{blue}s_{77}} \vee {\color{blue}s_{78}}) \wedge \neg s_{71} \wedge \neg s_{72} \wedge \neg s_{73} \wedge \neg s_{74} \wedge \neg s_{75} \wedge \neg s_{76} \wedge \\ {\color{blue}s_{88}} \wedge \neg s_{81} \wedge \neg s_{82} \wedge \neg s_{83} \wedge \neg s_{84} \wedge \neg s_{85} \wedge \neg s_{86} \wedge \neg s_{87} \\ \end{array} \]

That’s satisfiability!

Cook-Levin Theorem (1971)

  • Any such problem
    • …of the type we have seen…
  • … that can be solved by a Turing machine…
    • And thus an automata (DFA or NFA), push-down automata, etc.
  • … can be reduced to satisfiability

Complexity classes

  • P: Any problem that can be solved in polynomial time by a deterministic Turing machine (deterministic “control”)
    • Solution can be verified in polynomial time
  • NP: Any problem that can be solved in polynomial time by a non-deterministic Turing machine (non-deterministic “control”)
    • Might take exponential time on a deterministic machine
    • Solution can be verified in polynomial time
  • If a problem can be solved deterministically in polynomial time, then it can be solved non-deterministically in polynomial time
    • Thus, \({\color{forestgreen}P} \subseteq {\color{blue}NP}\)

Complexity classes

  • NP-hard
    • A problem that is at least as difficult as every problem in NP
      • Could be more difficult, but not less difficult
  • There are believed to be problems in NP that are not NP-hard (none yet proven)

There are actually many more complexity classes.

Recap: Satisfiability (SAT)

  • No efficient solution to SAT has yet been found
    • Maybe one exists, though…
  • We know it’s in NP
    • As we can verify a solution in polynomial time
  • What if we want to show that some problem \(X\) is just as difficult as SAT?
  • We would want to show that both:
    • \(X\) reduces to SAT: \(X \le_p SAT\)
    • SAT reduces to \(X\): \(SAT \le_p X\)
  • That would mean they are (roughly) equivalent in difficulty
    • And that likely no efficient solution can be found

Complexity classes

  • NP-complete
    • A set of problems that are equivalent in difficulty to SAT
  • We show problem \(X\) is NP-complete by:
    • Reducing SAT (or another NP-complete problem) to \(X\)
    • Reducing \(X\) to SAT (or another NP-complete problem)
      • But if \(X \in {\color{blue}NP}\), then it reduces to SAT via the Cook-Levin theorem
      • So we just have to show \(X \in {\color{blue}NP}\)

Showing \(X \in NP\)

  • Via a proof we won’t show here, it has been shown that:
    • A problem \(X\) being in NP
      • Meaning solvable by a non-deterministic Turing machine in polynomial time
    • And a solution to \(X\) being verifiable in polynomial time
  • Are actually the same thing

 

So we just have to show that we can verify a problem’s solution in polynomial time to show it is in NP

Example reduction: Vertex Cover

  • Consider the minimum vertex cover problem:
    • Given a graph \(G=(V,E)\), find a minimum-size set of vertices \(C\subseteq V\) such that every edge in \(E\) has at least one endpoint in \(C\)
  • To prove it’s NP-complete:
    • Show it’s in NP (this means it reduces to SAT)
      • \(VC \in {\color{blue}NP} \equiv VC \le_p SAT\)
    • Reduce a known NP-complete problem to it
      • \(SAT \le_p VC\)

Example reduction: Vertex Cover

  • Show it’s in NP (this means it reduces to SAT)
    • Given a set of vertices \(C\), we can, in polynomial time, verify that all edges in \(E\) have at least one endpoint in \(C\)
      • We just check that each edge has at least one endpoint in \(C\), which takes \(\Theta(E)\) time (sketched below)
  • Reduce another NP-complete problem to it
    • Maximum Independent Set is NP-complete
    • So we reduce Maximum Independent Set to Minimum Vertex Cover
    • (shown a few slides ago: just take the complement graph)
  • Thus, the vertex cover problem is NP-complete
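
A sketch of that polynomial-time verifier:

def verify_vertex_cover(edges, C):
    # The check that puts Vertex Cover in NP: every edge must have
    # at least one endpoint in C, which takes Theta(E) time
    cover = set(C)
    return all(u in cover or v in cover for u, v in edges)

print(verify_vertex_cover([(1, 2), (2, 3)], {2}))  # True
print(verify_vertex_cover([(1, 2), (3, 4)], {2}))  # False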

Counter-intuitiveness

Two things tend to be counter-intuitive when trying to prove that problem \(X\) is NP-complete:

  1. You are not solving the problem, or giving a direct algorithm to solve it
    • You are only showing that it reduces to and from another NP-complete problem (for which you may not know an algorithm)
  2. You reduce another NP-complete problem to problem \(X\)
    • Recall that we want to show both:
      • \(X \le_p SAT\) (or another NP-complete instead of SAT)
      • \(SAT \le_p X\) (or another NP-complete instead of SAT)
    • But the first one is already proven (assuming you show it’s in NP)
    • So you only have to show the second of those \((SAT \le_p X)\)

Why we can’t just reduce \(X\) to SAT

Professors
Dogs
  • Consider the problem of finding a bipartite matching (not necessarily maximal)
  • Reduce it to SAT:
    • Create a set of clauses for the possible matchings: \(bloomfield \wedge husky\), \(pettit \wedge labrador\), etc.
      • But also ensure one dog per person:
        \(bloomfield \wedge husky \wedge \neg labrador \wedge \neg dachshund \wedge \neg jonangi\)
      • Likewise ensure one human per dog
    • OR all these clauses together
    • Negate, via DeMorgan’s Law, into conjunctive normal form
  • Solve using SAT
  • Solution: DeMorgan-ize it again out of conjunctive normal form, and examine the clauses with the correct pairings

Why we can’t just reduce \(X\) to SAT

Professors
Dogs

What did we just show?

  • That non-maximal bipartite matching can be reduced to SAT
  • But non-maximal bipartite matching is much easier than SAT!
    • A non-maximal matching can be done in \(O(V)\) time: go through the left nodes, and assign them to an arbitrary right node that is not yet assigned
  • This doesn’t show that non-maximal bipartite matching is as difficult as SAT
    • Only that SAT is more difficult than non-maximal bipartite matching

Can SAT be solved in polynomial time?

  • Nobody has found an efficient solution
    • But that doesn’t mean one doesn’t exist
  • It has not yet been proven that one cannot solve it in polynomial time
    • Just that nobody has yet figured out a way to do so
  • If it can’t be solved in polynomial time, then \({\color{forestgreen}P} \ne {\color{blue}NP}\)
  • If it can be solved in polynomial time, then \({\color{forestgreen}P} = {\color{blue}NP}\)
    • This would be bad: all modern encryption would be crackable
  • If you can prove this, one way or the other, you get a million-dollar prize
