7.3 - Data-flow Analysis

A Control-Flow Graph (CFG) is a directed graph that represents all possible execution paths through a program. It is an essential tool in static analysis, particularly for understanding how control flows between different parts of a program.

Nodes in the CFG: Each node corresponds to a program statement or block of statements.
Edges in the CFG: An edge represents a possible execution path from one statement to another.
Declarations: These are ignored in CFGs as they do not change the program’s execution state or flow. For instance, variable declarations are skipped because they do not affect conditional logic or data flow.

CFGs are widely used in compiler optimization, debugging, and various forms of static analysis, including data-flow analysis and error detection.

Example: CFG for a Program
Let’s consider the following code snippet:
1: x := 1
2: y := x + 2
3: if (y > 3)
4:    z := y
5: else
6:    z := x
7: y := x
Explanation:

The program starts at statement 1, initializing x to 1.

Statement 2 assigns the value x + 2 to y.

At statement 3, a conditional statement (if y > 3) determines the next path:

If the condition is true, the program executes statement 4.

Otherwise, it executes statement 6.

Both paths converge at statement 7, which assigns x to y.

Corresponding CFG: The control flow for the above program can be visualized as:
graph LR
    1 --> 2 --> 3 --> 4 --> 7
    3 --> 6 --> 7
Each block in the graph represents a line of code.

The conditional statement at line 3 creates two branches in the graph.

Data-Flow Analysis

Data-flow analysis is a static analysis technique that operates on the CFG of a program. Its purpose is to extract information about how data is defined, used, and propagated throughout the program.

Definitions: Places in the code where a variable gets a new value (e.g., x := 1).
Uses: Places where a variable’s value is read (e.g., y := x + 2 reads x).
Analyzer Output: For each line in the program, data-flow analysis can determine the set of variables that are live (used later in the program) or dead (no longer needed).

Example
A data-flow analyzer might produce an output like the following:
Line 1: Live variables = {x, y}
Line 2: Live variables = {z}
Line 3: Live variables = {y, z}
...
Line n: Live variables = {}

One important property derived from data-flow analysis is live variable analysis, which determines whether variables are unnecessarily assigned values. For example, a variable is considered dead if its value is assigned but never used in subsequent operations. Identifying such variables helps optimize code by eliminating useless assignments.

Live Variables and Their Analysis

In program analysis, the concept of live variables is fundamental for optimizing code and ensuring efficient execution. A variable is considered live at a specific point in the program if its value may be used on a subsequent execution path before being redefined. Live variable analysis is performed using the Control-Flow Graph (CFG), where the program’s execution paths are represented as directed graphs.

Definition

A variable is defined as live at the exit of a block in the CFG if there exists a path from to a point in the program where is used, and along this path, is not redefined.

The goal of live variable analysis is to determine, for every block in the CFG, which variables are live at the exit of that block. For each block , we compute , which represents the set of variables that are live at .

The process involves:

Traversing the CFG in reverse order (from outputs back to inputs) to analyze how variables propagate through the program.
Marking variables as live when they are used and removing them when they are redefined.

Example
For example, consider the following program and its CFG representation:
1: x := 1
2: y := x + 2
3: if (y > 3)
4:    z := y
5: else
6:    z := x
7: y := x
Corresponding CFG:

Analysis Output:

These results indicate that, for example, at line 1, only x is live at the exit, while at line 3, both x and y are live.

Live variable analysis uses an over-approximation approach, meaning that the calculated set at each block is a superset of the actual live variables. This implies:

If , the variable is definitely not live at .
If , may or may not be live at , as it might only be live along certain paths.

Applications of Live Variable Analysis

One of the primary applications of live variable analysis is dead assignment elimination, a crucial optimization technique. If a variable is not live after it is defined, any assignment to is unnecessary and can be removed without affecting the program’s behavior.

Example
For example, consider the following program:
1: x := 4       # Useless assignment (x not live here)
2: x := 7       # Potentially useful (x may be live here)
3: if (z > y)
4:    y := x
Here, the assignment at line 1 can be eliminated because x is not live after it. On the other hand, the assignment at line 2 is retained since x may be used later.

Dead assignment elimination works by identifying any block that:

Defines or redefines a variable .
Does not require to be live at the exit of ().

Automating Live Variable Analysis

Modern compilers and analysis tools automate this process. Live variable analysis can be reduced to solving a system of equations, which can then be tackled using standard algorithms like:

Fixed-point algorithms: Iteratively propagate liveness information through the CFG until no further changes occur.
Worklist algorithms: Dynamically track and update only the parts of the CFG affected by each computation step, improving efficiency.

Reaching Definitions and Their Analysis

Reaching definitions is a fundamental concept in data-flow analysis, used to identify which variable assignments in a program influence subsequent blocks of code. A definition refers to an assignment to a variable in a specific block of a program. A definition is said to reach another block if there exists a path in the Control-Flow Graph (CFG) from block to block such that the variable is not redefined along that path.

Example
Consider the following program, which computes a factorial-like result:
1: x := 5
2: y := 1
3: while (x > 1)
4:    y := x * y
5:    x := x - 1
The CFG for this program is as follows:
graph TD
    1[x := 5] --> 2[y := 1] --> 3[[while x > 1]] --> 4[y := x * y] --> 5[x := x - 1] --> 3
In this program, we are interested in understanding which definitions of variables and reach the entry of block . For the first iteration of the loop:

Definitions and reach block .

For subsequent iterations:

Definitions and reach block .

Goal

The goal of reaching definitions analysis is to determine, for every block in the CFG, which variable definitions may reach that block. This is achieved by computing:

: The set of definitions that reach the entry of block .

: The set of definitions that reach the exit of block .

The results for the example program are:

This analysis uses over-approximation, meaning that the computed sets may include definitions that are not strictly necessary. Specifically:

If , the definition of at definitely does not reach .
If , the definition of at may or may not reach , as it might only apply along certain execution paths.

Reaching definitions analysis is formalized using equations that describe how definitions propagate through blocks:

where represents all predecessor blocks of . Here:

: The set of definitions introduced (or generated) in block .
: The set of definitions invalidated (or killed) by block , typically other definitions of the same variables redefined at .

Step-by-Step Analysis of the Example
We compute the reaching definitions for each block:
1: x := 5
2: y := 1
3: while (x > 1)
4:    y := x * y
5:    x := x - 1
Block 1:

(implicit initialization)

.

Block 2:

.

.

Block 3:

.

(loop header; no new definitions yet).

Block 4:

.

.

Block 5:

.

.

Iteratively solving these equations, we find that definitions such as and coexist in the analysis output due to the loop’s cyclic nature, highlighting the over-approximation.

Reaching definitions analysis has several practical applications in compiler optimization and program analysis:

Dead Code Elimination: If a definition of a variable does not reach any subsequent use, it is dead and can be safely removed to optimize the program.
Constant Propagation: Identifying definitions that are constant throughout a program’s execution allows the compiler to replace variables with their values, reducing runtime computations.
Code Refactoring: This analysis can identify redundant or unnecessary variable assignments, aiding in cleaner and more maintainable code.
Error Detection: Reaching definitions analysis can help identify uninitialized variable uses, which might lead to runtime errors.

Use-Def (UD) and Def-Use (DU) Chains

Use-def (UD) chains and def-use (DU) chains are core concepts in data-flow analysis, linking variable definitions to their uses and vice versa. These chains play a critical role in program optimization and error detection. They provide insights into which statements define values that other statements use (def-use), and vice versa, which statements depend on specific definitions (use-def). Such information is invaluable for tasks like parallelization, optimization, and correctness verification.

Program Optimization:
- Identifying independent definitions and uses allows safe parallelization of computations.
- Eliminating redundant or unnecessary definitions improves efficiency.
Error Detection: Highlighting uses without prior definitions can prevent runtime errors caused by uninitialized variables.
Code Maintenance: Tracing the origins of variable values and their dependencies makes debugging and refactoring easier.

Example Program for UD and DU Chains
Consider the following program:

The corresponding Control-Flow Graph (CFG) is as follows:
graph TD
    1[x := 0] --> 2[x := 3] --> 3[if z == x] --> 4[z := 0] --> 7[y := x]
    3 --> 6[z := x] --> 7
    7 --> 8[x := y + z]
Use-Def (UD) Chains

A Use-Def (UD) chain connects a use of a variable to all definitions that may reach it. It identifies which assignments could have influenced a variable’s value at the point of its usage.

For example, in the above program:

Variable x at block 7 uses a value that could have been defined at block 2. Thus, the UD chain for x at block 7 is:

This result means the definition of x at block 2 (i.e., x := 3) reaches block 7 where x is used.

Def-Use (DU) Chains

A Def-Use (DU) chain connects a definition of a variable to all its uses that may be affected by it. It helps track how a definition propagates through the program.

For example:

Variable x defined at block 2 can potentially influence:

Block 3 (z == x)

Block 6 (z := x)

Block 7 (y := x)

Block 8 (x := y + z)

Thus, the DU chain for x at block 2 is:

Formal Definition of UD Chains

Definition

A UD chain can be computed using the following formula:

Where:

indicates an assignment to at line .

is true if there exists a definition-clear path between (the definition) and (the use), meaning is not redefined along this path.

Example Analysis

In the given program, we can trace the definitions and uses of x:

Definition at Block 1 (x := 0):

This definition does not reach any later use because it is immediately redefined at block 2.

Definition at Block 2 (x := 3):

This definition reaches blocks 3, 6, 7, and 8 as shown in the DU chain example.

Use at Block 7 (y := x):

The UD chain for this block is UD(x, 7) = {2} because the definition of x at block 2 reaches this use without being redefined.

UD Chains Derived from Reaching Definitions

The computation of Use-Def (UD) chains can leverage the results of a reaching definitions analysis. In this context, a reaching definition specifies which definitions of a variable may reach a specific program point. From these definitions, we identify the relevant definitions for each use of a variable in the program.

Definition

The formula for computing , the set of all definitions of that may reach its use in block , is given as:

Here:

represents the set of reaching definitions at the entry of block .

is the program block where a definition of is found.

Example Program: Computing
Consider the following program:
1: x := 0
2: x := 3
3: if (z == x)
4:    z := 0
5: else
6:    z := x
7: y := x
8: x := y + z
The Control-Flow Graph (CFG) is represented as:
graph TD
    1[x := 0] -.-> 2[x := 3] -.-> 3[if z == x] -.-> 4[z := 0] -.-> 7[y := x]
    3 -.-> 6[z := x] -.-> 7
    7 -.-> 8[x := y + z]
The following table summarizes the reaching definitions and corresponding for each block:

Block Reaching Definitions () UD Chains
1
2
3
4
6
7
8

DU Chains and DU Pairs

While UD chains connect uses to the definitions that may reach them, Def-Use (DU) chains take the opposite perspective. A DU chain links a definition of a variable to all uses that it may reach. This chain can be computed as the inverse of UD chains.

Definition

The formula for computing , the set of all uses that a definition of at block may reach, is:

A def-use pair for a variable exists if:

defines ,

uses , and

The definition of at reaches .

Example Analysis of DU Chains
Using the same program and CFG, the following DU chains and def-use pairs can be identified:

Block (Definition) DU Chains Def-Use Pairs
2:
4:
6:
7:

Visualization of DU and UD Chains

The relationships between UD and DU chains can be visualized as follows:
graph TD
    1[x := 0] -.-> 2[x := 3] -.-> 3[if z == x] -.-> 4[z := 0] -.-> 7[y := x]
    3 -.-> 6[z := x] -.-> 7
    7 -.-> 8[x := y + z]

    2 -- DU(x, 2) --> 3
    2 -- DU(x, 2) --> 6
    2 -- DU(x, 2) --> 7

	7 -- UD(x, 7) --> 2
	6 -- UD(x, 6) --> 2
	3 -- UD(x, 3) --> 2

Example

Consider the following fragment of code in C language:

int foo() {
	x = input();
	while (x > 0) {
		y = 2 * x;
		if (x > 10)
			y = x - 1;
		else
		x = x + 2;
		x = x - 1;
	}
	x = x - 1;
	return x;
}

You have to accomplish the following:

draw the control flow graph of the program;
apply the live variable analysis
point out a potential issue with this code that live variable analysis would be able to spot;
provide the def-use pairs for variables x and y;

The Control Flow Graph for the given program fragment can be represented as:

graph TD
    0[Start] --> 1[x = input]
    1 --> 2[while x > 0] --> 10[x = x - 1] --> 11[return x]
    2 --> 3[y = 2 * x]
    3 --> 4[if x > 10]
    4 --> 5[y = x - 1] --> 8[x = x - 1]
    4 --> 6[x = x + 2] --> 8 --> 2

Live variable analysis

Live variable analysis tracks which variables are “live” at each point in the program. A variable is live if its current value is potentially used later. Here’s the step-by-step computation:

Initialization: Assume that variables are not live before they are defined.
Backward Analysis: Starting from the return point, propagate liveness information back through the CFG.

Line	Live Variables	Reason
11	∅	No variables are live after returning .
10		is used in the return statement at line 11.
8		is used in the condition at line 2, so it remains live.
6, 7		is used in the condition at line 2, and these lines redefine .
5		is used in the condition at line 2; is defined here but is not used afterward.
4		is used in the condition at line 2.
3		Same as above; is defined but not live.
2		is used in subsequent lines.
1		is live after being input and used in the condition at line 2.

Dead Code: The variable is defined at lines 3 and 5 but never used in the program. This indicates dead code, as these computations have no effect on the program’s behavior. Live variable analysis helps identify such unused definitions.

DU pairs – Reaching definitions

The analysis focuses on identifying where variables are defined (written) and used (read). For each variable, we determine the sets of definitions that reach its uses:

	Definition	Usage
	Lines 1, 7, 8, 10	Lines 2, 3, 4, 5, 7, 8, 10, 11
	Lines 3, 5	None

UD chains identification

UD Chains for :

Use of	Definitions Reaching the Use
Line 2	{1, 8}	{1, 8}
Line 3	{1, 8}	{1, 8}
Line 4	{1, 8}	{1, 8}
Line 5	{1, 8}	{1, 8}
Line 7	{1, 8}	{1, 8}
Line 8	{1, 7, 8}	{1, 7, 8}
Line 10	{1, 8}	{1, 8}
Line 11	{10}	{10}

Def-Use Pairs for :

Definition of	Uses Reached	Def-Use Pairs
Line 1	{2, 3, 4, 5, 7, 8}
Line 7	{8}
Line 8	{2, 3, 4, 5, 7, 8}
Line 10	{11}

UD Chains for :

Use of	Definitions Reaching the Use
No uses	N/A	N/A

Def-Use Pairs for :

Since is not used in the program, there are no def-use pairs for .

References

Carlo A. Furia. Material for the Software Analysis course.

Polimi CS - Notes

Explorer

7.3 - Data-flow Analysis

Table of Contents

Data-Flow Analysis

Live Variables and Their Analysis

Applications of Live Variable Analysis

Automating Live Variable Analysis

Reaching Definitions and Their Analysis

Use-Def (UD) and Def-Use (DU) Chains

Use-Def (UD) Chains

Def-Use (DU) Chains

Formal Definition of UD Chains

UD Chains Derived from Reaching Definitions

DU Chains and DU Pairs

Visualization of DU and UD Chains

Example

Live variable analysis

DU pairs – Reaching definitions

UD chains identification

References

Backlinks

Graph View

Table of Contents