Static analysis is a foundational technique in software verification that examines source code to identify potential issues without executing the program. It leverages pre-defined rules, often targeting a fixed set of general safety properties, to detect flaws that might lead to errors at runtime. Unlike dynamic analysis, which evaluates software during execution, static analysis operates at compile time and uses symbolic or generic inputs.

```mermaid
graph LR
    A((Source<br>code S)) --> C[Analyzer A] --> D((✓ Safe))
    B((Implicit<br>Property)) --> C --> E((✘ Unsafe))
```

The process is entirely automatic. A static analyzer processes the source code and implicitly checks its compliance with specific properties. The outcomes are categorized as either:

  • Safe: No violations of the specified properties were found, suggesting that the code adheres to the predefined safety rules.
  • Unsafe: The analyzer has flagged potential violations, indicating areas in the code where issues might occur under certain conditions.
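As a minimal sketch of this safe/unsafe classification (not a production tool), the hypothetical `analyze` function below uses Python's `ast` module to walk a program's syntax tree and check one fixed property, division by a literal zero, entirely without executing the code:

```python
import ast

def analyze(source: str) -> str:
    """Classify source as 'Safe' or 'Unsafe' for one fixed property:
    division by the literal constant 0, checked without execution."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, ast.Div)
                and isinstance(node.right, ast.Constant)
                and node.right.value == 0):
            return "Unsafe"   # potential violation flagged
    return "Safe"

print(analyze("x = 1 / 0"))   # Unsafe
print(analyze("x = 1 / 2"))   # Safe
```

Note that the checker reasons purely over the program text: it never evaluates the division, which is exactly what distinguishes it from a dynamic check.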

General Properties Checked During Static Analysis

Static analysis focuses on ensuring the safety and correctness of the software by identifying common programming issues. Examples of properties typically analyzed include:

  1. No integer overflows: Ensures that arithmetic operations on integer variables remain within their valid range.
  2. No type errors: Detects mismatches between expected and actual data types in operations.
  3. No null-pointer dereferencing: Flags scenarios where a null pointer might be dereferenced, potentially leading to runtime crashes.
  4. No out-of-bound array accesses: Verifies that all array indices used in the program are within the valid range.
  5. No race conditions: Ensures proper synchronization in multi-threaded programs to prevent simultaneous access to shared resources.
  6. No useless assignments: Detects assignments where a variable is overwritten without being used.
  7. No usage of undefined variables: Flags variables that are used without being initialized, preventing unpredictable behavior.
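Property 7 (no usage of undefined variables) can be illustrated with a deliberately simplified checker. This sketch handles only straight-line module-level code; a real analyzer would also have to deal with branches, loops, scopes, and imports:

```python
import ast

def undefined_variables(source: str) -> list[str]:
    """Flag names that are read before any assignment in straight-line
    code. A real analyzer must also handle branches, loops, scopes,
    and imports; this sketch deliberately does not."""
    assigned: set[str] = set()
    flagged: list[str] = []
    for stmt in ast.parse(source).body:
        # Record reads before recording this statement's own targets.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in assigned:
                    flagged.append(node.id)
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    assigned.add(target.id)
    return flagged

print(undefined_variables("a = 1\nb = a + c"))   # ['c']
```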

Comparing Static and Dynamic Analysis

While both static and dynamic analysis aim to improve software reliability, they differ significantly in approach and application:

| Aspect | Static Analysis | Dynamic Analysis |
| --- | --- | --- |
| Timing | Conducted at compile time, before the program is executed. | Conducted at runtime, during the execution of the program. |
| Operation | Works directly on the source code or another model of the software. | Examines the software’s actual behavior in a live environment. |
| Execution | Operates without running the software, using symbolic or generic inputs. | Requires specific inputs to test the program’s responses and functionality. |
| Inference | Infers properties of the software’s dynamic behavior by analyzing the code structure, logic, and annotations. | Detects issues as they occur during execution. |
| Example properties | Array bounds, type safety, and race conditions. | Memory leaks, incorrect outputs, and timing errors. |

What Does Static Analysis Imply for Runtime Behavior?

Static analysis aims to infer properties about a program’s dynamic behavior without running it. However, this leads to two important implications:

Important

  1. Over-approximation: Static analysis may flag issues that do not occur in practice (false positives). For example, it might consider an edge case that cannot actually arise during program execution due to logical constraints elsewhere in the code.
  2. Under-approximation: Conversely, an analysis that examines only a subset of the possible behaviors — for example, paths up to a bounded depth, or code whose behavior depends on dynamic conditions like user input or system state — can miss real issues (false negatives).

This means the properties verified by static analysis are not guaranteed to hold during actual execution but provide a strong indication of potential problem areas. Combining static and dynamic analysis often offers a more comprehensive verification approach, addressing the limitations of each method.
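Over-approximation can be made concrete with a small (hypothetical) example. In the function below, every input with `x > 20` also satisfies `x > 10`, so the division is always safe; a path-insensitive analyzer that merges both outcomes of the first `if` only knows that `d` may still be 0 at the division, and reports a possible division by zero — a false positive:

```python
def safe_divide(x: int) -> float:
    d = 0
    if x > 10:
        d = x
    if x > 20:
        # Concretely safe: x > 20 implies x > 10, so d == x != 0 here.
        # A path-insensitive analyzer, having merged the two branches
        # above into "d is 0 or x", still warns about division by zero.
        return 100 / d
    return 0.0

print(safe_divide(25))   # 4.0 — no error ever occurs at runtime
```

The warning is sound (the analyzer never promises more than it knows) but imprecise, because the correlation between the two conditions was lost at the merge point.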

Static Analysis and Program Behavior

When we talk about program behavior, we are referring to the full range of possible executions of a program, represented as sequences of states the program can go through during its execution. Each state corresponds to a specific configuration of variables, memory, and program flow at a particular moment in time.
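The idea of a behavior as a sequence of states can be made concrete. In this sketch (an assumed toy representation, not a standard API), a state is a pair of a program counter and a snapshot of the variable environment, and running a three-instruction straight-line program yields one trace:

```python
# A toy program: each instruction assigns one variable from the
# current environment. A state is a (program counter, environment)
# pair; an execution is the sequence of states it passes through.
program = [
    ("x", lambda env: 1),             # x = 1
    ("y", lambda env: env["x"] + 2),  # y = x + 2
    ("x", lambda env: env["y"] * 2),  # x = y * 2
]

def trace(program):
    env, states = {}, [(0, {})]
    for pc, (var, rhs) in enumerate(program, start=1):
        env = {**env, var: rhs(env)}  # fresh snapshot per state
        states.append((pc, env))
    return states

for state in trace(program):
    print(state)
# (0, {}) → (1, {'x': 1}) → (2, {'x': 1, 'y': 3}) → (3, {'x': 6, 'y': 3})
```

Static analysis reasons about the set of all such traces at once, whereas a single run only ever observes one of them.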

Static analysis works by analyzing the source code to identify potential errors or unsafe states that could occur during execution. It examines the program without actually running it, trying to predict where things might go wrong. However, one of the limitations of static analysis is that while it can identify erroneous states in the code, it cannot always guarantee that the program will ever reach these states during actual execution.

This leads to a characteristic of static analysis known as pessimism: it tends to be overly cautious by reporting possible errors that may never actually affect the running program. This cautious approach is necessary to ensure the analysis is sound, but it can also lead to false positives, where the analysis suggests problems that do not exist in practice.

Precision and Efficiency Trade-Off

One of the key challenges in static analysis is the precision/efficiency trade-off. To make static analysis sound (i.e., ensuring that if the analyzer reports no violations, then no real error exists — errors are never silently missed), analysts often rely on over-approximations of the program’s behavior. These approximations cover all possible execution paths but may not capture all the fine details of the program’s behavior, which introduces the following challenges:

  1. Precision: The more precise the analysis, the more accurately it models the program’s behavior, but this comes at the cost of increased computational complexity. Perfect precision is impossible in general: by Rice’s theorem, every non-trivial semantic property of programs is undecidable, so some aspects of program behavior cannot be fully captured by any static analysis technique.
  2. Efficiency: Achieving high precision can be computationally expensive. Analyzing every possible path, checking all conditions, and tracking every variable could require significant resources, especially for large and complex programs. As a result, it is often impractical to aim for perfect precision.
  3. Trade-Off Between Precision and False Positives: If we opt for low precision to improve efficiency, the analysis will be faster but more likely to produce false positives—cases where the analysis flags an error that doesn’t actually exist in practice. These false positives require manual verification, which can defeat the purpose of using static analysis to save time and effort.
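A common way to realize a sound over-approximation is an abstract domain, such as intervals. The sketch below (an assumed minimal interval domain, not a full abstract interpreter) shows how joining the two branches of an `if` loses precision and produces a false positive:

```python
# Interval abstract domain: each variable is tracked as a (lo, hi)
# range, a sound over-approximation of its possible concrete values.
def join(a, b):
    """Merge the intervals from two branches (least upper bound)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

# Concrete program:
#   if cond: x = 1
#   else:    x = 100
#   assert x != 50      # can never fail concretely
then_branch = (1, 1)
else_branch = (100, 100)
merged = join(then_branch, else_branch)   # (1, 100)

# The analysis only knows x lies in [1, 100], so it cannot rule out
# x == 50 and must pessimistically warn: a false positive caused by
# the loss of precision at the merge point.
may_equal_50 = merged[0] <= 50 <= merged[1]
print(merged, may_equal_50)   # (1, 100) True
```

A more precise domain (e.g., tracking the two branches separately) would avoid this warning, but at a higher computational cost — exactly the trade-off described above.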

Therefore, designing a static analysis technique involves finding a balance between precision and efficiency. The goal is to achieve a level of precision that is sufficient to catch real errors while ensuring that the analysis can be performed within a reasonable time and resource constraint. In practice, this often means using techniques that approximate the program’s behavior while being careful not to generate an unmanageable number of false positives that require excessive manual intervention.