Skip to content

Static Program Analysis

Book: https://users-cs.au.dk/amoeller/spa/

Applications

  • Optimization
  • Correctness
  • Development

The input/behavior of nontrivial programs are undecidable.

The Tiny Imperative Language

  • Syntax
  • Normalization
  • AST
  • CFG

Type Analysis

rs
Type -> int
      | ↑Type
      | (Type, ..., Type) -> Type

      // Infinite types (caused by recursion)
      | μ TypeVar. Type
      | TypeVar

      // Record
      | { Id: Type, ..., Id: Type }
      | ◊  // Absent field

TypeVar -> t | u | ...

Polymorphic + recursive = undecidable type analysis.

Limits:

  • flow-sensitive
  • polymorphic
  • ignores other kinds of runtime errors

Lattice Theory

Partially ordered set

  • Reflexive
  • Transitive
  • Antisymmetric

Lattice

  • Least upper bound (join)
  • Greatest lower bound (meet)

Complete lattice: every subset has a join and a meet.

  • Top
  • Bottom

Examples:

  • Powerset lattice (P(S),)
  • Flat lattice flat(A)
  • Product lattice L1×L2
  • Function lattice L1L2

Homomorphism:

  • Homomorphism: f:L1L2 such that f(xy)=f(x)f(y)f(xy)=f(x)f(y)
  • Isomorphism = bijective homomorphism

lift(L): L with a new bottom element

Equation System:

fixed point, least fixed point (LFP)

For monotonic function f:LL, LFP(f)=i=0fi()

Dataflow Analysis with Monotone Frameworks

Sign analysis, Constant propagation analysis:

  • JOIN(v)=wpred(v)[[w]]
  • X=E: [[v]]=JOIN(v)[Xeval(JOIN(v),E)]
  • var X: [[v]]=JOIN(v)[X]

Live variable analysis:

  • JOIN(v)=wsucc(v)[[w]]
  • X=E: [[v]]=JOIN(v){X}vars(E)
  • if (E), output E: [[v]]=JOIN(v)vars(E)
  • var X: [[v]]=JOIN(v){X}
  • exit: [[v]]=

Available expressions analysis:

  • State = (P({exprs}), )
  • JOIN(v)=wpred(v)[[w]]
  • X=E: [[v]]=JOIN(v){E} removed all expressions that contain X
  • if (E), output E: [[v]]=JOIN(v){E}

Very busy expressions analysis:

  • JOIN(v)=wsucc(v)[[w]]
  • X=E: [[v]]=JOIN(v) removed all expressions that contain X{E}
  • Usage: "Code hoisting"

Reaching definitions analysis:

  • State = (P({definitions}), )
  • JOIN(v)=wpred(v)[[w]]
  • X=E: [[v]]=JOIN(v){X}{v}
  • Usage: "DCE", "Code motion"

Summary

ForwardBackward
MayReaching definitionsLive variables
MustAvailable expressionsVery busy expressions

May&Must

Transfer function: [[v]]=tv(JOIN(v))

Widening and Narrowing

  • Allowing lattices with infinite height to converge.
  • Accelerate finite height lattices.
  1. Widening: B
  2. Narrowing: Not guaranteed to converge. Heuristics must determine how many iterations to apply.

Widening operator: :L×LL:

  • Only apply widening at recursive dataflow constraints.

Path Sensitivity

Assertions: assert(E)

  • Trivial: [[assert(E)]]=JOIN(v)
  • Predicates: [[assert(E>0)]]=JOIN(v)[Xgt(JOIN(v)(X),eval(JOIN(v),E))]
  • Sound & Monotone

Relations

  • L=PathL
  • flag=0: [[v]]=[flag=0pPathJOIN(v)(p),flag0]
  • assert(flag): [[v]]=[flag=0,flag0JOIN(v)(p)(flag0)]

Interprocedural Analysis

Context Sensitivity

Using the lattice (Contextlift(State))n. The bot of lift(State) is "unreachable".

  • Context=Singleton: Context-insensitive
  • Context=Callk: Call string approach
  • Context=State: Functional approach: full context sensitivity

Distributive Analysis Frameworks

Possibly-Uninitialized Variables Analysis

  • Monotone framework: not scalable
  • Pre-analysis + Main analysis:
    • Pre-analysis:
      • Lattice: (lift(P(Var)P(Var)))n
      • X=E: [[v]]1=tX=Ewpred(v)[[w]]1 or unreachable
    • Main analysis
      • Lattice: (lift(P(Var)))n
      • Non-entry node: [[v]]2=[[v]]1([[v]]2) or unreachable

Compat Representation of Distributive Functions

  1. f:P(D)P(D) which is distributive, and D is a finite set.
    • Used in: "Pre-analysis" above, the IFDS framework
    • A special case of 2.

{}{y|yf()}{xy|xf({x})yf()}

txt
·  d1  d2  d3  d4
|\          \  |
| \          \ |
|  \          \|
·  d1  d2  d3  d4
  1. f:(DL)(DL) which is distributive and D is a finite set and L is a complete lattice.
    • Used in: the IDE framework

Define g:(D{})×(D{})(LL)

g(d1,d2)(e)=f([d1e])(d2)g(,d2)(e)=f()(d2)g(,)(e)=eg(d1,)(e)=

The IFDS Framework

Interprocedural, Finite, Distributive, Subset

  • Lattice: P(D) of a finite set D
  • Transfer functions: tv:P(D)P(D) are distributive

Phase 1

  • v0,d0v1,d1Pv1,d1v2,d2Ev0,d0v2,d2P

Phase 2

  • d1,d2:v0,d1v,d2Pd2[[v]], where v0 is the entry node of the function containing v.

The IDE Framework

Interprocedural, Distributive, Environment

  • Lattice: DL, where D is a finite set and L is a complete lattice (the "environment")
  • Transfer functions: tv:(DL)(DL) are distributive

Phase 1

  • m1=[[v0,d1v,d2]]Pm2=[v,d2v,d3]Em2m1[[v0,d1v,d3]]P

Phase 2

  • d0,d:[[v0,d0]]unreachablem=[[v0,d0v,d]]Pm([[v0,d0]])[[v,d]]

Properties of the IFDS and IDE Frameworks

  • Produces meet-over-all-paths solutions
  • The worst-case time complexity is: O(|E||D|3)

Control Flow Analysis

Closure analysis

  • λX[[λX.E]]
  • E1 E2: λX[[E1]]([[E2]][[X]][[E]][[E1E2]]) for every λX.F

First-class Functions

Pointer Analysis

  • Allocation-site abstraction

  • Interprocedural pointer analysis

  • Null pointer analysis

  • Flow-sensitive pointer analysis

  • Escape analysis

Abstract Interpretation

TODO