OSDI 2008 - Day 2 - Session 2
Binary Translation Using Peephole Superoptimizers
Sorav Bansal and Alex Aiken, Stanford University
Zhenyu Guo, Microsoft Research Asia; Xi Wang, Tsinghua University; Jian Tang and Xuezheng Liu, Microsoft Research Asia; Zhilei Xu, Tsinghua University; Ming Wu, Microsoft Research Asia; M. Frans Kaashoek, MIT CSAIL; Zheng Zhang, Microsoft Research Asia
Cristian Cadar, Daniel Dunbar, and Dawson Engler, Stanford University
KLEE shares many ideas with modern model checking techniques based on symbolic execution. In my opinion, KLEE doesn’t bring any important contribution in the area of bug discovery. All the techniques they used for optimizing the contraint solving (expression rewriting, propagation of constants, counterexample caching, implied value concretization, extracting independent constraints, etc.) are already employed by modern contraint solvers (e.g., SAT solvers), and all modern model checkers have some powerful constraint solver as backend.
They don’t provide any intuition of why the state scheduling strategies that KLEE uses (random path selection and coverage-optimized search), the metrics they use in these strategies (e.g., minimum distance to an uncovered instruction and call stack of a state) and the way they switch between strategies (round-robin) are efficient in practice. I suppose the minimum distance to an uncovered instruction is useful for a fast convergence to a high coverage of the tests. I suppose they use the call stack information to find tests having as few nested function calls as possible, and possibly to also bound the recursive calls. We can only speculate, since they don’t provide any clear intuition behind these heuristics they use. However, I believe these heuristics are only useful for converging faster to a good coverage. I don’t know how well they will work for bug discovery. They may harm the bug discovery, instead of helping it. In my opinion, finding a bug efficiently doesn’t have so much to do with the coverage of the symbolic execution, but rather with the amount of relevant states explored. They have to employ other heuristics for optimizing the bug discovery.
The programs they used to test KLEE have about 1KLOC on average (452 applications totaling 430 KLOC). Applications of this size don’t constitute a challenge for modern model checkers — they can easily handle 10KLOC applications. They also don’t compare KLEE to existing test generators and model checkers, on the same test C programs.
An important missing feature is the ability to symbolically execute multi-threaded programs. The state explosion is even worse for multi-threaded programs, compared to sequential programs (i.e., factorial vs. exponential blowup). There are already model checkers that can handle multi-threaded programs.
The only real contribution of KLEE seems to be the environment modeling. KLEE is able to execute programs that are environmentally-intensive. The environmnent is represented by command line arguments, file data, network packets, etc. All variables which denote inputs (e.g., command line arguments) are symbolic variables — they can have any initial value. Files whose names are stored by symbolic variables become symbolic files; the other files are concrete files. KLEE handles in an interesting way the symbolic files, by implementing a symbolic file system that efficiently simulates all I/O operations performed on that file. The I/O operations are executed natively on concrete files.
The native execution of I/O operations will constitute a bottleneck for programs that do heavy I/O on concrete files in certain states and revisit those states very often. I would therefore renounce at the notion of concrete file — I would consider all files symbolic. Moreover, it is not clear how the I/O operations on concrete files are isolated while symbolically executing different paths. Mechanisms like copy-on-write (or some sort of versioning and rollback) should be used on top of the I/O operations — they cannot be just executed natively.