Workflow Motifs for Finding Frequent Patterns in Distributed Traces
A graph-based abstraction for mining frequently repeating patterns from distributed traces to aid performance diagnosis.
Diagnosing performance problems in distributed applications is hindered by a fundamental mismatch: developers have rich, powerful abstractions for building complex systems, but engineers diagnosing those same systems are left with primitive tools that operate on raw logs with little higher-level structure. This project introduces the workflow motif, a formally defined abstraction that represents frequently repeating processing patterns mined from the distributed traces of request executions. Each motif is a subgraph that appears frequently across a collection of traces. It is annotated with performance characteristics, such as critical-path latency distributions and edge-latency distributions, and motifs are organized hierarchically so that engineers can explore application behavior at multiple levels of detail. As a proof of concept, we applied an early version of the system to HDFS traces and surfaced a concrete performance bottleneck: HDFS was reading blocks serially in 64 KB chunks and synchronously flushing each chunk to the network, which left both disk and network underutilized.
We paused this line of work after finding that off-the-shelf frequent-subgraph mining algorithms were too expensive for practical use at the scale of real distributed application traces.
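To make the idea of a motif concrete, the following is a minimal Python sketch of the simplest possible case: a pattern consisting of a single edge. It is not the project's implementation. The trace format (a list of `(source_op, dest_op, latency_ms)` tuples), the function name `frequent_edges`, the `min_support` threshold, and the sample operation names and latencies are all assumptions made up for illustration. The sketch counts how many traces contain each edge, keeps the edges that meet the support threshold, and attaches the observed edge latencies to each one. General frequent-subgraph mining, which must also grow and match multi-edge patterns, is the part that proved too expensive and is deliberately left out.

```python
from collections import defaultdict

# Hypothetical trace format: one trace is a list of
# (source_op, dest_op, latency_ms) edges. This is an assumption for the sketch.
Trace = list[tuple[str, str, float]]


def frequent_edges(traces: list[Trace], min_support: float) -> dict:
    """Return {edge: sorted latencies} for every edge that occurs in at
    least `min_support` (a fraction between 0 and 1) of the traces."""
    seen_in = defaultdict(int)      # edge -> number of traces containing it
    latencies = defaultdict(list)   # edge -> every observed latency

    for trace in traces:
        edges_here = set()
        for src, dst, ms in trace:
            edges_here.add((src, dst))
            latencies[(src, dst)].append(ms)
        # Count each edge once per trace, however often it repeats inside it.
        for edge in edges_here:
            seen_in[edge] += 1

    total = len(traces)
    return {
        edge: sorted(latencies[edge])
        for edge, count in seen_in.items()
        if count / total >= min_support
    }


# Invented sample data; the operation names are illustrative only.
traces = [
    [("client.read", "datanode.read_block", 4.0),
     ("datanode.read_block", "datanode.send_chunk", 1.0)],
    [("client.read", "datanode.read_block", 6.0),
     ("datanode.read_block", "datanode.send_chunk", 3.0)],
    [("client.read", "namenode.lookup", 2.0)],
]

result = frequent_edges(traces, min_support=0.6)
for (src, dst), samples in result.items():
    print(f"{src} -> {dst}: latencies {samples}")
```

With the sample data, the two edges that occur in two of the three traces pass the 0.6 threshold, and the edge seen only once is dropped. Counting support per trace rather than per occurrence keeps a single trace with a tight loop from dominating the result.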
👤 Members
Mania Abdi
Darby Huye
Mark Crovella
Peter Desnoyers