Workflow Motifs for Finding Frequent Patterns in Distributed Traces

Diagnosing performance problems in distributed applications is hindered by a fundamental mismatch: developers have rich, powerful abstractions for building complex systems, but engineers diagnosing those same systems are left with primitive tools that operate on raw logs with little higher-level structure. This project introduces the workflow motif, a formally defined abstraction that represents frequently repeating processing patterns mined from the distributed traces of request executions. Each motif is a subgraph that appears frequently across a collection of traces. It is annotated with performance characteristics, such as critical-path latency distributions and edge-latency distributions, and motifs are organized hierarchically so engineers can explore application behavior at multiple levels of detail. As a proof of concept, we applied an early version of the system to HDFS traces and surfaced a concrete performance bottleneck: HDFS was reading blocks serially in 64 KB chunks and synchronously flushing each chunk to the network, which left both disk and network underutilized.
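To make the idea of a frequent, annotated pattern concrete, the sketch below mines only the smallest possible patterns, single caller-to-callee edges, and keeps those that appear in at least a given fraction of traces. It is a simplified illustration, not the project's mining algorithm: the trace format (a list of `(caller, callee, latency_ms)` tuples), the function name `frequent_edges`, the `min_support` parameter, and the toy latencies are all assumptions made for this example. Real workflow motifs are multi-edge subgraphs and carry full latency distributions rather than a single median.

```python
from collections import defaultdict
from statistics import median


def frequent_edges(traces, min_support):
    """Return edges present in at least `min_support` (a fraction) of traces.

    Each trace is assumed to be a list of (caller, callee, latency_ms) tuples.
    Each surviving edge is annotated with its support and median latency.
    """
    support = defaultdict(int)      # number of traces containing the edge
    latencies = defaultdict(list)   # every observed latency for the edge

    for trace in traces:
        seen = set()
        for src, dst, latency_ms in trace:
            edge = (src, dst)
            latencies[edge].append(latency_ms)
            seen.add(edge)
        # Count each edge at most once per trace, as support is per-trace.
        for edge in seen:
            support[edge] += 1

    threshold = min_support * len(traces)
    return {
        edge: {
            "support": count / len(traces),
            "median_latency_ms": median(latencies[edge]),
        }
        for edge, count in support.items()
        if count >= threshold
    }


# Toy traces with made-up component names and latencies.
traces = [
    [("client", "namenode", 2.0), ("client", "datanode", 40.0)],
    [("client", "namenode", 3.0), ("client", "datanode", 50.0)],
    [("client", "namenode", 4.0), ("namenode", "journal", 7.0)],
]
motifs = frequent_edges(traces, min_support=0.6)
for edge, stats in motifs.items():
    print(edge, stats)
```

With a support threshold of 0.6, the edge that occurs in only one of the three toy traces is dropped, while the two edges that recur are kept along with their latency summaries. Extending this from single edges to larger subgraphs is where frequent-subgraph mining comes in.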

We paused this line of work after finding that off-the-shelf frequent-subgraph mining algorithms were too expensive for practical use at the scale of real distributed application traces.

📄 Related Publications

2025

arXiv
The workflow motif: a widely-useful performance diagnosis abstraction for distributed applications

Mania Abdi, Peter Desnoyers, Mark Crovella, and Raja R. Sambasivan

May 2025


Diagnosing problems in deployed distributed applications continues to grow more challenging. A significant reason is the extreme mismatch between the powerful abstractions developers have available to build increasingly complex distributed applications versus the simple ones engineers have available to diagnose problems in them. To help, we present a novel abstraction, the workflow motif, instantiations of which represent characteristics of frequently-repeating patterns within and among request executions. We argue that workflow motifs will benefit many diagnosis tasks, formally define them, and use this definition to identify which frequent-subgraph-mining algorithms are good starting points for mining workflow motifs. We conclude by using an early version of workflow motifs to suggest performance-optimization points in HDFS.
@techreport{Abdi2025,
  author    = {Abdi, Mania and Desnoyers, Peter and Crovella, Mark and Sambasivan, Raja R.},
  title     = {The workflow motif: a widely-useful performance diagnosis abstraction for distributed applications},
  number    = {arXiv:2506.00749 [cs.DC]},
  publisher = {Cornell University},
  address   = {Ithaca, NY, USA},
  pages     = {},
  year      = {2025},
  month     = may,
  doi       = {https://doi.org/10.48550/arXiv.2506.00749},
}


👤 Members

Mania Abdi

Darby Huye

Mark Crovella

Peter Desnoyers

Raja Sambasivan
