Contributions

NetTrails makes the following contributions:

Distributed Provenance Maintenance and Querying

We present ExSPAN, a scalable framework for maintaining network provenance [NetDB08, SIGMOD10] in a distributed environment. ExSPAN uses declarative networking techniques and rewrite rules to efficiently attach provenance information to tuples communicated between nodes. It significantly reduces communication overhead by distributing provenance information among nodes and appending to each tuple only short provenance pointers that identify the nodes maintaining the relevant provenance information. Provenance queries are evaluated by recursively traversing the provenance graph in a distributed fashion. We show that several optimization techniques can further reduce the overhead of provenance querying.
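The pointer-based scheme can be illustrated with a minimal sketch. It is not ExSPAN's implementation: the `Node` class, the `derive` and `query` helpers, and the `link`/`path` tuples are hypothetical names chosen for illustration, and each cross-node hop that would be a network message in a real deployment is simulated here by a dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node that stores only its local slice of the provenance graph."""
    name: str
    # tuple id -> list of (node name, tuple id) pointers to the tuples it was derived from
    prov: dict = field(default_factory=dict)


def derive(node, tuple_id, inputs):
    """Record a derivation locally and return the short pointer shipped with the tuple."""
    node.prov[tuple_id] = list(inputs)
    return (node.name, tuple_id)


def query(network, pointer):
    """Recursively follow pointers across nodes and return the base tuples reached."""
    node_name, tuple_id = pointer
    inputs = network[node_name].prov.get(tuple_id, [])
    if not inputs:  # no recorded derivation: treat the tuple as a base tuple
        return {pointer}
    base = set()
    for p in inputs:  # each hop would be a message to the node named in the pointer
        base |= query(network, p)
    return base


# Two nodes computing reachability: path(a,c) depends on link(a,b) and path(b,c).
network = {"a": Node("a"), "b": Node("b")}
link_ab = ("a", "link(a,b)")
link_bc = ("b", "link(b,c)")
path_bc = derive(network["b"], "path(b,c)", [link_bc])
path_ac = derive(network["a"], "path(a,c)", [link_ab, path_bc])
base_tuples = query(network, path_ac)
```

In this sketch only the pair `(node name, tuple id)` travels with a derived tuple, while the derivation itself stays on the node that produced it.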

Provenance in Dynamic Environments

To enable consistent and complete provenance query results in highly dynamic networks, we propose the time-aware provenance (TAP) [TaPP11] model, which adds a temporal dimension that enables time travel in the provenance graph. In addition, the enhanced model explicitly supports provenance of state changes by attributing each state change to a preceding change and to the existence of other state at that time. Because the additional temporal dimension introduces a large maintenance overhead, we explore alternative replay-based provenance maintenance techniques with different performance tradeoffs, and discuss their applicability to workloads with different characteristics.
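As a rough illustration of the replay idea, the sketch below keeps only a timestamped log of insertions and deletions and reconstructs the state at a chosen time by replaying that log. It is a simplification under stated assumptions, not the TAP model itself: the log format, the `state_at` function, and the example tuples are hypothetical.

```python
def state_at(log, t):
    """Replay insert ("+") and delete ("-") events with timestamp <= t.

    `log` is a list of (timestamp, op, tuple id) records; this record layout
    is an assumption made for the sketch.
    """
    state = set()
    for ts, op, tup in sorted(log):
        if ts > t:
            break
        if op == "+":
            state.add(tup)
        else:
            state.discard(tup)
    return state


# Hypothetical history: link(a,b) exists from time 1 to 5, link(b,c) from time 3 on.
log = [(1, "+", "link(a,b)"), (3, "+", "link(b,c)"), (5, "-", "link(a,b)")]
before_failure = state_at(log, 4)
after_failure = state_at(log, 6)
```

The tradeoff this hints at is that storing only the log keeps maintenance cheap, while each historical query pays the cost of a replay.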

Secure Provenance in Adversarial Environments

Getting correct answers to provenance queries is difficult in an adversarial setting because compromised nodes can fabricate plausible (but incorrect) responses to conceal their misbehavior. We propose a secure network provenance (SNP) model [NSDI11 Poster, SOSP11] that uses tamper-evident logs and replay-based auditing in a completely untrusted environment where an unknown subset of nodes is controlled by a Byzantine adversary. Our results show that SNP can be easily applied to diverse network protocols and systems. When applied to interdomain routing (BGP), the Chord DHT, and MapReduce executions in Hadoop, our SNP implementation incurs low processing, bandwidth, and latency overheads while enabling tamper-evident provenance queries for any system state.
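One common way to make a log tamper-evident is a hash chain, sketched below. This is a generic illustration rather than SNP's actual log format, which is not described here. The `append` and `verify` helpers and the example entries are hypothetical, and the sketch assumes that the latest digest is also committed to other parties (for example, by signing it), since a node that rewrites its entire chain locally would otherwise go unnoticed.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder digest preceding the first entry


def append(log, entry):
    """Append an entry whose digest covers the previous digest and the entry text."""
    prev = log[-1][1] if log else GENESIS
    digest = hashlib.sha256((prev + entry).encode()).hexdigest()
    log.append((entry, digest))


def verify(log):
    """Recompute the chain and report whether every stored digest still matches."""
    prev = GENESIS
    for entry, digest in log:
        if hashlib.sha256((prev + entry).encode()).hexdigest() != digest:
            return False
        prev = digest
    return True


# Hypothetical log of routing events on one node.
log = []
append(log, "recv route(10.0.0.0/8) from b")
append(log, "send route(10.0.0.0/8) to c")
intact = verify(log)

# Rewriting an earlier entry without recomputing the chain is detected.
log[0] = ("recv route(10.0.0.0/8) from x", log[0][1])
tampered_ok = verify(log)
```

An auditor that holds a previously committed digest can replay the logged inputs and compare the resulting behavior against what the node actually sent.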

Interactive Exploration Toolkit

We develop a visualization toolkit [SIGMOD11 Demo] that allows interactive exploration of system state and the corresponding provenance graph. We plan to enhance the toolkit with support for the TAP and SNP extensions, so that it can be applied to a wider range of applications and scenarios.