LegoSDN

LegoSDN is a fault-tolerant controller framework that allows SDN controllers to isolate and tolerate SDN application failures. The framework isolates applications using sandboxes, uses checkpoint and restore techniques to recover crashed applications, safely rolls back the changes made to the network by the crashed SDN application, and transforms the crash-inducing input to help the crashed SDN application recover even from deterministic failures.

Read the paper »

Highlights

Comparison of recovery times of different techniques.

Recovery Time

Recovery Time is defined as the time elapsed between the SDN application crash and the instant at which the SDN application successfully re-processes the crash-inducing input.

LegoSDN is 3x faster than controller reboots in recovering SDN applications from failure. Improvements to the underlying CRIU checkpoint-and-restore library could reduce recovery times further.

Demonstration of recovery using event transformations.

Event Transforms

Event transforms are rules that convert a crash-inducing input into an equivalent input that conveys the same intent to the SDN application. Event transformations are key to recovering from deterministic failures.

In the case of deterministic failures, only LegoSDN recovers the application quickly (~250 ms) without losing application state or inputs. Controller reboots also help, but they are extremely slow: the entire stack (including other healthy applications) is rebooted on failure.
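
To make the idea concrete, here is a minimal sketch of an event transform. It is not LegoSDN's implementation: the Event type, the SWITCH_DOWN and PORT_DOWN event names, the rule itself, and the deliver helper are all hypothetical and chosen only for illustration. The sketch also skips the checkpoint restore that would normally precede the replay.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical network event delivered to an SDN application."""
    kind: str          # e.g. "SWITCH_DOWN" or "PORT_DOWN"
    switch: str
    port: int = -1     # -1 when the event is not about a single port


def switch_down_to_port_downs(event: Event, ports_of: Dict[str, List[int]]) -> List[Event]:
    """Assumed rule: a switch going down is equivalent to all its ports going down."""
    return [Event("PORT_DOWN", event.switch, p) for p in ports_of[event.switch]]


def deliver(app: Callable[[Event], None], event: Event,
            transforms: Dict[str, Callable], ports_of: Dict[str, List[int]]) -> List[Event]:
    """Deliver an event; if the app crashes on it, retry with an equivalent input."""
    try:
        app(event)
        return [event]
    except Exception:
        rule = transforms.get(event.kind)
        if rule is None:
            raise  # no equivalent input is known for this event
        replacement = rule(event, ports_of)
        for e in replacement:
            app(e)
        return replacement


class BuggyApp:
    """Toy application with a deterministic bug in its SWITCH_DOWN handler."""

    def __init__(self) -> None:
        self.down_ports = set()

    def __call__(self, event: Event) -> None:
        if event.kind == "SWITCH_DOWN":
            raise RuntimeError("unhandled SWITCH_DOWN")
        if event.kind == "PORT_DOWN":
            self.down_ports.add((event.switch, event.port))
```

Replaying the original event would crash the toy application every time, which is what makes the failure deterministic. Delivering the transformed events instead lets it learn that the switch's ports are gone without ever entering the faulty handler.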

Recovery of SDN application state from checkpoint.

Stateful Recovery

Stateful recovery refers to recovery of SDN application state upon restart after a crash. SDN applications are typically stateful, and losing state on recovery can be problematic.

LegoSDN uses checkpoint-and-restore techniques to safely recover an SDN application's state after a crash. Consequently, when a stateful firewall crashes, flows are able to make progress only when LegoSDN assists in crash recovery. The loss of state under other crash recovery techniques prevents flows from making any progress after a crash.
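
The firewall scenario can be sketched in a few lines. This is an illustration under stated assumptions, not LegoSDN's code: LegoSDN checkpoints application processes with CRIU, whereas the sketch below uses an in-process deep copy as a stand-in, and the StatefulFirewall and CheckpointingRunner classes are hypothetical.

```python
import copy


class StatefulFirewall:
    """Toy firewall: allows packets only on flows it has seen a SYN for."""

    def __init__(self):
        self.established = set()

    def handle(self, pkt):
        flow = (pkt["src"], pkt["dst"])
        if pkt.get("syn"):
            self.established.add(flow)
        if pkt.get("malformed"):
            # Simulated crash, possibly after state was already modified.
            raise ValueError("cannot parse packet")
        return "ALLOW" if flow in self.established else "DROP"


class CheckpointingRunner:
    """Snapshots the app after each successful event; restores it on a crash."""

    def __init__(self, app):
        self.app = app
        self.snapshot = copy.deepcopy(app)   # stand-in for a CRIU checkpoint

    def dispatch(self, pkt):
        try:
            verdict = self.app.handle(pkt)
        except Exception:
            # Discard the crashed instance and restore the last good state.
            self.app = copy.deepcopy(self.snapshot)
            return None
        self.snapshot = copy.deepcopy(self.app)
        return verdict


runner = CheckpointingRunner(StatefulFirewall())
runner.dispatch({"src": "a", "dst": "b", "syn": True})        # flow established
runner.dispatch({"src": "a", "dst": "b", "malformed": True})  # crash, then restore
after_restore = runner.dispatch({"src": "a", "dst": "b"})     # state survived
after_reboot = StatefulFirewall().handle({"src": "a", "dst": "b"})  # state lost
```

After the restore, the established flow is still allowed. A firewall restarted from scratch has forgotten the flow and drops its packets, which is the stalled-flow behavior described above.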


Talks

@SOSR '16

We presented a prototype implementation of LegoSDN at the ACM Symposium on SDN Research (SOSR) held in Santa Clara, CA on March 14 & 15, 2016.

@HotNets '14

We presented our initial position paper on the redesign of SDN controller architectures at ACM Hot Topics in Networks (HotNets) held in Los Angeles, CA on October 27 & 28, 2014.


Source Code

LegoSDN's source code is hosted on GitHub. The prototype implementation runs on top of Floodlight and hence requires Floodlight to build and run. Support for checkpoint-and-restore requires the Checkpoint and Restore In Userspace (CRIU) library. Detailed instructions on how to compile the source code and how to configure and run SDN applications with LegoSDN are included in the README.

$ git clone https://github.com/balakrishnanc/legosdn.git
$ cd legosdn
# Instructions on how to compile and run LegoSDN are in README.md

People

Balakrishnan Chandrasekaran

Bala Chandrasekaran

Bala is a sixth-year PhD candidate in the Computer Science Department.

Personal Web Page

Brendan Tschaen

Brendan is a second-year PhD student in the Computer Science Department.

Personal Web Page

Theophilus Benson

Theo is an Assistant Professor in the Computer Science Department.

Personal Web Page