Our research on reliability-aware microprocessors focuses on both hard errors (lifetime reliability) and soft errors.

Lifetime reliability:

Relentless technology scaling, coupled with increasing power densities, is threatening the nearly unlimited processor lifetime reliability that computer users have come to expect. It has been predicted that, in the future, product cost and performance requirements will be substantially affected, and in many cases superseded, by constraints arising from dwindling lifetime reliability due to wear-out mechanisms such as electromigration and gate-oxide breakdown. We observe that the microarchitecture's ability to track application behavior can potentially be leveraged to improve reliability. In this direction, we have (1) developed RAMP, the first microarchitectural model for lifetime reliability, (2) incorporated technology scaling models into RAMP to quantify the impact of scaling on lifetime reliability, and (3) begun developing a range of architectural techniques for reliability enhancement and management.

Soft errors:

Soft errors due to high-energy particle strikes are a growing concern for processor reliability. Recent work has motivated architecture-level studies of soft errors, since the architecture can mask many raw errors and architectural solutions can exploit workload knowledge. We have proposed a model and tool, called SoftArch, to enable analysis of soft errors at the architecture level in modern processors. SoftArch is based on a probabilistic model of the error generation and propagation process in a processor. Compared with prior architecture-level tools, SoftArch is more comprehensive or faster.

This work is in collaboration with Pradip Bose and Jude Rivers from IBM.