Parallel Processing Letters (PPL)
Volume: 18, Issue: 3 (2008), pp. 371-390, DOI: 10.1142/S0129626408003454
Title: METADATA RANKING AND PRUNING FOR FAILURE DETECTION IN GRIDS
This work is supported in part by the European Union under projects CoreGRID (#IST-2002-004265) and EGEE (#IST-2003-508833). A preliminary version of this paper has appeared in [32] and [33]. The second author was supported by a CoreGRID REP Fellowship during 2008.
Author(s):
DEMETRIOS ZEINALIPOUR-YAZTI
Pure and Applied Science, Open University of Cyprus, 1304, Nicosia, Cyprus

HARRIS PAPADAKIS
Foundation for Research and Technology - Hellas, Institute of Computer Science, Heraklion, Crete, Greece

CHRYSSIS GEORGIOU
Department of Computer Science, University of Cyprus, 1678, Nicosia, Cyprus

MARIOS D. DIKAIAKOS
Department of Computer Science, University of Cyprus, 1678, Nicosia, Cyprus
History:
Received May 2008
Revised July 2008
Abstract:
The objective of Grid computing is to make processing power as accessible and easy to use as electricity and water. The last decade has seen unprecedented growth in Grid infrastructures, which nowadays enable large-scale deployment of applications in the scientific computation domain. One of the main challenges in realizing the full potential of Grids is making these systems dependable.

In this paper we present FailRank, a novel framework for integrating and ranking information sources that characterize failures in a Grid system. Once the failing sites have been ranked, they can be eliminated from the job-scheduling resource pool, yielding a more predictable, dependable and adaptive infrastructure. We also present the tools we developed to evaluate the FailRank framework. In particular, we present the FailBase Repository, a 38 GB corpus of state information that characterizes the EGEE Grid over one month in 2007. Such a corpus paves the way for the community to systematically uncover new, previously unknown patterns and rules among the multitude of parameters that can contribute to failures in a Grid environment. Additionally, we present a 30-day experimental evaluation of the FailRank system, which shows that our framework identifies failures in 93% of the cases while fetching only 65% of the available information sources. We believe that our work constitutes another important step towards realizing adaptive Grid computing systems.
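The abstract does not spell out FailRank's scoring function, so the following is only a minimal sketch of the general rank-then-prune idea. It assumes that each site is described by a few failure indicators normalized to [0, 1] and that these are combined by a weighted sum. The indicator names, the weights, and the helpers `failure_score`, `rank_sites` and `prune_pool` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SiteSnapshot:
    """Hypothetical snapshot of one Grid site's failure indicators.

    Each indicator is assumed to be normalized to [0, 1], where 1 means
    the strongest evidence of failure. Indicator names are illustrative.
    """
    site: str
    indicators: dict[str, float]


# Hypothetical weights expressing how much each information source matters.
WEIGHTS = {"sft_failures": 0.5, "queue_backlog": 0.3, "bdii_stale": 0.2}


def failure_score(snap: SiteSnapshot, weights: dict[str, float]) -> float:
    """Weighted linear combination of indicators; missing ones count as 0."""
    return sum(w * snap.indicators.get(name, 0.0) for name, w in weights.items())


def rank_sites(snaps: list[SiteSnapshot],
               weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (site, score) pairs ordered from most to least failure-prone."""
    scored = [(s.site, failure_score(s, weights)) for s in snaps]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def prune_pool(snaps: list[SiteSnapshot],
               weights: dict[str, float], k: int) -> list[str]:
    """Drop the k highest-ranked sites and return the remaining pool."""
    failing = {site for site, _ in rank_sites(snaps, weights)[:k]}
    return [s.site for s in snaps if s.site not in failing]


if __name__ == "__main__":
    pool = [
        SiteSnapshot("site-a", {"sft_failures": 0.9, "queue_backlog": 0.2}),
        SiteSnapshot("site-b", {"sft_failures": 0.1, "bdii_stale": 0.0}),
        SiteSnapshot("site-c", {"queue_backlog": 0.8, "bdii_stale": 1.0}),
    ]
    print([site for site, _ in rank_sites(pool, WEIGHTS)])  # site-a, site-c, site-b
    print(prune_pool(pool, WEIGHTS, k=1))                    # ['site-b', 'site-c']
```

This sketch reads every indicator for every site. It therefore does not model the selective fetching behind the reported figure of 65% of information sources, which would require a top-k strategy that can stop before all sources have been consulted.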
Keywords:
Data Ranking Algorithms; Computational Grids; Failures; Scheduling


Copyright © 2012 World Scientific Publishing Co. All rights reserved.