
Outline of a Proposal for a Data-intensive

Computational GRID Testbed

Proposal for an outline of a project to be submitted to the EU and the NSF

Kors Bos[1], François Etienne[2], Enrique Fernandez[3], Fabrizio Gagliardi[4], Hans Hoffmann[4], Mirco Mazzucato[5], Robin Middleton[6], Les Robertson[4], Federico Ruggieri[5], Gyorgy Vesztergombi[7]

Version 1.3       24 February 2000

Executive Summary

This paper is a preliminary proposal for a project funded by the European Commission (EC) to develop and demonstrate an effective wide-area distributed computing fabric. This will include the development of an appropriate architecture and the necessary middleware, leveraging existing computational Grid[8] technology, extending it to handle very large data sets, and demonstrating it with scientific applications in a production quality environment.

Requirement

CERN, the European Organisation for Nuclear Research, funded by 20 European nations, is constructing a new particle accelerator on the Swiss-French border on the outskirts of Geneva. When it begins operation in 2005, this machine, the Large Hadron Collider (LHC), will be the most powerful machine of its type in the world, providing research facilities for several thousand High Energy Physics (HEP) researchers from all over the world.

The computing capacity required for analysing the data generated by the experiments at the LHC machine will be several orders of magnitude greater than that used by current experiments at CERN. It will therefore be necessary to use in an integrated way computing facilities installed at several HEP sites distributed across Europe, North America and Asia.

During the last two years the MONARC[9] project, supported by a number of institutes participating in the LHC programme, has been developing and evaluating different models for LHC computing. MONARC has also developed tools for simulating the behaviour of such models when implemented in a wide-area distributed computing environment.

The Opportunity

Last autumn the European Commission (EC) and the US National Science Foundation (NSF) sponsored a workshop to examine the possibility of European-US collaboration on R&D for large scientific databases. In November Professor G. Metakides made a visit to CERN during which, in discussion with the HEPCCC[10], he encouraged the submission of proposals to the EU informatics funding programmes for R&D projects for large scale distributed data and computing facilities, sometimes referred to as Computing Grids. Subsequently, at a meeting in Brussels in January 2000, HEP was invited to prepare the present paper, outlining a project that would address both of these areas. The paper has been prepared by some of the potential partners in such a project. A formal proposal would include industrial partners and additional members from HEP and other sciences.

The Project Proposal

The project will focus on three areas that are not covered by other Grid related projects:

·         the management of very large amounts of distributed data (the LHC experiments will together generate many PetaBytes of data each year);

·         high throughput computing – emphasising capacity rather than performance (HEP computing is characterised by the need to process very large numbers of independent transactions);

·         automated management of both local computing fabrics, and the wide-area GRID.

The project will implement a comprehensive set of tools and facilities (middleware) to achieve its goals in these areas, basing this on an existing framework developed by previous GRID-related projects. 

These tools will be demonstrated on a Testbed comprising a facility at CERN interconnected with computing facilities installed in institutes in several European countries. The HEP community will provide the primary applications used on the testbed, as part of its preparation for LHC. Real users will use the testbed for real tasks. The project will therefore incur a considerable risk but at the same time will benefit from serious commitment and considerable resources from the participating consortium partners.

The partners who have already committed to the project are CERN and HEP institutes representing France, Hungary, Italy, Spain and the UK. These core partners are working at both technical and managerial levels on this draft proposal. Other national physics institutes are expected to join in the project.

Contacts have been established with other sciences (biology, meteorology, space) and industry (PC manufacturers, a mass storage supplier, and software houses specialising in the development of middleware for distributed systems[11]).

A US component is under discussion.

The Resources

Each of the national institutes will provide resources and funding for the establishment of the national/regional part of the GRID.

High performance bandwidth between the sites involved is assumed to be provided by other EU supported initiatives such as Géant. The project proposal will define these requirements in terms of bandwidth and network services. The current estimated cost for networking between the sites contemplated in this proposal averages €5 M per year, with minimum target bandwidths of 622 Mbps for four larger European sites and the US link, 155 Mbps for four smaller European sites, and 2.5 Gbits/sec for CERN. The project can exploit substantially greater bandwidths, in the multi-Gigabit range, if they can be provided.

EU financial support will be sought for the development of the middleware, the overall integration and operation of the GRID, the management of the project and the demonstrators, and support for the exchange of personnel and the organization of workshops.

The overall coordination and support of the project will be at CERN.

The initial duration is three years. After this time an assessment will be made and an extension of the project for another two years will be considered.

Initial Estimate of the Financial Envelope

European Union component:

People and software:                           €30 M     (€18 M funded by EU, €12 M funded by partners)

Material and overheads:                     €30 M     (€12 M funded by EU, €18 M funded by partners)

Total funding request to the EU:       €30 M

Networking:                                          €15 M     funded through Géant and/or other initiatives

 

US component:                                                    funded by NSF

Background

In September 1998 representatives from the EU and the US National Science Foundation met in Budapest, where they signed an agreement to foster scientific collaboration between European and American sciences. As part of this agreement, a task force of leading European and American scientists was formed to propose to the EU and the NSF actions to implement the collaboration.

These actions were discussed in several workshops, and in particular at the workshop on large scientific databases and archives in Annapolis in September 1999. Scientists from computer science, astronomy, biology, physics and earth observation attended this workshop. The workshop concluded that a concerted action by the EU and the USA should support the establishment of a worldwide distributed computing environment with a few large centres hosting large multi-PetaByte databases and archives. The processing power should be distributed across these major centres and the other sites. A high performance network interconnect should support easy Web-based access to processing capacity and data. Important new scientific results could be achieved by merging large and different databases and archives. This work programme would leverage the pioneering work and results of several meta-computing projects carried out during the last ten years.

In November 1999 senior EU officials visited CERN to discuss the possible participation of CERN and High Energy Physics in general in the R&D programmes of the European Commission. During the visit they met with the High Energy Physics Computing Coordination Committee (HEPCCC), whose membership includes directors and senior managers of High Energy Physics institutes in the CERN member states, and discussed the results and proposals of the EU-US workshop and other potential opportunities for informatics R&D. A good match was found between the ongoing planning activity for the computing of the LHC and the goals put forward by the EU-US workshop.

At a follow-up meeting in Brussels between representatives of the EU-US task force and senior EU officials, it was agreed to submit a project proposal to the EU by mid-May. A proposal covering the American part of the project should also be submitted to the NSF. It is anticipated that both funding agencies will jointly review and approve the project.

Computing Requirements for LHC Experiments

There are several characteristics of experimental HEP code and applications that are important in designing computing facilities for HEP data processing[12].

·         In general the computing problem consists of processing very large numbers of independent transactions, which may therefore be processed in parallel - the granularity of the parallelism can be selected freely;

·         modest floating point requirement - computational requirements are enormous but best expressed in SPECint rather than SPECfp units;

·         massive data storage: measured in PetaBytes (1015 Bytes) for each experiment;

·         read-mostly data, rarely modified, usually simply completely replaced when new versions are generated;

·         high sustained throughput is more important than peak speed - the performance measure is the time it takes to complete processing for all of the independent transactions;

·         resilience of the overall (grid) system in the presence of sub-system failures is far more important than trying to ensure 100% availability of all sub-systems at all times.

There is therefore a significant contrast between the characteristics of HEP computing and those of classical super-computing, where floating point performance and fine-grained parallelism are significant factors. HEP applications need High Throughput Computing rather than High Performance Computing.
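To make the contrast concrete, the following minimal sketch (an illustration only; the event records and the reconstruct function are invented placeholders, not HEP code) shows how independent events can be farmed out to a pool of worker processes at whatever granularity suits the facility, the relevant metric being the wall-clock time to drain the whole set of events rather than the speed of any single one:

    # Minimal sketch of high-throughput event processing: events are independent
    # transactions, so they can be distributed across a pool of workers at any
    # chosen granularity. "reconstruct" is a placeholder for real per-event work.
    import time
    from multiprocessing import Pool

    def reconstruct(event):
        # Stand-in for decision-heavy, integer-dominated per-event processing.
        return sum(event) % 97

    if __name__ == "__main__":
        events = [[i, i + 1, i + 2] for i in range(100_000)]   # placeholder "events"
        start = time.time()
        with Pool(processes=8) as pool:                        # granularity chosen freely
            results = pool.map(reconstruct, events, chunksize=1000)
        # Throughput (events completed per unit time) is the figure of merit,
        # not the latency of any individual event.
        print(f"processed {len(results)} events in {time.time() - start:.2f} s")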

Summary of the Requirements

There will initially be four large experiments at LHC, two with rather similar computing requirements (ATLAS and CMS), one with a much higher data recording bandwidth (ALICE) and the fourth (LHCB) with rather lower computing needs. Each of these experiments requires a large computing and data storage fabric at CERN, which must be integrated with a number of regional computing centres with smaller, but still significant, capacity installed in laboratories closer to the home institutes of the researchers. The following table summarises the current estimated sizing of the facilities at CERN and in the regional centres in the year 2006 (the first year in which the LHC machine will produce data at the design intensity) together with the growth expected to be required in subsequent years.

Estimated computing resources required at CERN for LHC experiments in 2006

 

collaboration                        ALICE        ATLAS        CMS          LHCB         Total

CPU capacity (SPECint95[13])
   2006                              600,000      600,000      600,000      300,000      2,100,000
   annual increase                   200,000      200,000      200,000      100,000      700,000
estimated number of CPUs in 2006     3,000        3,000        3,000        1,500        10,500
disk capacity (TB)
   2006                              300          550          550          200          1,600
   annual increase                   90           200          200          70           560
magnetic tape capacity (PB)
   2006                              3.7          3.4          3.4          1.4          11.9
   annual increase                   2.0          2.0          2.0          0.9          6.9
aggregate I/O rates (GB/sec)
   disk                              100          100          100          40           340
   tape                              1.2          0.4          0.4          0.2          2.2
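The CPU counts in the table can be cross-checked against the per-processor performance assumed in the SPECint95 footnote (roughly 200 SPECint95 per PC expected by 2006); the short calculation below simply divides the quoted capacities by that figure and reproduces the row of estimated CPU numbers:

    # Back-of-envelope check: CPUs implied by the SPECint95 capacities above,
    # assuming roughly 200 SPECint95 per processor in 2006 (see the footnote).
    SI95_PER_CPU_2006 = 200
    capacity_2006 = {"ALICE": 600_000, "ATLAS": 600_000, "CMS": 600_000, "LHCB": 300_000}

    for expt, si95 in capacity_2006.items():
        print(f"{expt}: ~{si95 // SI95_PER_CPU_2006:,} CPUs")
    print(f"Total: ~{sum(capacity_2006.values()) // SI95_PER_CPU_2006:,} CPUs")
    # Gives 3,000 / 3,000 / 3,000 / 1,500 and 10,500 in total, matching the table.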

 

Regional Centre Model

There are a number of reasons for adopting a model in which the computing resources for each LHC experiment will be distributed between the central facility at CERN, five or six large regional computing centres, and several smaller centres. The final stage of interactive analysis will normally be performed on a powerful cluster installed in the physics department of the individual researcher. Bringing computing facilities geographically closer to the home institutes will normally enable a higher network bandwidth to be used between the private cluster and the regional centre than could be obtained from the longer distance connection to CERN. This is particularly important in the case of institutes in America and Asia. The second motivation for regional centres is more mundane: it will be possible to obtain funding for national and regional computing installations which would not be available for equipment and services installed centrally at CERN. Another factor is that the devolution of control over resources allows scheduling decisions to be made giving priority to the physics interests of national or regional groups within the experimental collaboration.

The overall computing model encompassing the regional and CERN centres is currently under study within the context of the MONARC project, and so detailed requirements are not yet available for the distribution of data between the centres, the way in which the data will be synchronised, and the inter-centre network bandwidth. In any case, the details of the model will evolve with time, particularly in the period just after the LHC machine begins operation and real data becomes available. However, several basic characteristics can be assumed, and in particular the requirement to maintain a homogeneous user view of the complete computing resource. It will be necessary to develop a general framework to support data transfer between centres and global workload scheduling which is flexible and can be adapted to a range of possible models.

The following table summarises the resource requirements for one of the major regional centres for one experiment. There will be about five centres of this size and a number of smaller centres for each experiment.

Estimated computing resources for an LHC Regional Computing Centre (2006)

 

 

 

CPU capacity        120,000   SI95
disk capacity       110       TBytes
disk I/O rate       20        GBytes/sec
tape storage        0.4       PetaBytes
WAN connection      2.5       Gbits/sec

 

The Project Testbed

A testbed is required in order to validate the models and software developed by the R&D programme, to evaluate different potential hardware and software technologies, and to demonstrate the feasibility of building a production facility prior to the beginning of the equipment acquisition process in 2004. The testbed must include CERN and several remote sites (regional centres), in Europe and in the United States. The capacity of the CERN site should grow over a period of three years, reaching by the end of 2002 about 50% of the size required by one of the large LHC experiments. The remote sites should have capacities of at least 20% of that, with links to CERN of at least 622 Mbps by 2003. Smaller remote sites could have lower bandwidth connections. The aggregate bandwidth required at CERN for 2003 is a minimum of 2.5 Gbps.
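To give a feel for what these link speeds mean in practice, the following back-of-envelope sketch (an illustration only, assuming the nominal bandwidth can be sustained continuously with no protocol overhead) estimates bulk transfer times over the proposed links:

    # Rough transfer times over the proposed WAN links, assuming the nominal
    # bandwidth is fully and continuously available (an optimistic assumption).
    def transfer_hours(terabytes, mbps):
        return terabytes * 1e12 * 8 / (mbps * 1e6) / 3600

    for mbps in (155, 622, 2500):
        print(f"{mbps:>5} Mbps: 1 TB in ~{transfer_hours(1, mbps):.1f} h, "
              f"10 TB in ~{transfer_hours(10, mbps):.1f} h")
    # For example, at 622 Mbps one TeraByte takes of the order of 3.6 hours.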


 

Capacity targets for the Testbed at CERN

 

                                  units         end 2000    end 2001    end 2002
CPU capacity                      SI95          20,000      70,000      300,000
estimated number of CPUs                        400         1,000       3,000
disk capacity                     TBytes        20          60          250
disk I/O rate                     GBytes/sec    5           15          50
tape storage – capacity           PetaBytes     0.2         0.3         1.0
             – sustained data rate MBytes/sec   250         500         1,000
WAN links to external sites       Mbits/sec     155         622         2,500

 

In addition to providing a demonstration environment, the testbed must be operated as a production facility for one or more real applications with real data. This is the only way to ensure that the output of the R&D programme in terms of models, principles and software fulfils the basic requirements for performance and reliability and has practical application to real-world computing.

HEP needs the testbed in order to validate the R&D programme. During the next three years it also has requirements for substantial computing capacity for current experiments, but these requirements could be fulfilled with a much smaller facility - about 20% of the size of the testbed. Therefore CERN is looking for partners with similar long term computing requirements for high throughput computing, to take part in the R&D programme and to make use of the testbed computing capacity.

R&D Required

There are three base factors that drive the R&D programme:

·         adaptability: the computing facility must be very flexible, capable of being used for several different applications, supporting data models which will develop and change significantly in the light of the experience of the first years of data collection and analysis, and able to evolve smoothly to provide increased capacity and exploit new technologies;

·         scalability: the need to scale computation capacity (hundreds of thousands of SPECint95s, thousands of processors), disk capacity (hundreds of TeraBytes, thousands of hard disks), local network bandwidth (Terabits per second of throughput), and tertiary storage capacity (tens of PetaBytes of automated magnetic tape storage);

·         wide-area distribution of resources: the need to operate an integrated computing facility implemented as a number of geographically separated computer centres installed in different countries and continents, owned by independent research agencies, and providing computational and data storage services to end-users and smaller computing installations.

The Work Programme

Work packages will be defined for the national and international Grid activities. For each work package a rough estimate of resources, duration and measurable deliverables will be given. This chapter will be developed using the results of the work being done in the technical task force of this initiative.

The R&D programme is divided into six work packages:

1.       computing fabric management: the management of the hardware and basic software required for the local computing fabric - the processors, disks, tape drives and robots, the equipment providing the local area and storage networks, operating system installation and maintenance, file systems, storage area networks, gateways to external networking facilities, monitoring consoles, power distribution, etc.;

2.       mass storage management: the software required to manage the mass storage system - secondary and tertiary storage space management, migration/recall of data between tertiary and secondary storage, provision of classes of service to schedule access to the same data from processes with different priority levels, data interchange between sites (export/import);

3.       wide area data management: universal name space, efficient data transfer between sites, synchronisation of remote copies, wide-area data access/caching, interfacing to mass storage management systems;

4.       wide-area workload management: the scheduling of work submitted by users, including local and wide-area load balancing, scheduling support taking account of relative priority levels, data location and automated parallelisation of work, together with the application program interfaces (APIs) and graphical user interfaces (GUIs) which provide user access to the distributed computing facility. The issues here include finding the correct level for this interface (e.g. above or below the object database) and the degree of transparency (to what extent must the application be aware of the topology of the facility). A minimal illustrative sketch of such data-location-aware scheduling is given after this list.

5.        wide-area application monitoring: the instrumentation of applications, and tools to analyse dynamically the performance of an application in a GRID environment; automatic recognition of application-level performance and resource problems; automatic re-configuration of the application’s resources and re-scheduling of the application components.

6.        application development: the seamless integration of existing end-user applications with Grid middleware and the development of new, grid-aware applications is an important component of any testbed and pivotal to the success of the project. There will be a series of independent work packages covering the contributions of the different scientific and industrial areas.
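As a purely illustrative sketch of the kind of service that work packages 3 and 4 would combine to provide (the replica catalogue, the site names and the very simple cost model are invented for this example and do not describe any existing Grid toolkit), a wide-area scheduler might match each job to a site that already holds a replica of its input dataset and still has spare capacity:

    # Illustrative sketch only: a toy wide-area scheduler that prefers sites
    # already holding a replica of a job's input dataset (WP3: replica catalogue
    # and universal name space; WP4: workload management using data location).
    replica_catalogue = {               # logical dataset name -> sites holding a copy
        "lhc/raw/run042": ["CERN", "RAL"],
        "lhc/esd/run042": ["CERN", "IN2P3", "INFN"],
    }
    free_cpus = {"CERN": 120, "RAL": 300, "IN2P3": 80, "INFN": 150}

    def schedule(dataset, cpus_needed):
        """Pick the site with a local replica and the most spare CPUs; otherwise
        fall back to the least loaded site (the data would then have to be moved)."""
        candidates = [s for s in replica_catalogue.get(dataset, [])
                      if free_cpus[s] >= cpus_needed]
        site = (max(candidates, key=lambda s: free_cpus[s]) if candidates
                else max(free_cpus, key=free_cpus.get))
        free_cpus[site] -= cpus_needed
        return site

    print(schedule("lhc/esd/run042", 50))    # -> INFN: local replica, most free CPUs
    print(schedule("lhc/raw/run042", 200))   # -> RAL: only replica site with capacity

A real service would of course also have to handle priorities, replication of missing data and recovery from site failures, which is precisely the scope of these work packages.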

Related R&D is required in the area of wide-area network management, but this is outside the scope of this programme.

There will also be a work package defined to cover the implementation and exploitation of the testbed.

From the CERN Testbed to the HEP GRID

The need for a large testbed for the design and implementation of the LHC computing and regional centre model has recently inspired great interest in the GRID metaphor within the HEP community. This metaphor, developed on the basis of the pioneering activity of several metacomputing projects, mostly in the USA, and described in the recent book by Foster and Kesselman, is a perfect match for the overall worldwide computing model for LHC. This was underlined by Ian Foster at a recent DoE meeting in Washington, where he declared LHC computing to be a GRID project “par excellence”. One of his slides notes that LHC computing brings new problems, such as a large increase in scale (particularly in the data but also in processing capacity), the need for object database technology and complex policy issues. At the same time HEP also contributes interesting new technologies such as the MONARC simulation tools, NILE (http://www.nile.cornell.edu/) and GIOD (http://www.cacr.caltech.edu/SDA/giod.html).

We believe that a project funded by the EU at this early stage of the LHC computing plans will bring into focus many problems now being addressed in different contexts, making it possible to deploy a real example of a data intensive computing grid, which is likely to become one of the definitive prototypes for future initiatives.


Relations to other GRID Activities and Differentiation

HEP has already started some pilot GRID activity in the USA (the Particle Physics Data Grid and GriPhyN) and in Europe (INFN involvement in Globus). Contacts have been established with recently formed GRID efforts (the Grid Forum in the USA and E-Grid in Europe). We are observing with interest the recent proposal for an EU project by the EuroGrid consortium. The project will collaborate closely with these projects, exploiting existing developments where possible and leveraging their results. However, we believe that a number of factors differentiate this initiative from the others:

Data intensive: we plan to develop testbeds of multi-PetaByte dimensions. Several aspects of current distributed architectures are known not to scale to this volume of data, so a serious R&D effort will be required.

Highly distributed: the geography and sociology of the experimental collaborations for LHC (1500-2000 researchers from over 100 different institutes) will require easy and effective access to the data on a world-wide basis.

Commodity based: the final system will need to be built in the most cost-effective manner because of the tight budget constraints of fundamental research. Unlike traditional high-performance computing, HEP will need to leverage commodity processors and storage.

This commodity-based approach has been applied successfully in HEP for the last 10 years. The Linux and PC explosions were anticipated and fostered in the HEP community, and considerable experience therefore now exists in house.

Affordable access to adequate network bandwidth will be needed. HEP has a long tradition at the leading edge of high-bandwidth research networking.

Production oriented: from the very beginning of this project the work will need to proceed on two distinct but parallel planes: straightforward demonstrator testbeds which, as well as validating the models, will be used for the design and building of the LHC experiments (mostly experimental equipment simulation, software process design and engineering support), in order to continuously prove the production capabilities of the overall system; and advanced testbeds which stretch the system to its limits.

Compared to other scientific fields, where fine-grained parallelism of applications needs to be developed, HEP applications are by their nature intrinsically parallel. This will therefore be a more realistic test for commercial and industrial applications, which are often of a similar nature. Scalability in these circumstances is rather different from the scalability of traditional MPP high-performance computing.

Real goals and time-scales: probably the most important difference is the fact that by 2003-2004 a “real” computing system will need to be deployed. This system will need to deliver the necessary level of performance with industrial production quality of service. The project we now propose is of paramount importance for HEP; adequate resources and priority will therefore be guaranteed.

Real and demanding end user community: a world-wide distributed community of demanding and sophisticated users will use the testbeds and the final production system.

Integrated system:  the project aims to develop methods to ensure the efficient utilisation of the distributed resources, viewed as an integrated system or Grid, while maintaining acceptable turnaround times for applications.

Interest to other Sciences and Industry:

Solutions developed for such a large-scale project as the LHC will certainly be applicable to other scientific and industrial problems, particularly in view of today's explosively developing markets in the Internet ISP/ASP area. We believe therefore that this project could form the catalyst for collaboration with industry and other scientific disciplines. 

Wide area cluster computing technology

 The basic requirement to process very large numbers of independent transactions, each of which depends on rapid access to large databases plus an associated amount of compute power, is typical of many modern applications such as:  

·         Internet ISP/ASPs: a definite match is found in the rapidly increasing use of scalable clusters at large central sites to satisfy the demand for Web access, data search and e-commerce. The concept of virtual computer centres, where customers can access “metered” computing services, is rapidly gaining acceptance in the Internet world.

·         Other scientific fields: at a recent joint EU-US workshop (http://www.cacr.caltech.edu/euus/) scientists from Astronomy, Earth Observation and Biology expressed common interest in the solution being investigated by HEP. They predicted that they would reach the same level of requirements in terms of storage, distributed access and processing power within a few years.

·         ASCI: the US DoE Accelerated Strategic Computing Initiative also matches some of the simulation requirements of HEP, but with somewhat different cluster requirements for solving more tightly coupled problems. Several large commodity-based clusters are already deployed and being tested.

We can profit from a certain amount of the technology being developed today for the above requirements. Hardware technology (such as large mass storage repositories, high density packaging of cluster nodes and interconnect infrastructure) is probably best explored with industrial partners. Software technology both for cluster management and control and for application development and job submission can be developed with a wider range of collaborators. Ideally these should be agile software companies with specific know-how on distributed processing environments but collaboration is also foreseen with universities and other research institutes.

Regional Computing Centres

The requirement for LHC Regional Computing Centres is a distinguishing feature of the current project. As seen above, support for RCCs involves wide-area distributed processing with remote access to a very large common database, with data replication, caching and management being the principal challenges.

This could be a most interesting topic of research for certain potential partners. Examples of applications to which an RCC architecture would contribute are:

·         Internet ASPs providing custom applications such as Word or StarOffice for millions of clients connected to widely distributed regional hubs;

·         Web TV and music providers, with similar client distributions;

·         Other computational and data Grids - distributed scientific computing resources interconnected with reliable high bandwidth wide-area networks applied to the solution of large computing problems (see http://www.globus.org ).

CNRS Initiative for a data-intensive Computational Grid

The French Centre National de la Recherche Scientifique (CNRS) involves a number of scientific fields where distributed computing through a high-speed network is seen as a key issue for the future. Five of its scientific departments are particularly concerned:

¨       Nuclear and particle physics (IN2P3)

¨       Physical Science and Mathematics (SPM)

¨       Engineering Sciences (SPI)

¨       Life Sciences (SDV)

¨       Sciences of the Universe (INSU).

In their respective fields, several initiatives address the problem of collecting, managing, accessing and processing very large amounts of data, as well as the objective of combining large computing facilities in order to carry out large simulations.

As a multi-disciplinary agency, CNRS has experience in handling such common problems and organizing cooperation between the different disciplines. During the last ten years, the Comité d’Orientation des Moyens Informatiques (COMI) has been the place where all CNRS departments have developed actions and plans to organize computing resources in accordance with the deployment of high speed networking. L’Unité Réseaux du CNRS is a common resource in charge of promoting network facilities for the benefit of research.

CNRS is prepared to contribute to the European Grid initiative, which is seen as a major step in the development of HPC in line with the increasing demands of the community, and to join efforts with the HEP initiative. Such a contribution could involve:

¨       modelling and simulation for which the combined resources of the facilities available on the Grid could be used to tackle a new dimension of problem;

¨       on-going applications in biology or astrophysics based on a distributed model;

¨       technical expertise on tools for the management of distributed resources;

¨       technical expertise in networking technologies and the provision of international network services.

The French HEP community is directly involved in the HEP initiative. There is also long experience of cooperation between the HEP and astrophysics communities on networking, as well as between CNRS’s UREC and the network department of CERN, which could benefit the project.

The HEP project intends to develop tools and to organize pilot experiments in distributed computing; in addition, through its participation in this project, CNRS proposes to establish a strong link between this project and the current initiative to construct a distributed genome database service to enhance the facilities provided by the main French node, Infobiogen.

In a further step, a similar initiative could be constructed for astronomy and astrophysics data processing.

Since a large part of the resources allocated at present to the current projects in these areas is located in the Region Rhone-Alpes, which has established information and communication technologies as a main priority for research, cooperation with the HEP-led Grid project will be facilitated.

INFN Grid Activity

INFN is the Italian institute for particle and nuclear physics research, and is deeply involved in the four LHC experiments, namely ALICE, ATLAS, CMS and LHCb. Other experiments are also relevant to this project, such as VIRGO, a gravitational wave antenna based on a large and sophisticated interferometer being built near Pisa.

New generation experiments in general need new levels of computing and data access services, which in turn require large investments not only in hardware but also in personnel-intensive activities such as software.

INFN has a good tradition of distributed computing, as a consequence of its geographical distribution in about 30 sites, as opposed to the centralised computing facilities typical of computing centres.

The natural evolution of distributed computing is the GRID.

INFN is interested in developing new software toolkits and building a testbed aimed at testing GRID concepts and tools in a realistic environment, allowing new generation experiments to use a real distributed computing and data facility.

INFN activity in this project will be concentrated in 4 or 5 centres, each with 4 or 5 persons, who will develop software and install hardware dedicated to the GRID testbed.

Experiments will introduce their application needs from an early stage of the project and will provide the necessary applications and data to test extensively the new GRID facilities.

All the tests will be validated on wide area networks using the high bandwidth links (622-2500 Mbps) of GARR, the Italian Research and Academic Network.

Grid Activities in the UK

Organisation and funding in the UK of large science projects is carried out under the umbrella of government funded research councils. The Particle Physics and Astronomy Research Council (PPARC) supports the UK elements of the LHC program, in which there is extensive involvement in all four major experiments (ATLAS, CMS, ALICE & LHCb) being constructed for the LHC at CERN. The development of computing resources commensurate with those foreseen in a Computational Grid has been viewed by the particle physics community in the UK for some considerable time as pivotal to the full exploitation of these experiments and has the full support of PPARC.

In order to prepare for the vast amount of data from these experiments, an initiative to build up a prototype infrastructure based around a hierarchical grid is now well advanced. A Tier 1 centre will be based at RAL with two or more Tier 2 Regional Centres elsewhere in the UK. Tier 3 centres would be at university campus level and Tier 4 at departmental level (see figure).

 

Over the next few years, in a number of project phases, computing and networking resources will be developed. In the first phase a prototype Grid will be established at national level, to be followed in the second phase with extension internationally to CERN. In the third phase the Grid will be fully integrated with experiment infrastructures.

The second project phase will provide the principal linkage between UK activities and the work of this European initiative. There will be participation in the joint development of Grid software toolkits.

This work will build on existing expertise in the UK gained through collaboration with partners in the US working on other HEP experiments. For example, over the last year a Regional Computing Centre has been established at RAL for the UK community of the BaBar Experiment (currently operating at the Stanford Linear Accelerator Centre). Plans for a similar centre for the CDF collaboration (for run 2 data taking at Fermilab in the US) are well advanced.

In parallel to and underpinning these developments, it is expected that the national academic networking infrastructure (SUPERJANET) will be upgraded, with a first phase set to reach a backbone minimum of 2.5 Gbit/s early in 2001. Furthermore, Quality of Service network tests to the US are underway, and a Managed Bandwidth test connection to CERN is planned for summer 2000.

·       This initiative by the UK particle physics community does not, however, stand in isolation, since dialogue has been established with a number of other scientific research areas with varying needs for and interests in Computational Grids.

·       In particular, close relations have been established with the Bio-informatics community, and a joint workshop is planned for 13 March 2000.

Although it is recognised that the particle physics initiative is pioneering grid developments in the UK, active coordination between these areas is underway to ensure maximum benefit to all disciplines. Potential collaboration between disciplines and with industry is seen as being vital to successful technology transfer.

Expression of interest from the Royal Netherlands Meteorological Institute

The Royal Netherlands Meteorological Institute (KNMI) has an interest in the Data Grid initiative for the following reasons.

As part of its mandate the KNMI performs climate research. Within the climate research department several atmospheric physics research projects are carried out using earth observation data from research satellites. Within atmospheric composition research, ozone is studied as an important tracer of anthropogenic activity and of upper atmospheric dynamics. Many earth observation satellites carry instruments to monitor the global atmospheric ozone distribution. KNMI participates predominantly in European initiatives (ESA, EUMETSAT) but also in the NASA EOS initiative. For the EOS-CHEM satellite an Ozone Monitoring Instrument (OMI) is being developed and built in the Netherlands. The OMI Principal Investigator and research team are employed by the KNMI.

To accomplish the atmospheric research connected with each of these missions, appropriate data transport, processing and storage capacity must be anticipated. A summary of the project requirements is given in Appendix 2.

Resources

This section gives initial estimates, illustrating the scale of funding which will be required to undertake the project.

Middleware development: mostly young researchers and software engineers, with industrial participation. Four teams of 4-5 persons at CERN plus 2-3 other sites. Total EU-funded effort: 20 person-years per year for 3 years = 60 PY = €5 M.

CERN and the other partners will provide system support, training and management (4 senior system programmers plus 4 senior managers during 3 years). Total = 24 PY = €4 M, not funded by the EU.

Testbed expertise and coordination: central team of 4-5 persons at CERN, 3 persons in the larger partners and 1-2 in the smaller ones. Total EU-funded effort: 20 person-years per year = 60 PY = €5 M.

Applications: 5 persons in HEP, 5 persons in Biology and 5 persons in the other sciences = 15 person-years per year of EU-funded effort = €4 M. The consortium will provide at least twice as much effort, equivalent to €8 M.

Overheads: relocation of staff, travel and subsistence: €1 M per year = €3 M.

Dissemination and workshops: €1 M in total

Networking: provided by other EU funded projects (estimated at €5 M per year)

Materials budget for test beds: CERN (3,000 CPUs = €2 M funded plus €1 M unfunded; disks = 300 TB = €4 M funded plus €2 M unfunded; mass storage = €1 M funded plus €1 M unfunded). Total for CERN: €11 M (€7 M EU funded + €4 M EU unfunded); extensions to nationally funded testbeds: €5 M distributed between the partners. Total EU funded materials = €12 M.

Total financial envelope:

EU funded: €18 M for personnel (and overheads) and €12 M for material = €30 M

Funded by the partners: €12 M for personnel and €18 M for materials = €30 M

Networking: €15 M (estimated) to be provided by Géant or other network projects.
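As a simple consistency check, the EU-funded components listed above can be tallied (the partner-funded material contribution is not fully itemised in this section, so only the EU-funded side is summed here):

    # Tally of the EU-funded components listed in this section (in M euro).
    personnel = {"middleware": 5, "testbed coordination": 5, "applications": 4,
                 "overheads": 3, "dissemination and workshops": 1}
    materials = {"CERN testbed": 7, "national testbed extensions": 5}

    print(sum(personnel.values()))    # 18 -> €18 M for personnel and overheads
    print(sum(materials.values()))    # 12 -> €12 M for materials
    print(sum(personnel.values()) + sum(materials.values()))   # 30 -> €30 M EU request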


Appendix 1: General Characteristics of HEP Computing

Outline

High energy physics (HEP) experiments are designed to study the results of the collisions of fundamental particles which have been accelerated to very high energies in large particle accelerators, such as the Large Hadron Collider (LHC) which is being built at CERN. The LHC is a circular machine 27 kilometres in circumference, which accelerates beams of protons in opposite directions, bringing the opposing beams into collision in four experimental areas distributed around the ring. The combined energy of the colliding particles is transformed into new heavy particles that move out from the centre of the collision and subsequently decay into a sequence of secondary particles. The sequence of transformations between energy and particles resulting from one proton-proton collision is called an event. The details of the event are recorded by the experimental apparatus, the detector, which surrounds the collision point. The detector is a large and complex device containing a number of sub-detectors of different types, which can track the passage of particles and measure their energies as they move through the apparatus, deviated by strong magnetic and electric fields. Generally the apparatus sees only the secondary or tertiary effects of the passage of the particles. For example a passing energetic particle may ionise a gas molecule, which then moves along a curved path in the magnetic field until it touches a detecting wire and discharges. The passage of the original particle is observed only when the discharge reaches the ends of the wire. The complex set of physics processes inside the detector must be reconstructed from such indirect observations. An LHC detector contains about a hundred million detection elements, or channels.

Computing is used heavily at all stages in the processing of the event.

·         Triggering
This is the process by which interesting events are selected. On average 20 particle collisions occur within an LHC detector every time the particle bunches in the beam cross at the centre of the apparatus. This happens 40 million times per second, or every 25 nanoseconds, giving a raw event rate of almost a thousand million per second. This corresponds to a (compressed) data rate of about 40 TeraBytes per second. Most of these collisions are uninteresting background events such as collisions with residual gas molecules in the vacuum of the beam pipe. The process of selecting the interesting events for recording and further analysis is called triggering. The first level trigger must make a decision very quickly, within a few microseconds, because during this time all of the data from all of the collisions must be stored in the system. The first level trigger uses special hardware logic to reject background and reduce the event rate to 75 kHz. A subset of the data from these events is then presented to the second level trigger, which executes on conventional microprocessors, taking a more global view of the evolution of the event within the detector to make a finer level of selection. For example, the second level trigger may correlate energy deposition in different regions of the detector. The second level trigger reduces the event rate to 5 kHz. The data for any given event passes through the trigger system in many separate memory buffers (FIFOs). The pieces of data from selected events are then assembled by the event builder to form a single record of about 1 MByte per event.

From this point on, all of the events are independent and may be processed in parallel. The third level trigger, or filter, uses a large farm or cluster of conventional computers to process the remaining events, performing a partial reconstruction of the physics processes, locating tracks, identifying particles and calculating energies, before deciding which events should be recorded on permanent storage.

·         Data recording
In the case of two of the LHC experiments (ATLAS and CMS) the third level trigger reduces the 5 GBytes per second input data rate to 100 MBytes (100 events) per second, which must be written to permanent storage. A third experiment, ALICE, studying the interactions of heavy ions of lead instead of protons, will have much more complex events and will record data at about 1 GByte per second. The fourth LHC experiment, LHCb, will record at a lower data rate.
The data produced by the level 3 filter is buffered on disk and transferred to permanent storage on magnetic tape, in a relatively large number of independent parallel streams. This master record of the data from the detector is called the raw data. Each of the experiments will record about 1 PetaByte of raw data each year. (A short cross-check of these rates is given after this list.)

·         Reconstruction
The buffered copy of the raw data is retained on disk for some time until the next stage of processing has taken place: reconstruction. This is the process of transforming the raw data - a record of what the detector saw - into a physics view of the event, with details of particles, tracks, energies, etc. The output from the reconstruction process is called the event summary data or ESD, and forms the basis for the detailed physics analysis phase. Event data in the ESD occupies about 10% of the space required for the original raw data. The reconstruction process requires knowledge of a large number of parameters defining the detector state at the time the event took place, including for example the precise location of the detector elements and the energy of the particle beams. Later stages of the analysis will enable these parameters to be refined and programming algorithms to be improved and corrected. All of the raw data will be reconstructed once or twice again during the following months before producing the final stable version of the ESD.

·         Batch & interactive analysis
So far the data has been handled by a small number of physicists responsible for data acquisition and event production. Now the ESD is made available for analysis by all of the physicists in the collaboration, 1,500-2,000 people in the case of the larger experiments. Analysis involves many diverse processes, such as computing different physical properties, searching for rare phenomena and studying detailed statistical distributions. Many physicists work together in groups, studying specific processes. Generally the analysis takes place in two different phases. The first phase involves traversing a large subset of the full ESD (many tens of TeraBytes, containing a specific class of events), selecting a subset of these events, computing some additional properties and writing out a summary of the results - the Analysis Objects Dataset or AOD. This phase, because of the amount of computation and data required, must be performed in batch mode, using many parallel jobs. The AOD, only a few tens of GigaBytes in size, can then be analysed in detail interactively. In many cases, of course, the situation is more complex, requiring recourse to the raw data, or the further refinement of the AOD to produce a dataset small enough to be analysed interactively. Sometimes it is necessary to visualise the event, displaying a representation of the tracks, particles and energy distribution, and rotating, slicing and otherwise manipulating the complex 3D picture. However this is the exception, and the result of the interactive analysis is generally displayed simply as a histogram or plot, showing the statistical distribution of one or more physics values.

·         Simulation
The analysis process takes the raw data from the detector and works back to the physics of the event by assuming knowledge of the detector structure, the material properties, the fundamental theory of the physics reactions, and also assuming that the programme is correct. This must be checked by starting from the other end of the process - simulating the events from first principles, from a theoretical standpoint. The result of the collision is simulated and the particles generated with theoretical probabilities are followed out through the detector, decaying into other particles and interacting with the material and electromagnetic field, to produce a simulated event: what the detector should have seen. These simulated events are reconstructed and compared statistically with the real events. It is necessary to generate at least as many simulated events as there are real events collected. Simulation is very computation-intensive, and will require as much processing capacity as all the rest of the analysis chain.
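The rates quoted in the triggering and data recording steps above can be cross-checked with a short calculation. The figure of roughly 10^7 seconds of effective data taking per year is an assumption of this example (a conventional accelerator duty figure), not a number quoted in the text:

    # Back-of-the-envelope check of the trigger and data recording rates above.
    bunch_crossing_rate = 40e6          # bunch crossings per second
    collisions_per_crossing = 20
    print(f"raw collision rate ~ {bunch_crossing_rate * collisions_per_crossing:.0e} per second")
    # ~8e8 per second, i.e. "almost a thousand million per second"

    event_size_mbyte = 1.0              # after event building
    recorded_events_per_s = 100         # ATLAS/CMS rate after the level-3 filter
    seconds_per_year = 1e7              # ASSUMPTION: effective data taking time per year
    raw_pbyte_per_year = event_size_mbyte * recorded_events_per_s * seconds_per_year / 1e9
    print(f"raw data ~ {raw_pbyte_per_year:.1f} PetaByte per experiment per year")   # ~1 PB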

Computational characteristics

Above we have described briefly the main computing tasks required to acquire and analyse HEP data. The important characteristics of the computing requirements are summarised below.

·         event independence: The events are related only statistically and so they can be processed in parallel at all stages in the analysis chain, the results coming together only at the end when all of the selected events have been processed. The granularity of the parallelism can therefore be selected freely according to the characteristics of the computing facility (number of processors, number of users, etc.).

·         modest floating point needs: The HEP codes require a reasonable floating point performance, but much of the computation is concerned with decision making rather than calculation. The result is that HEP benchmarks correlate much better with the integer performance of a processor than with the floating point performance. The computational requirements are therefore expressed in SPECint95 units.

·         massive data storage: The event size is modest (1-10 MBytes), but the number of events is very high and so the total amount of data to be handled is large - PetaBytes for each experiment - too big to be stored permanently online on disk. The master copy of the data must be stored on tertiary storage (magnetic tape) with only the most active data cached on disk. The tertiary storage is therefore used as a repository for active data, in contrast to the more common application of magnetic tape as a backup or archive medium.

·         read-mostly data: The raw data is never changed after it has been recorded. The processed data (ESD, AOD) is generally created as a set of files using the same calibration data and the same version of the analysis algorithms. It is rare that these files are subsequently updated - rather a new generation of the data is created using refined calibration parameters and improved algorithms. Only a small fraction of the data must be able to be updated, namely the meta data describing the contents of the analysis data files.

·         throughput rather than performance: The inherent parallelism of HEP data processing means that analysis is performed as a set of jobs or transactions any or all of which can execute at the same time. Many events are processed in parallel. The performance measure is the time it takes to complete processing for all of the events. The throughput of the computing system is the important factor, rather than the performance - the speed with which an individual event is processed.

·         resilience rather than availability: To maintain throughput it is important that the system is resilient to failure, that individual components do not bring down the whole system. However, the availability of the whole system is not required, as work which was executing on failed equipment can be re-scheduled to run elsewhere. Of course reliability is very important, but limited failure of components can be tolerated.

There is a significant contrast between the characteristics of HEP computing and those of the classical super-computing application, where a single large application offers scope for only fine-grained parallelism. HEP, taking advantage of event independence, requires High Throughput Computing rather than High Performance Computing.
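As a purely illustrative sketch of the "active repository" use of tertiary storage described above (the dataset names and sizes are invented, and a real mass storage system would be far more elaborate), the master copies stay on tape while a bounded disk cache holds the datasets currently being analysed, recalling from tape on a miss and evicting the least recently used entry when space runs out:

    # Illustrative sketch of a disk cache in front of a tape repository for
    # read-mostly data: master copies remain on tape, only active datasets are
    # held on disk, and the least recently used dataset is evicted when full.
    from collections import OrderedDict

    class DiskCache:
        def __init__(self, capacity_tb):
            self.capacity_tb = capacity_tb
            self.cached = OrderedDict()          # dataset name -> size in TB

        def read(self, dataset, size_tb):
            if dataset in self.cached:           # cache hit: mark as recently used
                self.cached.move_to_end(dataset)
                return "read from disk"
            while self.cached and sum(self.cached.values()) + size_tb > self.capacity_tb:
                self.cached.popitem(last=False)  # evict the least recently used dataset
            self.cached[dataset] = size_tb       # recall the dataset from tape
            return "recalled from tape"

    cache = DiskCache(capacity_tb=100)
    print(cache.read("esd/run042", 40))   # recalled from tape
    print(cache.read("esd/run043", 40))   # recalled from tape
    print(cache.read("esd/run042", 40))   # read from disk (still cached)
    print(cache.read("aod/2006", 40))     # recalled from tape; evicts esd/run043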


Appendix 2: KNMI required resources

In this overview the estimated resources required for activities related to Atmospheric Chemistry research are given. Data from four earth observation satellite instruments are or will be used: GOME (ESA/ERS), GOME-II (EUMETSAT/METOP), SCIAMACHY (ESA/ENVISAT) and OMI (NASA/EOS-CHEM). All satellites fly in near-polar earth orbits at heights of approximately 800 km. Data from these instruments are dumped to ground stations once every orbit of 100 minutes. They will be processed and archived at several institutes, including the KNMI. The data must be transported to the KNMI from the ground stations located at Goddard (GSFC, USA), Svalbard (EUMETSAT, Norway) and Oberpfaffenhofen (ESA DPAC, Germany). At present only the GOME instrument is operational. The anticipated years of launch of SCIAMACHY, OMI and GOME-II are 2001, 2003 and 2004 respectively.

The numbers mentioned are partly based on assumptions and estimates.

Data Rates:

Instrument      Size (MByte/orbit)[14]   Orbit interval (minutes)
GOME            15                       100
GOME-II         120                      100
SCIAMACHY       546                      100
OMI             3624                     100

 

Computing resources required[15]:

Instrument      SPECfp95   Time needed for one orbit of data
GOME            24.5       2,000 minutes
GOME-II         24.5       16,000 minutes
SCIAMACHY       24.5       2,000 minutes
OMI             24.5       320,000 minutes

The quantity calculated is an ozone profile for one orbit of data.

Data transport resources:

Instrument      Minimal bandwidth needed
GOME
GOME-II
SCIAMACHY
OMI             5 Mbit/s

Bandwidth is calculated from data size and time constraints; necessary retransmissions are not included in the calculation. Since operational time constraints are at present known only for OMI, a required bandwidth can be calculated only for OMI.
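The OMI figure can be reproduced from the data rate table above, assuming the only time constraint is that one orbit of data must be transferred within one orbit interval; the same rate also gives the annual archive volume, close to the 18 TByte per year quoted in the storage table below:

    # Reproduces the OMI bandwidth figure from the data rate table: one orbit of
    # data (3624 MByte) transferred within one orbit interval (100 minutes).
    orbit_mbyte = 3624
    orbit_interval_s = 100 * 60
    print(f"OMI: ~{orbit_mbyte * 8 / orbit_interval_s:.1f} Mbit/s sustained")   # ~4.8, i.e. about 5 Mbit/s

    orbits_per_year = 365 * 24 * 60 / 100
    print(f"OMI: ~{orbit_mbyte * orbits_per_year / 1e6:.0f} TByte per year")    # ~19 TByte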

Data storage resources required:

Instrument      Storage capacity/year   Mission duration (years)   Total capacity needed
GOME            76 GByte                5                          380 GByte
GOME-II         615 GByte               10                         6 TByte
SCIAMACHY       2.7 TByte               5                          13.5 TByte
OMI             18 TByte                5                          90 TByte

The data storage is based on the scenario that all data is archived at the KNMI. Probably not all data will be archived at the KNMI, as other institutes archive the data as well.

ACRONYMS

ENVISAT              Environmental Satellite

EOS                        Earth Observing System

EOS-CHEM           EOS Chemistry Research Satellite

EOSDIS                  EOS Distributed Information System

ERS                         European Remote Sensing Satellite

ESA                        European Space Agency

EUMETSAT         European Organisation for the Exploitation of Meteorological Satellites

GOME                    Global Ozone Monitoring Experiment

GSFC                      Goddard Space Flight Centre (EOSDIS node)

KNMI                     Koninklijk Nederlands Meteorologisch Instituut

METOP                  Meteorological Operational Satellite

NASA                    National Aeronautics and Space Administration

OMI                        Ozone Monitoring Instrument

SCIAMACHY       Scanning Imaging Absorption Spectrometer for Atmospheric Cartography



[1] NIKHEF, Netherlands

[2] IN2P3, France

[3] IFAE, Spain

[4] CERN

[5] INFN, Italy

[6] RAL, UK

[7] Eötvös Loránd University, Hungary

[8] The GRID, Ian Foster & Carl Kesselman, Morgan Kaufmann Publishers, Inc., San Francisco, 1999

[9] MONARC – see http://www.cern.ch/MONARC/

[10] HEPCCC – The High Energy Physics Computing Coordination Committee

[11] The list includes for the time being: GRIDware, Siemens, Storage Tek.

[12] See Appendix 1 for a summary of HEP computing characteristics

[13] SPECint95: A modern PC has a performance of roughly 30 SPECint95s. We expect this to grow to around 200 SPECint95s by 2006.

[14] Aggregated number of level 0, 1 and 2 data.

[15] Based on the R10000 250 MHz MIPS processor from SGI (SPECfp95).