Outline of a Proposal for a Data-intensive Computational GRID Testbed
Outline of a proposal for a project to be submitted to the EU and the NSF
Kors Bos[1], François Etienne[2], Enrique Fernandez[3], Fabrizio Gagliardi[4], Hans Hoffmann[4], Mirco Mazzucato[5], Robin Middleton[6], Les Robertson[4], Federico Ruggieri[5], Gyorgy Vesztergombi[7]
This paper is a preliminary proposal for a project, to be funded by the European Commission (EC), to develop and demonstrate an effective wide-area distributed computing fabric. The project will include the development of an appropriate architecture and the necessary middleware, leveraging existing computational Grid[8] technology, extending it to handle very large data sets, and demonstrating it with scientific applications in a production-quality environment.
CERN, the European Organisation for Nuclear Research, funded by 20 European nations, is constructing a new particle accelerator on the Swiss-French border on the outskirts of Geneva. When it begins operation in 2005, this machine, the Large Hadron Collider (LHC), will be the most powerful machine of its type in the world, providing research facilities for several thousand High Energy Physics (HEP) researchers from all over the world.
The computing capacity required for analysing the data generated by the experiments at the LHC machine will be several orders of magnitude greater than that used by current experiments at CERN. It will therefore be necessary to use in an integrated way computing facilities installed at several HEP sites distributed across Europe, North America and Asia.
During the last two years the MONARC[9] project, supported by a number of institutes participating in the LHC programme, has been developing and evaluating different models for LHC computing. MONARC has also developed tools for simulating the behaviour of such models when implemented in a wide-area distributed computing environment.
Last autumn the European Commission (EC) and the US National Science Foundation (NSF) sponsored a workshop to examine the possibility of European-US collaboration on R&D for large scientific databases. In November Professor G. Metakides made a visit to CERN during which, in discussion with the HEPCCC[10], he encouraged the submission of proposals to the EU informatics funding programmes for R&D projects for large scale distributed data and computing facilities, sometimes referred to as Computing Grids. Subsequently, at a meeting in Brussels in January 2000, HEP was invited to prepare the present paper, outlining a project that would address both of these areas. The paper has been prepared by some of the potential partners in such a project. A formal proposal would include industrial partners and additional members from HEP and other sciences.
The project will focus on three areas that are not covered by other Grid related projects:
· the management of very large amounts of distributed data (the LHC experiments will together generate many PetaBytes of data each year);
· high throughput computing emphasising capacity rather than performance (HEP computing is characterised by the need to process very large numbers of independent transactions);
· automated management of both local computing fabrics and the wide-area GRID.
The project will implement a comprehensive set of tools and facilities (middleware) to achieve its goals in these areas, basing this on an existing framework developed by previous GRID-related projects.
These tools will be demonstrated on a Testbed comprising a facility at CERN interconnected with computing facilities installed in institutes in several European countries. The HEP community will provide the primary applications used on the testbed, as part of its preparation for LHC. Real users will use the testbed for real tasks. The project will therefore incur a considerable risk but at the same time will benefit from serious commitment and considerable resources from the participating consortium partners.
The partners who have already committed to the project are CERN and HEP institutes representing France, Hungary, Italy, Spain and the UK. These core partners are working at both technical and managerial levels on this draft proposal. Other national physics institutes are expected to join in the project.
Contacts have been established with other sciences (biology, meteorology, space) and industry (PC manufacturers, a mass storage supplier, and software houses specialising in the development of middleware for distributed systems[11]).
A US component is under discussion.
Each of the national institutes will provide resources and funding for the establishment of the national/regional part of the GRID.
High-performance bandwidth between the sites involved is assumed to be provided by other EU-supported initiatives such as Géant. The project proposal will define these requirements in terms of bandwidth and network services. The current estimated cost of networking between the sites contemplated in this proposal is an average of 5 M€ per year, with minimum target bandwidths of 622 Mbps for four larger European sites and the US link, 155 Mbps for four smaller European sites and 2.5 Gbps for CERN. The project can exploit substantially greater bandwidths, in the multi-gigabit range, if they can be provided.
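As a rough illustration (a derived check, not an additional requirement), the sum of the per-site minimum targets can be compared with the CERN link:

$$5 \times 622\ \text{Mbps} + 4 \times 155\ \text{Mbps} \approx 3.7\ \text{Gbps} > 2.5\ \text{Gbps at CERN},$$

so the 2.5 Gbps CERN figure assumes that the remote links will not all be driven at their target rates simultaneously.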
EU financial support will be sought for the development of the middleware, the overall integration and operation of the GRID, the management of the project and the demonstrators, and support for the exchange of personnel and the organization of workshops.
The overall coordination and support of the project will be at CERN.
The initial duration is three years. After this time an assessment will be made and an extension of the project for another two years will be considered.
Initial Estimate of the Financial Envelope
European Union component:
People and software: 30 M€ (18 M€ funded by the EU, 12 M€ funded by the partners)
Material and overheads: 30 M€ (12 M€ funded by the EU, 18 M€ funded by the partners)
Networking: 15 M€ funded through Géant and/or other initiatives
US component: funded by the NSF
In September 1998 representatives of the EU and the US National Science Foundation met in Budapest, where they signed an agreement to foster scientific collaboration between European and American sciences. As part of this agreement, a task force of leading European and American scientists was formed to propose to the EU and the NSF actions to implement the collaboration.
These actions were discussed in several workshops, in particular at the workshop on large scientific databases and archives held in Annapolis in September 1999. Scientists from computer science, astronomy, biology, physics and earth observation attended this workshop. The workshop concluded that a concerted action by the EU and the USA should support the establishment of a worldwide distributed computing environment with a few large centres hosting large multi-PetaByte databases and archives. The processing power should be distributed across these major centres and the other sites. A high-performance network interconnect should support easy Web-based access to processing capacity and data. Important new scientific results could be achieved by merging large and diverse databases and archives. This work programme would leverage the pioneering work and results of the several meta-computing projects carried out during the last ten years.
In November 1999 senior EU officials visited CERN to discuss the possible participation of CERN and High Energy Physics in general in the R&D programmes of the European Commission. During the visit they met with the High Energy Physics Computing Coordination Committee (HEPCCC), whose membership includes directors and senior managers of High Energy Physics institutes in the CERN member states, and discussed the results and proposals of the EU-US workshop and other potential opportunities for informatics R&D. A good match was found between the ongoing planning activity for the computing of the LHC and the goals put forward by the EU-US workshop.
At a follow-up meeting in Brussels between representatives of the EU-US task force and senior EU officials, it was agreed to submit a project proposal to the EU by mid-May. A proposal covering the American part of the project should also be submitted to the NSF. It is anticipated that both funding agencies will jointly review and approve the project.
There are several characteristics of experimental HEP code and applications that are important in designing computing facilities for HEP data processing[12].
· In general the computing problem consists of processing very large numbers of independent transactions, which may therefore be processed in parallel - the granularity of the parallelism can be selected freely;
· modest floating point requirement - computational requirements are enormous but best expressed in SPECint rather than SPECfp units;
· massive data storage: measured in PetaBytes (1015 Bytes) for each experiment;
· read-mostly data, rarely modified, and usually replaced wholesale when new versions are generated;
· high sustained throughput is more important than peak speed - the performance measure is the time it takes to complete processing for all of the independent transactions;
· resilience of the overall (grid) system in the presence of sub-system failures is far more important than trying to ensure 100% availability of all sub-systems at all times.
There is therefore a significant contrast between the characteristics of HEP computing and those of classical super-computing, where floating point performance and fine-grained parallelism are significant factors. HEP applications need High Throughput Computing rather than High Performance Computing.
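To make the high-throughput pattern concrete, the following Python sketch (purely illustrative and not project code; the event contents and the reconstruct_event function are hypothetical placeholders) processes a stream of independent events on a pool of worker processes and reports throughput rather than per-event speed:

```python
# Minimal sketch: high-throughput processing of independent events.
# Event contents and reconstruct_event() are hypothetical stand-ins.
from multiprocessing import Pool
import time

def reconstruct_event(event):
    """Stand-in for per-event reconstruction: CPU-bound, decision-heavy
    integer work, with no communication between events."""
    checksum = 0
    for value in event["hits"]:
        checksum = (checksum * 31 + value) % 1_000_003
    return {"id": event["id"], "checksum": checksum}

def event_source(n_events, hits_per_event=1000):
    """Generate synthetic independent events."""
    for i in range(n_events):
        yield {"id": i, "hits": list(range(i % 17, i % 17 + hits_per_event))}

if __name__ == "__main__":
    n_events = 10_000
    start = time.time()
    with Pool() as pool:  # granularity chosen freely: one event per task
        results = pool.map(reconstruct_event, event_source(n_events), chunksize=100)
    elapsed = time.time() - start
    # The figure of merit is throughput (events completed per second),
    # not the latency of any individual event.
    print(f"processed {len(results)} events in {elapsed:.1f}s "
          f"({len(results)/elapsed:.0f} events/s)")
```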
Summary of the Requirements
There will initially be four large experiments at LHC, two with rather similar computing requirements (ATLAS and CMS), one with a much higher data recording bandwidth (ALICE) and the fourth (LHCb) with rather lower computing needs. Each of these experiments requires a large computing and data storage fabric at CERN, which must be integrated with a number of regional computing centres with smaller, but still significant, capacity installed in laboratories closer to the home institutes of the researchers. The following table summarises the current estimated sizing of the facilities at CERN and in the regional centres in the year 2006 (the first year in which the LHC machine will produce data at the design intensity) together with the growth expected to be required in subsequent years.
Estimated computing resources required at CERN for LHC experiments in 2006

| Resource | | ALICE | ATLAS | CMS | LHCb | Total |
| CPU capacity (SPECint95[13]) | 2006 | 600,000 | 600,000 | 600,000 | 300,000 | 2,100,000 |
| | annual increment | 200,000 | 200,000 | 200,000 | 100,000 | 700,000 |
| | estimated # CPUs in 2006 | 3,000 | 3,000 | 3,000 | 1,500 | 10,500 |
| Disk capacity (TB) | 2006 | 300 | 550 | 550 | 200 | 1,600 |
| | annual increment | 90 | 200 | 200 | 70 | 560 |
| Magnetic tape capacity (PB) | 2006 | 3.7 | 3.4 | 3.4 | 1.4 | 11.9 |
| | annual increment | 2.0 | 2.0 | 2.0 | 0.9 | 6.9 |
| Aggregate I/O rates (GB/sec) | disk | 100 | 100 | 100 | 40 | 340 |
| | tape | 1.2 | 0.4 | 0.4 | 0.2 | 2.2 |
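The CPU counts in the table follow directly from the capacity figures, using the per-processor performance expected for 2006 (about 200 SPECint95 per PC, see footnote [13]):

$$\frac{600{,}000\ \text{SI95}}{200\ \text{SI95 per CPU}} = 3{,}000\ \text{CPUs per large experiment},\qquad \frac{2{,}100{,}000\ \text{SI95}}{200\ \text{SI95 per CPU}} = 10{,}500\ \text{CPUs in total}.$$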
Regional Centre Model
There are a number of reasons for adopting a model in which the computing resources for each LHC experiment will be distributed between the central facility at CERN, five or six large regional computing centres, and several smaller centres. The final stage of interactive analysis will normally be performed on a powerful cluster installed in the physics department of the individual researcher. Bringing computing facilities geographically closer to the home institutes will normally enable a higher network bandwidth to be used between the private cluster and the regional centre than could be obtained from the longer distance connection to CERN. This is particularly important in the case of institutes in America and Asia. The second motivation for regional centres is more mundane: it will be possible to obtain funding for national and regional computing installations which would not be available for equipment and services installed centrally at CERN. Another factor is that the devolution of control over resources allows scheduling decisions to be made giving priority to the physics interests of national or regional groups within the experimental collaboration.
The overall computing model encompassing the regional and CERN centres is currently under study within the context of the MONARC project, and so detailed requirements are not yet available for the distribution of data between the centres, the way in which the data will be synchronised, and the inter-centre network bandwidth. In any case, the details of the model will evolve with time, particularly in the period just after the LHC machine begins operation and real data becomes available. However, several basic characteristics can be assumed, and in particular the requirement to maintain a homogeneous user view of the complete computing resource. It will be necessary to develop a general framework to support data transfer between centres and global workload scheduling which is flexible and can be adapted to a range of possible models.
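As a purely illustrative sketch of one ingredient of such a framework (no design decision is implied; the class, site and file names below are hypothetical), a replica catalogue could map a universal logical name space onto the physical copies held at CERN and the regional centres:

```python
# Illustrative sketch only: a toy replica catalogue mapping a universal
# logical name space onto physical copies at CERN and regional centres.
from dataclasses import dataclass, field

@dataclass
class ReplicaCatalogue:
    # logical file name -> list of sites holding an up-to-date copy
    replicas: dict = field(default_factory=dict)

    def register(self, logical_name, site):
        """Record that `site` holds a copy of `logical_name`."""
        self.replicas.setdefault(logical_name, []).append(site)

    def locate(self, logical_name, preferred_sites):
        """Return the closest available copy, falling back to any copy."""
        sites = self.replicas.get(logical_name, [])
        for site in preferred_sites:
            if site in sites:
                return site
        if sites:
            return sites[0]
        raise KeyError(f"no replica registered for {logical_name}")

catalogue = ReplicaCatalogue()
catalogue.register("/lhc/cms/esd/run001", "CERN")
catalogue.register("/lhc/cms/esd/run001", "RAL")   # regional copy
print(catalogue.locate("/lhc/cms/esd/run001", preferred_sites=["RAL", "IN2P3"]))
```

In a real system the catalogue itself would have to be distributed and kept synchronised between centres, which is precisely the kind of problem the project will address.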
The following table summarises the resource requirements for one of the major regional centres for one experiment. There will be about five centres of this size and a number of smaller centres for each experiment.
Estimated computing resources for an LHC Regional Computing Centre (2006)

| CPU capacity | 120,000 | SI95 |
| Disk capacity | 110 | TBytes |
| Disk I/O rate | 20 | GBytes/sec |
| Tape storage | 0.4 | PetaBytes |
| WAN connection | 2.5 | Gbits/sec |
A testbed is required in order to validate the models and software developed by the R&D programme, evaluate different potential hardware and software technologies, and to demonstrate the feasibility of building a production facility prior to the beginning of the equipment acquisition process in 2004. The testbed must include CERN and several remote sites (regional centres), in Europe and in the United States. The CERN site should develop in capacity over a period of three years to reach by the end of 2002 the scale of about 50% of the size required by one of the large LHC experiments. The remote sites should have capacities of at least 20% of that, with links to CERN of at least 622 Mbps by 2003. Smaller remote sites could have lower bandwidth connections. The aggregate bandwidth required at CERN for 2003 is a minimum of 2.5 Gbps.
Capacity targets for the Testbed at CERN

| | Units | end 2000 | end 2001 | end 2002 |
| CPU capacity | SI95 | 20,000 | 70,000 | 300,000 |
| Estimated number of CPUs | | 400 | 1,000 | 3,000 |
| Disk capacity | TBytes | 20 | 60 | 250 |
| Disk I/O rate | GBytes/sec | 5 | 15 | 50 |
| Tape storage capacity | PetaBytes | 0.2 | 0.3 | 1.0 |
| Sustained data rate | MBytes/sec | 250 | 500 | 1,000 |
| WAN links to external sites | Mbits/sec | 155 | 622 | 2,500 |
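For orientation, these targets can be related to the figures given earlier (a derived comparison, not an additional requirement): the end-2002 CPU target is half of the 600,000 SI95 required by one large experiment in 2006, and a remote site at 20% of the CERN testbed would hold roughly half the capacity of a full regional centre:

$$300{,}000\ \text{SI95} = 0.5 \times 600{,}000\ \text{SI95},\qquad 0.2 \times 300{,}000 = 60{,}000 \approx \tfrac{1}{2} \times 120{,}000\ \text{SI95}.$$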
In addition to providing a demonstration environment, the testbed must be operated as a production facility for one or more real applications with real data. This is the only way to ensure that the output of the R&D programme in terms of models, principles and software fulfils the basic requirements for performance and reliability and has practical application to real-world computing.
HEP needs the testbed in order to validate the R&D programme. During the next three years it also has requirements for substantial computing capacity for current experiments, but these requirements could be fulfilled with a much smaller facility - about 20% of the size of the testbed. Therefore CERN is looking for partners with similar long term computing requirements for high throughput computing, to take part in the R&D programme and to make use of the testbed computing capacity.
There are three basic factors that drive the R&D programme:
· adaptability: the computing facility must be very flexible, capable of being used for several different applications, supporting data models which will develop and change significantly in the light of the experience of the first years of data collection and analysis, and able to evolve smoothly to provide increased capacity and exploit new technologies;
· scalability: the need to scale computation capacity (hundreds of thousands of SPECint95s, thousands of processors), disk capacity (hundreds of TeraBytes, thousands of hard disks), local network bandwidth (Terabits per second of throughput), and tertiary storage capacity (tens of PetaBytes of automated magnetic tape storage);
· wide-area distribution of resources: the need to operate an integrated computing facility implemented as a number of geographically separated computer centres installed in different countries and continents, owned by independent research agencies, and providing computational and data storage services to end-users and smaller computing installations.
Work packages will be defined for the national and international Grid activities. For each work package a rough estimate of resources, duration and measurable deliverables will be given. This chapter will be developed using the results of the work being done in the technical task force of this initiative.
The R&D programme is divided into six work packages:
1. computing fabric management: the management of the hardware and basic software required for the local computing fabric - the processors, disks, tape drives and robots, the equipment providing the local area and storage networks, operating system installation and maintenance, file systems, storage area networks, gateways to external networking facilities, monitoring consoles, power distribution, etc.;
2. mass storage management: the software required to manage the mass storage system - secondary and tertiary storage space management, migration/recall of data between tertiary and secondary storage, provision of classes of service to schedule access to the same data from processes with different priority levels, data interchange between sites (export/import);
3. wide-area data management: universal name space, efficient data transfer between sites, synchronisation of remote copies, wide-area data access/caching, interfacing to mass storage management systems;
4. wide-area workload management: the scheduling of work submitted by users, including local and wide-area load balancing, scheduling support taking account of relative priority levels, data location, automated parallelisation of work, and the application program interfaces (APIs) and graphical user interfaces (GUIs) which provide user access to the distributed computing facility; the issues here include finding the correct level for this interface (e.g. above or below the object database) and the degree of transparency (to what extent must the application be aware of the topology of the facility). A simple illustration of data-location-aware scheduling is sketched after this list.
5. wide-area application monitoring: the instrumentation of applications, and tools to analyse dynamically the performance of an application in a GRID environment; automatic recognition of application-level performance and resource problems; automatic re-configuration of the application's resources and re-scheduling of the application components.
6. application development: the seamless integration of existing end-user applications with Grid middleware and the development of new, grid-aware applications is an important component of any testbed and pivotal to the success of the project. There will be a series of independent work packages covering the contributions of the different scientific and industrial areas.
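As an illustration of the data-location-aware scheduling mentioned under work package 4 (a toy sketch under assumed interfaces, not a proposed design; the site names, capacities and policy are hypothetical):

```python
# Illustrative sketch only: rank sites by whether they already hold the
# required data and by their currently uncommitted capacity.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_si95: int     # currently uncommitted CPU capacity
    datasets: set      # logical dataset names held locally

@dataclass
class Job:
    owner: str
    needed_si95: int
    dataset: str

def choose_site(job, sites):
    """Prefer sites that hold the data; among those, pick the least loaded."""
    candidates = [s for s in sites if s.free_si95 >= job.needed_si95]
    if not candidates:
        raise RuntimeError("no site can run this job at present")
    local = [s for s in candidates if job.dataset in s.datasets]
    pool = local or candidates          # fall back to remote data access
    return max(pool, key=lambda s: s.free_si95)

sites = [
    Site("CERN", free_si95=50_000, datasets={"cms/esd/2006"}),
    Site("RAL",  free_si95=20_000, datasets={"cms/esd/2006"}),
    Site("INFN", free_si95=80_000, datasets={"atlas/esd/2006"}),
]
job = Job(owner="analysis-group-3", needed_si95=5_000, dataset="cms/esd/2006")
print(choose_site(job, sites).name)     # -> CERN (holds the data, most free capacity)
```

A production scheduler would of course also have to weigh relative priorities, network cost and the option of moving the data rather than the job; the sketch only illustrates the basic trade-off.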
Related R&D is required in the area of wide-area network management, but this is outside the scope of this programme.
There will also be a work package defined to cover the implementation and exploitation of the testbed.
The need for a large testbed for the design and implementation of LHC computing and the regional centre model has recently aroused great interest in the GRID metaphor within the HEP community. This metaphor, developed on the basis of the pioneering work of several metacomputing projects, mostly in the USA, and described in the recent book by Foster and Kesselman, is a perfect match for the overall worldwide computing model for LHC. This was underlined by Ian Foster at a recent DoE meeting in Washington, where he declared that LHC computing is a GRID project par excellence. One of his slides noted that LHC computing brings new problems, such as a big increase in scale (particularly in the data but also in processing capacity), the need for object database technology, and complex policy issues. At the same time HEP also contributes interesting new technologies, such as the MONARC simulation tools, NILE (http://www.nile.cornell.edu/) and GIOD (http://www.cacr.caltech.edu/SDA/giod.html).
We believe that a project funded by the EU at this early stage of the LHC computing plans will bring into focus many problems now being addressed in different contexts, making it possible to deploy a real example of a data intensive computing grid, which is likely to become one of the definitive prototypes for future initiatives.
HEP has already started some pilot GRID activity in the USA (the Particle Physics Data Grid and GriPhyN) and in Europe (INFN work with Globus). Contacts have been established with recently formed GRID efforts (the Grid Forum in the USA and E-Grid in Europe). We are observing with interest the recent proposal for an EU project by the EuroGrid consortium. The project will collaborate closely with these projects, exploiting existing developments where possible and leveraging their results. However, we believe that a number of factors differentiate this initiative from the others:
Data intensive: we plan to develop testbeds at the multi-PetaByte scale. We know that several aspects of current distributed architectures do not scale to this volume of data. A serious R&D effort will be required.
Highly distributed: the geography and sociology of the experimental collaborations for LHC (1,500-2,000 researchers from over 100 different institutes) will require easy and effective access to the data on a world-wide basis.
Commodity based: the final system will need to be built in the most cost-effective manner because of the tight budget constraints of fundamental research. Unlike traditional high-performance computing, HEP will need to leverage commodity processors and storage.
This commodity-based approach has been applied successfully in HEP for the last 10 years. The Linux and PC explosions were anticipated and fostered in the HEP community, and considerable experience therefore now exists in house.
Affordable access to adequate network bandwidth will also be needed. HEP has a long tradition at the leading edge of high-bandwidth research networking.
Production oriented: from the very beginning of this project the work will need to proceed on two distinct but parallel planes: straightforward demonstrator testbeds which, as well as validating the models, will be used for the design and building of the LHC experiments (mostly experimental equipment simulation, software process design and engineering support), in order to continuously prove the production capabilities of the overall system; and advanced testbeds which stretch the system to its limits.
Compared to other scientific fields, where fine-grained parallelism of applications needs to be developed, HEP applications are intrinsically parallel. They therefore provide a more realistic test for commercial and industrial applications, which are often of a similar nature. Scalability in these circumstances is rather different from the scalability of traditional MPP high-performance computing.
Real goals and time-scales: probably the most important difference is that by 2003-2004 a real computing system will need to be deployed. This system will need to perform at the necessary level and with industrial production quality of service. The project we now propose is of paramount importance for HEP; adequate resources and priority will therefore be guaranteed.
Real and demanding end user community: a world-wide distributed community of demanding and sophisticated users will use the testbeds and the final production system.
Integrated system: the project aims to develop methods to ensure the efficient utilisation of the distributed resources, viewed as an integrated system or Grid, while maintaining acceptable turnaround times for applications.
Solutions developed for such a large-scale project as the LHC will certainly be applicable to other scientific and industrial problems, particularly in view of today's explosively developing markets in the Internet ISP/ASP area. We believe therefore that this project could form the catalyst for collaboration with industry and other scientific disciplines.
The basic requirement to process very large numbers of independent transactions, each of which depends on rapid access to large databases plus an associated amount of compute power, is typical of many modern applications such as:
· Internet ISP/ASPs: a definite match is found in the rapidly increasing use of scalable clusters at large central sites to satisfy the demand for Web access, data search and e-commerce. The concept of virtual computer centres, where customers can access metered computing services, is rapidly gaining acceptance in the Internet world.
· Other scientific fields: at a recent joint EU-US workshop (http://www.cacr.caltech.edu/euus/) scientists from Astronomy, Earth Observation and Biology expressed common interest in the solution being investigated by HEP. They predicted that they would reach the same level of requirements in terms of storage, distributed access and processing power within a few years.
· ASCI: the US DoE Accelerated Strategic Computing Initiative also matches some of the simulation requirements of HEP, but with somewhat different cluster requirements for solving more tightly coupled problems. Several large commodity-based clusters are already deployed and being tested.
We can profit from a certain amount of the technology being developed today for the above requirements. Hardware technology (such as large mass storage repositories, high density packaging of cluster nodes and interconnect infrastructure) is probably best explored with industrial partners. Software technology both for cluster management and control and for application development and job submission can be developed with a wider range of collaborators. Ideally these should be agile software companies with specific know-how on distributed processing environments but collaboration is also foreseen with universities and other research institutes.
The requirement for LHC Regional Computing Centres is a distinguishing feature of the current project. As seen above, support for RCCs involves wide-area distributed processing with remote access to a very large common database, with data replication, caching and management being the principal challenges.
This could be a most interesting topic of research for certain potential partners. Examples of applications for which an RCC architecture would contribute are:
· Internet ASPs providing custom applications such as Word or StarOffice for millions of clients connected to widely distributed regional hubs;
· Web TV and music providers, with similar client distributions;
· Other computational and data Grids - distributed scientific computing resources interconnected with reliable high bandwidth wide-area networks applied to the solution of large computing problems (see http://www.globus.org ).
The French Centre National de la Recherche Scientifique (CNRS) involves a number of scientific fields where distributed computing through a high-speed network is seen as a key issue for the future. Five of its scientific departments are particularly concerned:
· Nuclear and particle physics (IN2P3)
· Physical Science and Mathematics (SPM)
· Engineering Sciences (SPI)
· Life Sciences (SDV)
· Sciences of the Universe (INSU).
In their respective fields, several initiatives address both the problem of collecting, managing, accessing and processing very large amounts of data and the objective of combining large computing facilities in order to carry out large simulations.
As a multi-disciplinary agency, CNRS has experience in handling such common problems and in organizing cooperation between the different disciplines. During the last ten years, the Comité d'Orientation des Moyens Informatiques (COMI) has been the forum in which all CNRS departments have developed actions and plans to organize computing resources in accordance with the deployment of high-speed networking. L'Unité Réseaux du CNRS (UREC) is a common resource in charge of promoting network facilities for the benefit of research.
CNRS is prepared to contribute to the European Grid initiative, which is seen as a main step in the development of HPC in line with the increasing demand of the community, and to join efforts with the HEP initiative. Such a contribution could involve:
· modelling and simulation, for which the combined resources of the facilities available on the Grid could be used to tackle a new dimension of problem;
· on-going applications in biology or astrophysics based on a distributed model;
· technical expertise on tools for the management of distributed resources;
· technical expertise in networking technologies and the provision of international network services.
The French HEP community is directly involved in the HEP initiative. There is also long experience of cooperation between the HEP and astrophysics communities on networking, as well as between CNRS's UREC and the network department of CERN, which could benefit the project.
The HEP project intends to develop tools and to organize pilot experiments in distributed computing; in addition, through its participation in this project, CNRS proposes to establish a strong link between this project and the current initiative to construct a distributed genome database service to enhance the facilities provided by the main French node, Infobiogen.
In a further step, a similar initiative could be constructed for astronomy and astrophysics data processing.
Since a large part of the resources currently allocated to projects in these areas is located in the Région Rhône-Alpes, which has established information and communication technologies as a main research priority, cooperation with the HEP-led Grid project will be facilitated.
INFN is the Italian institute for particle and nuclear physics research, and is deeply involved in the four LHC experiments, namely ALICE, ATLAS, CMS and LHCb. Other experiments are also relevant to this project, such as VIRGO, a gravitational-wave antenna based on a large and sophisticated interferometer being built near Pisa.
In general, many new-generation experiments need new levels of computing and data-access services, which in turn require large investments in hardware and also in personnel-intensive activities such as software development.
INFN has a good tradition of distributed computing, as a consequence of its geographical distribution over about 30 sites, in contrast to the centralised facilities typical of large computing centres.
The natural evolution of distributed computing is the GRID.
INFN is interested in developing new software toolkits and building a testbed aimed at testing the GRID concepts and tools in a realistic environment, allowing new-generation experiments to use a real distributed computing and data facility.
INFN activity in this project will be concentrated in 4 or 5 centres with 4 or 5 persons each that will develop software and install hardware dedicated to the GRID testbed.
Experiments will introduce their application needs from an early stage of the project and will provide the necessary applications and data to test extensively the new GRID facilities.
All the tests will be validated over wide-area networks using the high-bandwidth links (622-2,500 Mbps) of GARR, the Italian Research and Academic Network.
In the UK, the organisation and funding of large science projects is carried out under the umbrella of government-funded research councils. The Particle Physics and Astronomy Research Council (PPARC) supports the UK elements of the LHC programme, in which there is extensive involvement in all four major experiments (ATLAS, CMS, ALICE and LHCb) being constructed for the LHC at CERN. The development of computing resources commensurate with those foreseen in a Computational Grid has long been viewed by the particle physics community in the UK as pivotal to the full exploitation of these experiments, and has the full support of PPARC.
In order to prepare for the vast amount of data from these experiments, an initiative to build up a prototype infrastructure based around a hierarchical grid is now well advanced. A Tier 1 centre will be based at RAL with two or more Tier 2 Regional Centres elsewhere in the UK. Tier 3 centres would be at university campus level and Tier 4 at departmental level (see figure).
Over the next few years, in a number of project phases, computing and networking resources will be developed. In the first phase a prototype Grid will be established at national level, to be followed in the second phase with extension internationally to CERN. In the third phase the Grid will be fully integrated with experiment infrastructures.
The second project phase will provide the principal linkage between UK activities and the work of this European initiative. There will be participation in the joint development of Grid software toolkits.
This work will build on existing expertise in the UK gained through collaboration with partners in the US working on other HEP experiments. For example, over the last year a Regional Computing Centre has been established at RAL for the UK community of the BaBar Experiment (currently operating at the Stanford Linear Accelerator Centre). Plans for a similar centre for the CDF collaboration (for run 2 data taking at Fermilab in the US) are well advanced.
In parallel to and underpinning these developments, it is expected that the national academic networking infrastructure (SUPERJANET) will be upgraded, with a first phase set to go to a backbone minimum of 2.5 Gbit/s early in 2001. Furthermore Quality of Service network tests are underway to the US, and a Managed Bandwidth test connection to CERN is planned for summer 2000.
This initiative by the UK particle physics community does not, however, stand in isolation: dialogue has been established with a number of other scientific research areas with varying needs for, and interests in, Computational Grids. In particular, close relations have been established with the bio-informatics community, and a joint workshop is planned for 13 March 2000.
Although it is recognised that the particle physics initiative is pioneering grid developments in the UK, active coordination between these areas is underway to ensure maximum benefit to all disciplines. Potential collaboration between disciplines and with industry is seen as being vital to successful technology transfer.
The Royal Netherlands Meteorological Institute (KNMI) has an interest in the Data Grid initiative for the following reasons.
As part of its mandate, KNMI performs climate research. Within the climate research department, several atmospheric-physics research projects use earth-observation data from research satellites. Within atmospheric composition research, ozone is studied as an important tracer of anthropogenic activity and of upper-atmospheric dynamics. Many earth-observation satellites carry instruments to monitor the global atmospheric ozone distribution. KNMI participates predominantly in European initiatives (ESA, EUMETSAT) but also in the NASA EOS initiative. For the EOS-CHEM satellite, an Ozone Monitoring Instrument (OMI) is being developed and built in the Netherlands. The OMI Principal Investigator and research team are employed by KNMI.
To accomplish the atmospheric research connected to each of these missions appropriate data transport, processing and storage capacity must be anticipated. A summary of the project requirements is given in Appendix 2.
This section gives initial estimates, illustrating the scale of funding which will be required to undertake the project.
Middleware development: mostly young researchers and software engineers, with industrial participation. Four teams of 4-5 persons at CERN plus 2-3 other sites. Total EU-funded effort: 20 person-years per year for 3 years = 60 PY = 5 M€.
CERN and the other partners will provide system support, training and management (4 senior system programmers plus 4 senior managers for 3 years). Total = 24 PY = 4 M€ of EU-unfunded effort.
Testbed expertise and coordination: a central team of 4-5 persons at CERN, 3 persons at the larger partners and 1-2 at the smaller ones. Total EU-funded effort: 20 person-years per year = 60 PY = 5 M€.
Applications: 5 persons in HEP, 5 persons in biology and 5 persons in the other sciences = 15 person-years per year of EU-funded effort = 4 M€. The consortium will provide at least twice as much effort, equivalent to 8 M€.
Overheads: relocation of staff, travel and subsistence: 1 M€ per year = 3 M€.
Dissemination and workshops: 1 M€ in total.
Networking: provided by other EU-funded projects (estimated at 5 M€ per year).
Materials budget for testbeds: CERN (3,000 CPUs = 2 M€ funded plus 1 M€ unfunded; disks, 300 TB = 4 M€ funded plus 2 M€ unfunded; mass storage = 1 M€ funded plus 1 M€ unfunded), giving a total for CERN of 11 M€ (7 M€ EU funded + 4 M€ EU unfunded); extensions to nationally funded testbeds: 5 M€ distributed between the partners. Total EU-funded materials = 12 M€.
Total financial envelope:
EU funded: 18 M€ for personnel (and overheads) and 12 M€ for materials = 30 M€
Funded by the partners: 12 M€ for personnel and 18 M€ for materials = 30 M€
Networking: 15 M€ (estimated), to be provided by Géant or other network projects.
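The EU-funded totals can be cross-checked against the itemised estimates above:

$$5 + 5 + 4 + 3 + 1 = 18\ \text{M€ (personnel and overheads, EU funded)},\qquad 7 + 5 = 12\ \text{M€ (materials, EU funded)},\qquad 4 + 8 = 12\ \text{M€ (personnel, partner funded)}.$$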
Appendix 1: General Characteristics of HEP Computing
Outline
High energy physics (HEP) experiments are designed to study the results of collisions of fundamental particles which have been accelerated to very high energies in large particle accelerators, such as the Large Hadron Collider (LHC) being built at CERN. The LHC is a circular machine 27 kilometres in circumference, which accelerates beams of protons in opposite directions, bringing the opposing beams into collision in four experimental areas distributed around the ring. The combined energy of the colliding particles is transformed into new heavy particles that move out from the centre of the collision and subsequently decay into a sequence of secondary particles. The sequence of transformations between energy and particles resulting from one proton-proton collision is called an event. The details of the event are recorded by the experimental apparatus, the detector, which surrounds the collision point. The detector is a large and complex device containing a number of sub-detectors of different types, which can track the passage of particles and measure their energies as they move through the apparatus, deflected by strong magnetic and electric fields. Generally the apparatus sees only the secondary or tertiary effects of the passage of the particles. For example, a passing energetic particle may ionise a gas molecule, which then moves along a curved path in the magnetic field until it touches a detecting wire and discharges. The passage of the original particle is observed only when the discharge reaches the ends of the wire. The complex set of physics processes inside the detector must be reconstructed from such indirect observations. An LHC detector contains about a hundred million detection elements, or channels.
Computing is used heavily at all stages in the processing of the event.
· Triggering
This is the process by which interesting events are selected. On average 20 particle collisions occur within an LHC detector every time the particle bunches in the beam cross at the centre of the apparatus. This happens 40 million times per second, or every 25 nanoseconds, giving a raw event rate of almost a thousand million per second. This corresponds to a (compressed) data rate of about 40 TeraBytes per second. Most of these collisions are uninteresting background events, such as collisions with residual gas molecules in the vacuum of the beam pipe. The process of selecting the interesting events for recording and further analysis is called triggering. The first-level trigger must make a decision very quickly, within a few microseconds, because during this time all of the data from all of the collisions must be stored in the system. The first-level trigger uses special hardware logic to reject background and reduce the event rate to 75 kHz. A subset of the data from these events is then presented to the second-level trigger, which executes on conventional microprocessors, taking a more global view of the evolution of the event within the detector to make a finer level of selection. For example, the second-level trigger may correlate energy deposition in different regions of the detector. The second-level trigger reduces the event rate to 5 kHz. The data for any given event passes through the trigger system in many separate memory buffers (FIFOs). The pieces of data from selected events are then assembled by the event builder to form a single record of about 1 MByte per event. From this point on, all of the events are independent and may be processed in parallel. The third-level trigger, or filter, uses a large farm or cluster of conventional computers to process the remaining events, performing a partial reconstruction of the physics processes, locating tracks, identifying particles and calculating energies, before deciding which events should be recorded on permanent storage.
· Data recording
In the case of two of the LHC experiments (ATLAS and CMS), the third-level trigger reduces the 5 GBytes per second input data rate to 100 MBytes (100 events) per second, which must be written to permanent storage. A third experiment, ALICE, studying the interactions of heavy lead ions instead of protons, will have much more complex events and will record data at about 1 GByte per second. The fourth LHC experiment, LHCb, will record at a lower data rate. The data produced by the level-3 filter is buffered on disk and transferred to permanent storage on magnetic tape, in a relatively large number of independent parallel streams. This master record of the data from the detector is called the raw data. Each of the experiments will record about 1 PetaByte of raw data each year. (The arithmetic behind these rates is summarised at the end of this list of processing stages.)
· Reconstruction
The buffered copy of the raw data is retained on disk for some time, until the next stage of processing has taken place: reconstruction. This is the process of transforming the raw data - a record of what the detector saw - into a physics view of the event, with details of particles, tracks, energies, etc. The output from the reconstruction process is called the event summary data, or ESD, and forms the basis for the detailed physics analysis phase. Event data in the ESD occupies about 10% of the space required for the original raw data. The reconstruction process requires knowledge of a large number of parameters defining the detector state at the time the event took place, including for example the precise location of the detector elements and the energy of the particle beams. Later stages of the analysis will enable these parameters to be refined and the algorithms to be improved and corrected. All of the raw data will be reconstructed once or twice again during the following months, before the final stable version of the ESD is produced.
· Batch & interactive analysis
So far the data has been handled by a small number of physicists responsible for data acquisition and event production. Now the ESD is made available for analysis by all of the physicists in the collaboration, 1,500-2,000 people in the case of the larger experiments. Analysis involves many diverse processes, such as computing different physical properties, searching for rare phenomena and studying detailed statistical distributions. Many physicists work together in groups, studying specific processes. Generally the analysis takes place in two different phases. The first phase involves traversing a large subset of the full ESD (many tens of TeraBytes, containing a specific class of events), selecting a subset of these events, computing some additional properties and writing out a summary of the results - the Analysis Objects Dataset or AOD. This phase, because of the amount of computation and data required, must be performed in batch mode, using many parallel jobs. The AOD, only a few tens of GigaBytes in size, can then be analysed in detail interactively. In many cases, of course, the situation is more complex, requiring recourse to the raw data, or the further refinement of the AOD to produce a dataset small enough to be analysed interactively. Sometimes it is necessary to visualise the event, displaying a representation of the tracks, particles and energy distribution, and rotating, slicing and otherwise manipulating the complex 3D picture. However this is the exception, and the result of the interactive analysis is generally displayed simply as a histogram or plot, showing the statistical distribution of one or more physics values.
· Simulation
The analysis process takes the raw data from the detector and works back to the physics of the event by assuming a knowledge of the detector structure, the material properties and the fundamental theory of the physics reactions, and also assuming that the program is correct. This must be checked by starting from the other end of the process - simulating the events from first principles, from a theoretical standpoint. The result of the collision is simulated, and the particles generated with theoretical probabilities are followed out through the detector, decaying into other particles and interacting with the material and electromagnetic fields, to produce a simulated event: what the detector should have seen. These simulated events are reconstructed and compared statistically with the real events. It is necessary to generate at least as many simulated events as there are real events collected. Simulation is very computation-intensive, and will require as much processing capacity as all the rest of the analysis chain.
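The rates quoted in the stages above fit together as follows (the effective running year of order 10^7 seconds used in the last line is an assumption made for the check, not a figure stated above):

$$
\begin{aligned}
40\times 10^{6}\ \text{crossings/s}\times \sim\!20\ \text{collisions/crossing} &\approx 8\times 10^{8}\ \text{collisions/s},\\
5{,}000\ \text{events/s}\times 1\ \text{MByte/event} &= 5\ \text{GBytes/s into the level-3 filter},\\
100\ \text{events/s}\times 1\ \text{MByte/event} &= 100\ \text{MBytes/s to permanent storage},\\
100\ \text{MBytes/s}\times \sim\!10^{7}\ \text{s/year} &\approx 1\ \text{PetaByte of raw data per year}.
\end{aligned}
$$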
Computational characteristics
Above we have described briefly the main computing tasks required to acquire and analyse HEP data. The important characteristics of the computing requirements are summarised below.
· event independence: The events are related only statistically, and so they can be processed in parallel at all stages in the analysis chain, the results coming together only at the end when all of the selected events have been processed. The granularity of the parallelism can therefore be selected freely according to the characteristics of the computing facility (number of processors, number of users, etc.).
· modest floating point needs: The HEP codes require a reasonable floating point performance, but much of the computation is concerned with decision making rather than calculation. The result is that HEP benchmarks correlate much better with the integer performance of a processor than with the floating point performance. The computational requirements are therefore expressed in SPECint95 units.
· massive data storage: The event size is modest (1-10 MBytes), but the number of events is very high, and so the total amount of data to be handled is large - PetaBytes for each experiment - too big to be stored permanently online on disk. The master copy of the data must be stored on tertiary storage (magnetic tape), with only the most active data cached on disk. The tertiary storage is therefore used as a repository for active data, in contrast to the more common application of magnetic tape as a backup or archive medium.
· read-mostly data: The raw data is never changed after it has been recorded. The processed data (ESD, AOD) is generally created as a set of files using the same calibration data and the same version of the analysis algorithms. It is rare that these files are subsequently updated - rather, a new generation of the data is created using refined calibration parameters and improved algorithms. Only a small fraction of the data must be updatable, namely the metadata describing the contents of the analysis data files.
· throughput rather than performance: The inherent parallelism of HEP data processing means that analysis is performed as a set of jobs or transactions, any or all of which can execute at the same time. Many events are processed in parallel. The performance measure is the time it takes to complete processing for all of the events. The throughput of the computing system is the important factor, rather than the performance - the speed with which an individual event is processed.
· resilience rather than availability: To maintain throughput it is important that the system is resilient to failure, that individual components do not bring down the whole system. However, the availability of the whole system is not required, as work which was executing on failed equipment can be re-scheduled to run elsewhere. Of course reliability is very important, but limited failure of components can be tolerated.
There is a significant contrast between the characteristics of HEP computing and those of classical super-computing, where a single large application offers scope only for fine-grained parallelism. HEP, taking advantage of event independence, requires High Throughput Computing rather than High Performance Computing.
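The following toy sketch (hypothetical interfaces, illustrative only) shows the resilience principle described above: events assigned to a failed node are simply re-queued for another node, so throughput dips but no work is lost and no attempt is made to keep every node available at all times.

```python
# Illustrative sketch only: resilience through re-scheduling.
import random

def process_on_node(node, event_id):
    """Pretend to process one event; occasionally a node 'fails'."""
    return random.random() > 0.05          # 5% simulated failure rate

def run(events, nodes):
    pending, done = list(events), []
    while pending:
        event_id = pending.pop(0)
        node = random.choice(nodes)
        if process_on_node(node, event_id):
            done.append(event_id)
        else:
            pending.append(event_id)       # re-schedule elsewhere; work is not lost
    return done

print(len(run(range(1000), nodes=["node%02d" % i for i in range(16)])))   # -> 1000
```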
Appendix 2: KNMI required resources
This overview gives the estimated resources required for activities related to atmospheric chemistry research. Data from four earth-observation satellite instruments are or will be used: GOME (ESA/ERS), GOME-II (EUMETSAT/METOP), SCIAMACHY (ESA/ENVISAT) and OMI (NASA/EOS-CHEM). All satellites fly in near-polar earth orbits at heights of approximately 800 km. Data from these instruments are downlinked to ground stations once every orbit of 100 minutes. They will be processed and archived at several institutes, including KNMI. The data must be transported from the ground stations located at Goddard (GSFC, USA), Svalbard (EUMETSAT, Norway) and Oberpfaffenhofen (ESA DPAC, Germany) to KNMI. At present only the GOME instrument is operational. The anticipated years of launch of SCIAMACHY, OMI and GOME-II are 2001, 2003 and 2004 respectively.
The numbers mentioned are partly based on assumptions and estimates.
Data rates:

| Instrument | Size (MByte/orbit)[14] | Orbit interval (minutes) |
| GOME | 15 | 100 |
| GOME-II | 120 | 100 |
| SCIAMACHY | 546 | 100 |
| OMI | 3,624 | 100 |
Computing resources required[15]:

| Instrument | SPECfp95 | Time needed for one orbit of data |
| GOME | 24.5 | 2,000 minutes |
| GOME-II | 24.5 | 16,000 minutes |
| SCIAMACHY | 24.5 | 2,000 minutes |
| OMI | 24.5 | 320,000 minutes |
The times quoted are for calculating an ozone profile from one orbit of data.
Data transport resources:

| Instrument | Minimal bandwidth needed |
| GOME | |
| GOME-II | |
| SCIAMACHY | |
| OMI | 5 Mbit/s |
Bandwidth is calculated from data size and time constraints; necessary retransmissions are not included in the calculation. Since operational time constraints are at present known only for OMI, a required bandwidth can be calculated only for OMI.
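As an illustrative check of the OMI figure (one orbit of data delivered within the 100-minute orbit interval):

$$\frac{3{,}624\ \text{MByte}\times 8\ \text{bits/Byte}}{100\times 60\ \text{s}} \approx 4.8\ \text{Mbit/s} \approx 5\ \text{Mbit/s}.$$

Similarly, about 5,260 orbits per year (one every 100 minutes) at 3,624 MByte per orbit gives roughly 19 TByte per year, of the same order as the 18 TByte/year quoted in the storage table below.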
Data storage resources required:

| Instrument | Storage capacity/year | Mission duration (years) | Total capacity needed |
| GOME | 76 GByte | 5 | 380 GByte |
| GOME-II | 615 GByte | 10 | 6 TByte |
| SCIAMACHY | 2.7 TByte | 5 | 13.5 TByte |
| OMI | 18 TByte | 5 | 90 TByte |
The data storage estimate is based on the scenario in which all data is archived at KNMI. In practice probably not all data will be archived at KNMI, since other institutes archive the data as well.
ACRONYMS
ENVISAT Environmental Satellite
EOS Earth Observing System
EOS-CHEM EOS Chemistry Research Satellite
EOSDIS EOS Data and Information System
ERS European Remote Sensing Satellite
ESA European Space Agency
EUMETSAT European Organisation for the Exploitation of Meteorological Satellites
GOME Global Ozone Monitoring Experiment
GSFC Goddard Space Flight Centre (EOSDIS node)
KNMI Koninklijk Nederlands Meteorologisch Instituut
METOP Meteorological Operational Satellite
NASA National Aeronautics and Space Administration
OMI Ozone Monitoring Instrument
SCIAMACHY Scanning Imaging Absorption Spectrometer for Atmospheric Cartography
[1] NIKHEF, Netherlands
[2] IN2P3, France
[3] IFAE, Spain
[4] CERN
[5] INFN, Italy
[6] RAL, UK
[7] Eötvös Loránd University, Hungary
[8] I. Foster and C. Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, San Francisco, 1999
[9] MONARC: see http://www.cern.ch/MONARC/
[10] HEPCCC: the High Energy Physics Computing Coordination Committee
[11] The list includes for the time being: GRIDware, Siemens, Storage Tek.
[12] See Appendix 1 for a summary of HEP computing characteristics
[13] SPECint95: a modern PC has a performance of roughly 30 SPECint95 units. We expect this to grow to around 200 SPECint95 units by 2006.
[14] Aggregated number of level 0, 1 and 2 data.
[15] Based on the R10000 250 MHz MIPS processor from SGI (SPECfp95).