Status of ECMWF data at the BADC

A complete set of ECMWF initialised/3D-VAR
analyses from 1979 to present day is now
available from the BADC. The data is taken from
the ERA archive, which runs from January 1979
to February 1994, and the ECMWF operational
archive from March 1994 onwards. The resolution
of the data is T106/L31/6hourly which is full
resolution during the ERA period but is spectrally
truncated from T213 in the horizontal during the
operational period. This dataset will be kept up to
date on a monthly basis, the previous month being
available by about mid-month. In addition, the
uninitialised ERA dataset is nearly complete -
only 1980 has yet to be transferred. There is also
a set of forecast surface fields available for the
ERA period. This dataset contains fields such as
rainfall, radiation and surface fluxes. If you are
interested in using any of this data you should
contact the BADC.

Such a consistent dataset over this long period will be invaluable for investigating structures on different timescales. An illustration of this can be seen in Figs. 31 & 32, which are time series from this dataset showing equatorial zonal means from 1000hPa to 10hPa from 1979 to 1995 for temperature and zonal wind. We can see the annual temperature oscillation at 100hPa and the stratospheric quasi-biennial oscillation in the zonal wind.

Paul Berrisford and
Dingmin Li

SUMO (Software for Unified Model Output)

Over the past year or so UGAMP has been making increasing use of the Meteorological Office's (UKMO) Unified Model (UM). The UM contains a reasonably comprehensive diagnostic package that is executed (as part of the model integration) at model runtime; however, this means that users need to specify their diagnostic requirements before the results of the integration are known. In a research environment, where the diagnostics required may need to be determined as a function of the model results, it is often necessary (indeed, advisable) to analyse model output in a series of diagnostic jobs, each job being dependent upon the outcome of the previous job(s). With the UM, this would mean a rerun of the integration - something which UGAMP cannot afford to do.

Following on from the success of UMAP it seemed appropriate to develop a parallel package (now called SUMO). The first stages of this were fairly trivial - SUMO is essentially very similar to UMAP (as those who have used both user-interfaces will testify) and so the main change was to remove the GRIB input routines and replace them with routines (some developed by Jeff Cole) to handle UM-format history files. Fig. 33 shows the current capabilities of SUMO - note that currently input must be specified on pressure surfaces (which means that the assimilated datasets of the UKMO-UARS project can also be analysed). It is intended that the next major release of SUMO will include the ability to accept data on model levels.

Portability aspects

SUMO is essentially a portable code (although there are simple, minor changes which need to be made when moving from one machine to another). It currently runs on the RAL J90 and on Sun workstations at Reading. A test version has also been installed on a DEC-Alpha at Oxford.
The main problem currently with SUMO when it is run on non-Cray machines is that its I/O requirements make it quite slow, especially if data needs to be passed around a local network. This is because (like UMAP) SUMO works with blocks of diagnostics (e.g. TF, SG, OD) - every diagnostic field (usually 3-D) within any requested block being computed and stored on disk for later use (time-averaging and output). With long time-period diagnostics (or with high resolution model output) this can result in large amounts of I/O to and from disk. Consequently, I am currently testing new updates within UMAP (code that will be inserted soon into SUMO) which will enable UMAP to determine which individual fields need to be computed and/or written to disk. While this will mean another level of complexity for those who modify UMAP/SUMO for their own purposes, it does mean that I/O can be substantially reduced. For example, using these new updates, a sample test job requiring only a few SG (3-D upper air output) fields as output resulted in the larger disk files being reduced in size by about 70 per cent, and the amount of data being written to disk being cut by just over 60 per cent. As well as speeding up SUMO, these ideas will also be incorporated into a portable version of UMAP which may be developed during 1997. News of further developments will be circulated soon.

PACKAGE-X

PACKAGE-X (see Fig. 34) is a general-purpose interpolation/truncation program being developed for use, initially, with ECMWF-type GRIB data. Currently, PACKAGE-X will do the following operations on ERA and UGCM datasets:

1. Spectral to spectral truncation (or expansion) (uses a cut-off wavenumber approach)
2. Gaussian grid to full/reduced Gaussian grid (bi-linear interpolation or area-weighted averaging)
3. Gaussian grid to regular lat/long grid (bi-linear interpolation or area-weighted averaging)
4. Spectral to gridpoint (full/reduced Gaussian or regular lat/long) conversion
5. Regular lat/long gridded data to Gaussian or regular lat/long interpolation

It is possible to set selected precipitation field values to zero at gridpoints where the interpolated precipitation value is less than some user-specified value. In options 2 and 3 the land-sea mask is set to either 0 or 1 at each gridpoint. The program will operate on a series of datafiles separated in time by a constant time-interval (as per UMAP). It will not compute new fields (so U and V cannot be computed from vorticity and divergence, for example). The program will run interactively; user requirements are specified via namelists.

Currently, output data must be generated over the entire globe, with the latitude distribution being symmetric about the equator. An equatorial output latitude can be included, so the number of output latitudes may be odd or even. Output is in GRIB format - the range of output formats will be increased later. I am also contemplating some form of regional (non-global) sub-sampling, using software called EZGET.

So, if you'd like to perform non-T106 ERA
diagnostics, or you'd like to try out the package, please contact me. Documentation can be found at

Xconv

A new data conversion software package, Xconv, is being developed, initially for Unified Model data and UKMO PP format data. Xconv is a user friendly package with an X-windows interface. At present it will convert the input data into DRS format (for the VCS package) and allows the user to subsample the data in the vertical and time dimensions. Xconv also allows the user to view the data values or to produce a block fill plot of the data.

Xconv is still in the early stages of development and, depending on demand, will be extended along the following lines:

1. Extra output formats will be added, e.g. UTF, NetCDF, unformatted and formatted data.
2. Extra input formats will be added, e.g. GRIB (including ECMWF ERA data), DRS, UTF, NetCDF, unformatted and formatted data.
3. Subsample data in horizontal dimensions.
4. Mean data over any dimension.
5. Implement data transformations, e.g. spectral to grid point data, interpolate UM data to alternative UM grids.

Xconv is written using Tcl/Tk and a mixture of C and Fortran code; it currently runs on Sun workstations. Porting to other Unix workstations with Tcl/Tk installed should be straightforward. If anybody is interested in installing a copy of this code, contact me via jeff@met.rdg.ac.uk.

Parallelization of SLIMCAT model

The parallelization of the SLIMCAT model on the Edinburgh Cray T3D is nearing completion. Two advection schemes are available: the original Prather scheme and a semi-Lagrangian transport scheme taken from the ECMWF's IFS forecast model. The semi-Lagrangian scheme has the advantage that it is far less memory intensive than the Prather scheme, since the latter requires knowledge of the first and second order moments of each field (10 values compared with just 1 for semi-Lagrangian transport). Full chemistry is available with either scheme. Final tests are being undertaken and the program will be available for general use shortly.
Timings in seconds for the semi-lagrangian scheme at a resolution of T10 with 11 levels and full chemistry are shown below.
Work is underway to port the codes to the Hitachi SR2201 parallel machine in Cambridge and they should be available in the near future.
The Future of High Performance Computing (HPC) in the UK

HPC97, the project to procure new UK academic research high performance computing facilities, is a rather inappropriate name now that it has been delayed, and so it may well become HPC98 or even HPC9X! This project, to spend ~£10M on high performance computing, has been delayed by administrative and political issues.

This is the first HPC procurement which has not been managed directly by the Office of Science and Technology (OST), who manage the research councils. The money for HPC has now been devolved directly to the research councils in proportion to their size, for example EPSRC has ~£7.5M and NERC ~£2.3M. The research councils therefore had to agree to operate collectively to procure national HPC facilities. All the research councils have now agreed apart from the Medical Research Council, who will manage their own facilities, and the Quantum Chromodynamics community within PPARC (Particle Physics and Astronomy Research Council), who have recently bought their own Cray T3E. The research councils also had to put in place an HPC management structure which they felt would protect their interests. All this has taken far longer than expected, especially as it coincided with some major internal restructuring within the research councils.

The next major delay has been the Government's requirement to consider the possibility of procuring the national HPC facilities using the Private Finance Initiative (PFI). PFI is meant to transfer to the private sector the risk involved in the national HPC service (both the procurement and the service provision), to reduce the capital investment to an annual sum and to provide a long term contract for the service (at least 7 years). A major hurdle has been to find a way of providing a well defined, long term HPC service requirement for UK science. The level of confidence in such a long term statement of UK HPC scientific requirements is too low for a contractual commitment.
It is also proving very difficult to find a mechanism which effectively transfers the perceived risk in academic HPC provision to the private sector. Discussions are still in progress between the OST, the potential service providers (RAL, EPCC, MCC, etc.), the vendors and the research councils. Some input has also been sought from the UK science community and so I have been actively involved in these discussions on UGAMP's behalf.

To enable UK scientists to keep computing until the new procurement provides us with more, much needed, HPC resources, some money has been made available to provide upgrades to the current HPC facilities. EPCC's contract for the Cray T3D service has been extended for 2 more years and there are plans to enhance the EPCC J90 to improve the service. RAL's contract for the J90 service is on an annual basis but there is hope that this will be extended so that the RAL J90 service can also be upgraded. Discussions of possible RAL J90 upgrades are in progress and there are many ideas: increase the J90 memory, supplement the J90 with an additional vector or scalar service, or even replace the J90!

UGAMP's current allocation on the RAL J90 of 15000 YMP CPU hours runs out on the 31st March 1997 and I am now negotiating for an allocation of at least 20000 YMP CPU hours. This new allocation will, of course, depend on the planned upgrades for the RAL J90. If you have any thoughts or ideas about the procurement or the possible interim upgrades then please contact me.

Lois Steenman-Clark, UGAMP Supercomputing Coordinator

IFS 15r1

Introduction

This new climate version of IFS is the first fully portable version which can run on workstations, vector shared memory machines, such as the RAL J90, and distributed memory machines, such as the Cray T3D at Edinburgh. The main changes introduced since IFS 13r4 are technical:
There are also several scientific changes. These are mainly bug fixes, together with a modification of the continuity equation to solve problems of orographic resonance and changes to radiation time interpolation.

Fortran 90

The change from Fortran 77 to Fortran 90 is not as major as feared, since many of the Cray extensions are part of the Fortran 90 standard, such as variable names longer than 6 characters, automatic arrays, namelists etc. Other changes to achieve portability were to remove all Cray extensions such as BUFFERIN/BUFFEROUT and replace them by calls to code written in C. The major change is from Cray pointers to Fortran 90 pointers, which are quite different. The only problem is that if a pointer array is passed as an argument then, to retain the multitasking capability on the Cray J90, an interface block needs to be provided, so many of the IFS 15r1 subroutines now have interface blocks. IFS 15r1 compiles and runs under many different Fortran 90 compilers, and it runs multitasked on the RAL J90 as before with the same excellent performance.

Parallel Version

Our specific interest in IFS 15r1 was initially to explore the potential of the Edinburgh Cray T3D. Distributed memory machines like the Cray T3D consist of a number of microprocessors, each with local memory, linked to one another by a very fast connection. There are currently two ways of programming such machines. The first is message passing, where data is local to a processor and so any required data which resides on another processor must be copied between memories using messages. The second method, High Performance Fortran (HPF), uses compiler directives which describe the layout of data arrays across the processors. Message passing now uses an internationally agreed standard, MPI (Message Passing Interface), whereas HPF compilers are still too immature. IFS 15r1 therefore uses message passing with MPI.
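As a rough illustration of the message passing idea described above, the sketch below simulates a few processors in plain Python, each holding its own local data, and forms a global mean by exchanging small messages. This is not MPI and not IFS code: the names Processor, send, receive and global_mean are invented for this illustration, and a real model would make the equivalent MPI calls from Fortran.

```python
from collections import deque


class Processor:
    """One simulated processor: private local data plus an inbox for messages."""

    def __init__(self, rank, local_data):
        self.rank = rank
        self.local_data = list(local_data)  # held in this processor's memory only
        self.inbox = deque()                # messages copied in from other processors


def send(dest, message):
    """Copy a message into the destination processor's memory."""
    dest.inbox.append(message)


def receive(proc):
    """Take the oldest waiting message out of this processor's inbox."""
    return proc.inbox.popleft()


def global_mean(processors):
    """Return, for every processor, the mean of all the distributed data."""
    root, others = processors[0], processors[1:]

    # Step 1: each processor reduces its own data locally and sends only the
    # partial sum and point count to the root, never the raw data.
    for proc in others:
        send(root, (sum(proc.local_data), len(proc.local_data)))

    # Step 2: the root combines its own partial result with those received.
    total, count = sum(root.local_data), len(root.local_data)
    for _ in others:
        partial_sum, partial_count = receive(root)
        total += partial_sum
        count += partial_count
    mean = total / count

    # Step 3: the root sends the result back so that every processor has it.
    for proc in others:
        send(proc, mean)
    return [mean] + [receive(proc) for proc in others]


# Eight values (1.0 to 8.0) spread over four simulated processors, two each.
processors = [Processor(rank, [2.0 * rank + 1.0, 2.0 * rank + 2.0]) for rank in range(4)]
means = global_mean(processors)
print(means)
```

The point of the sketch is that no processor ever reads another's data directly: everything that crosses between memories does so as an explicit message, which is why the cost of communication grows with the number of processors involved.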
The strategy in IFS is to isolate the extra message passing code from the calculation routines, so ECMWF are hopeful "that scientists developing IFS code will only be affected to a small extent".

Input/Output

An important consideration for climate models on distributed memory machines is Input/Output (I/O). There is no agreed standard for parallel I/O yet. The Cray T3D, as one of the first high performance distributed memory machines, adopted the approach of not having the full Unix operating system on each processor, so all other Unix functions are provided by the front end, which is a 2 processor YMP. This basically means that I/O from the processors to the front end is rather slow. IFS 15r1 has strategies to attempt to use many processors when reading initial data from, or writing diagnostic files to, the front end. New routines have been added that handle the parallel reading of split GRIB input files and reshuffle the fields among processors.

Lois Steenman-Clark, UGAMP Supercomputing
Coordinator

Implementation of IFS 15r1 on the T3D

Massively parallel distributed memory machines are a relatively new type of architecture, but it looks as if they are here to stay. This section reviews some of UGAMP's first steps with a large model code on such a machine, the 512 processor Cray T3D at EPCC, Edinburgh.

IFS 15r1 is designed to work with message passing on the distributed memory architecture that such MPP machines typically have. The parallel version of the model was developed at ECMWF, and a version was given to UGAMP last year. One major task was identifying which of the 1500 routines received were redundant, and eliminating them. Over 500 were removed, which cut the compile time down to less than 6 hours! For practical purposes the model is registered with Unicos Source Manager on the T3D, which uses NUPDATE to keep track of revisions to the code.

Efficiency

The first runs we performed were to test the efficiency of the model at different resolutions, and on different numbers of processors. On the whole, the speed-up is excellent: see Fig. 35, which is an average over a day with full radiation every 8 timesteps but no I/O. The more processors used, the greater the communication costs when performing, for example, advection. A model at higher resolution will have a larger ratio of parallel computation to communication, and so will speed up more as we increase the number of processors. The physics routines are a major source of parallel computation: in particular the radiation calculations are quite intensive, but parallelize almost perfectly. It is clear that for good efficiency on large numbers of processors we require a large amount of computation. Put another way, there's no point running a small problem on a large number of processors.

Changes

Various UGAMP changes, already in IFS 13r4, have been incorporated in IFS 15r1, and give an idea of the ease or difficulty of maintaining a combined parallel and serial code. A couple are mentioned here.
Clear sky radiative fluxes are calculated locally, in the sense that no communication between processors is required, since each processor holds whole columns of data. In this case the only difficulties in implementation were due to the restructuring of the radiation computations since IFS 13r4. The general rule is that local changes to the physics routines should be reasonably straightforward.

The global mass correction is currently being implemented. This is rather harder as it involves taking a global average, which necessitates communication between processors. It also needs to work in serial and multitasked configurations! Changes of this nature are more difficult, and there are issues to consider such as reproducibility of the results when taking the global average.

I/O issues

Our consistent experience with the T3D is that the I/O is a major cause of inefficiency, mainly due to the architecture of the machine. For example, in a recent T42L31 semi-Lagrangian run on 128 processors the standard model spent almost 60% of its time just doing history dumps to disk. By means of a cunning shell script running asynchronously on the front-end machine, doing some file management, we managed to reduce that to 40% of the total time. The eventual aim is to reduce this figure to about 20%, which would double the overall speed. In this configuration we would be able to model about 40 days per wallclock hour on 128 processors, and perhaps 60 or more on 256 processors.

Conclusion

All these issues have been explored on the Edinburgh Cray T3D while preparing to run the first climate run using IFS 15r1. After an initial control run it is planned to integrate the changes to the surface and boundary layer scheme which were introduced to alleviate the model near-surface/surface cold bias in winter and in other stable conditions.
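The percentages quoted under I/O issues can be turned into overall speed-ups with a little arithmetic. The sketch below assumes that the non-I/O (compute) time is the same in every configuration, so that the total time is the compute time divided by one minus the I/O fraction. The function name speedup is invented for this illustration, and the 60% figure is rounded from the "almost 60%" given in the text.

```python
def speedup(io_fraction_before, io_fraction_after):
    """Overall speed-up when the I/O share of wallclock time changes.

    Assumes the compute time is identical in both runs, so that
    total time = compute time / (1 - I/O fraction).
    """
    for fraction in (io_fraction_before, io_fraction_after):
        if not 0.0 <= fraction < 1.0:
            raise ValueError("I/O fractions must lie in the range [0, 1)")
    return (1.0 - io_fraction_after) / (1.0 - io_fraction_before)


standard_to_script = speedup(0.60, 0.40)  # standard model -> with the shell script
script_to_target = speedup(0.40, 0.20)    # with the shell script -> eventual aim
standard_to_target = speedup(0.60, 0.20)  # standard model -> eventual aim

print(f"60% -> 40% I/O: {standard_to_script:.2f} times faster")
print(f"40% -> 20% I/O: {script_to_target:.2f} times faster")
print(f"60% -> 20% I/O: {standard_to_target:.2f} times faster")
```

Under these assumptions, reaching 20% I/O is a factor of two over the standard model's 60%, which matches the doubling quoted above. Measured from the 40% already achieved, the remaining gain is about a third.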
Useful WWW sites

The original (pre-cleaned) source code can be viewed at: http://www.met.rdg.ac.uk/ifs/ifs15r1/main.html

Details of changes made so far can be seen at
© 1997 Centre for Atmospheric Science/UGAMP. All scientific articles are unpublished. No text or graphics may be copied or used without permission. Newsletter Editor: Glenn Carver, Cambridge University.