Fortran Futures '98
Subject: Fortran Futures '98
From: Ian D. Chivers (I.CHIVERS@kcl.ac.uk)
Date: Thu 21 May 1998 - 15:19:14 BST
As some of you know, Chuck Koelbel was an invited speaker. He wrote a summary and agreed
that I could forward it to the list. I hope you find it interesting.
On Sun, 17 May 1998 21:36:01 +0100 Chuck Koelbel <chk@cs.rice.edu>
wrote:
Hello, all -
I wanted to thank you again for inviting me to give the keynote at Fortran
Futures '98. It was a great conference, and I enjoyed the trip immensely.
And, as you can tell from the trip report below, I thought the talks were
well worthwhile.
A couple loose ends to tie up:
...
a bit deleted
...
Good luck with Fortran Futures 2000!
Chuck
Fortran Futures '98 - May 14-15, 1998
Notes taken by Chuck Koelbel
Executive Summary
This is a biennial meeting organized by NAG, Ltd. to promote Fortran,
particularly new developments relevant to the language. The program
included many people who have been involved in major developments in the
language, both informal extensions (myself) and formal standards (all of
the other keynoters). In a marked departure from your average keynote, some
of the speakers proposed that Fortran (the subject of the conference) was
losing popularity and importance, that international standards did more
harm than good, and that Fortran needed significant extensions for new
computing environments. Some of the technical contributions were even less
wary of provoking controversy. Overall, though, the mood of the conference
was upbeat about the future of some form of Fortran, if only due to the
wealth and efficiency of libraries written in that language. Some
highlights of the program included:
• Good overviews of current high performance computing applications from
Koelbel (the opening keynote) and Hey (the closing keynote). I don't say
this just out of vanity; I got a lot of genuine compliments about the
American programs, and Tony's talk was a really nice complementary survey of
European projects.
• Informal extensions to Fortran, including Co-Array Fortran, HPF, OpenMP,
and Fortran bindings to MPI. All had excellent talks describing or using
them. The suggestion was made many times that this was the way that the
language should evolve, rather than being bound up in slow-moving standards
committees. On the other hand, the observation was made many other times
that large users were loath to move to a non-standard system for fear of
non-portability.
• Fortran 2000. Several talks previewed features for the language, along
with reports on current standardization progress. The good news is, there
will be significant new features in the next Fortran standard ("which I
hope to see before I die", as one speaker put it); these include
object-oriented features, floating-point exception handling, asynchronous
I/O, and C interoperability. The bad news is that it will be some time
before practical compilers are available ("Fortran 2002" as one speaker
referred to it, and he was just pointing to the finalization of the spec).
It is clear that Fortran will have a place in scientific computing for the
foreseeable future, and not only for legacy code. Libraries are being
expanded, and Fortran 90 interfaces are being added to existing ones. While
Fortran may not be the hottest language for new students, there is little
reason to think that it will fade into the background.
Detailed Notes
Charles Koelbel, Rice University
Opening Keynote - "The Language of the Year 2000: Will it Really be Fortran?"
"There will be very little hard technical information in this talk."
Instead, the idea was to look at the forces driving any language for the
foreseeable future. First, an overview of the state of Fortran in 1998
showed that it is alive and well as a language. Most people think of
FORTRAN 77, which has been enormously successful due to its culture of
highly optimizing compilers and other tools; unfortunately, its public
relations are poor because it is seen as old-fashioned. Fortran 90 and 95
are great improvements and are becoming more accepted due to better user
convenience; Fortran 2000 promises to include object-oriented features as
part of a modern language. Next there was an overview of current hardware,
including workstations (now faster than 1970's supercomputers), parallel
hardware (both tightly-integrated systems and "piles of PCs"), and
distributed computational grids (map courtesy of the National Computational
Science Alliance). The point is that any language will have to take into
account highly parallel systems with very deep memory hierarchies. More
interesting, I hope, was the description of current programs that will
drive language evolution in the near future. Three programs were
highlighted: the Advanced Strategic Computation Initiative program in the
US Department of Energy, the US Department of Defense HPC Modernization
Program, and the US National Science Foundation Partnerships for Advanced
Computational Infrastructure program. "I'll get to the software programs in
a second." The point is that these programs are driving new application
development, and thus indirectly the evolution of computational science.
Two example applications from each were presented, including "traditional"
hard-science applications (e.g. an adaptive mesh code for cosmology from
NCSA) and leading-edge meta-applications (e.g. a simulation of a
100,000-vehicle battle requiring 13 supercomputers at DOD). The bottom line
was that "these are not your father's programs"; they require new features
including handling of large I/O, interoperability between programs and
machines, and optimizations (automatic or manual) for memory hierarchies.
In summary, "reports of the death of Fortran have been greatly
exaggerated", but Fortran will need to continue its evolution to adapt to
new requirements and needs. In particular, Fortran needs to retain
efficiency for scientific codes, provide interoperability with other
languages, and allow linking to new environments.
Sven Hammarling, NAG
"Fortran Library Activities"
"NAG started as a collaborative effort, and that tradition continues to
this day. Much of this talk is intended to highlight that collaboration."
Started with a short history of (numerical) libraries; highlight was the
note that 1998 was the 25th anniversary of the BLAS. The most important
aspect of the libraries was the exploitation of memory hierarchies. He
particularly emphasized linear algebra efforts such as the BLAS Technical Forum
(bringing those Basic Linear Algebra Subprograms into modern software
practice). One of the current directions is a Fortran 95 interface to BLAS
that eliminates many of the sources of overhead in calling BLAS from
Fortran 90 interfaces. LAPACK is another freely-available library that NAG
has been involved in since its inception; their interest is to make it
available so they can use it as a basis for their own work. A Fortran 90
interface is in process there as well, mainly to the driver routines
(instead of all the low-level ones). ScaLAPACK ports LAPACK to distributed
memory environments using message passing. Many new developments are
underway here as well, including out-of-core solvers and an HPF interface.
NAG Fortran Libraries are under active development. The FORTRAN 77 library
is now at Mark 18, going to Mark 19 very soon, including many new routines
for sparse systems (linear programming, eigensolvers, and iterative
solvers). "We've had many requests for solvers for the Black-Scholes
equation." The SMP library builds on the FORTRAN 77 one, even beating
vendor libraries (e.g. SGI) in some cases. The Fortran 90 and Parallel
(MPI) libraries are at lower release numbers, but continue to build on the
lower-level routines.
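
As an illustration of why a Fortran 95 interface cuts calling overhead and
clutter, here is a minimal sketch of my own (not NAG's or the Forum's actual
interface): assumed-shape arrays carry their own sizes and strides, so the
caller no longer passes dimensions and leading-dimension arguments explicitly.

  module blas95_sketch
    implicit none
    interface gemv                      ! one generic name instead of SGEMV/DGEMV
      module procedure sgemv95
    end interface
  contains
    subroutine sgemv95(a, x, y)
      real, intent(in)  :: a(:,:), x(:)
      real, intent(out) :: y(:)
      ! sizes and strides come from the array descriptors, so a real
      ! wrapper could call the F77 SGEMV without the caller passing
      ! M, N, LDA, INCX, INCY
      y = matmul(a, x)                  ! stand-in for the optimized BLAS call
    end subroutine sgemv95
  end module blas95_sketch
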
Miles Ellis, Oxford University
Keynote - "Is There a Role for Standards in the Future of Fortran?"
"I want to pose the question of standards, because it has been an integral
part of Fortran for its entire history." A quick history of Fortran's
history followed, including its inception on the IBM 704, the first
portable Fortran version (FORTRAN IV), the first Fortran standard (FORTRAN
66), and the first international standard (FORTRAN 77, adopted by ISO in
1980). "At that point, things began to stop." However, Fortran 90 was
adopted after a long trip through a tunnel (metaphorically and really, as a
home video showed). 1994 saw additional features; 1997 gave us Fortran 95;
1998 will add conditional compilation. At this point, the question is "Does
anybody care?" (particularly in regard to the technical reports). Work on
Fortran 2000 started with requirements definitions in 1995, with an
expected final specification in 2001 (before the final approval in 2003 or
so); compilers will appear several years after the spec. Moreover, a good
idea that comes out at this meeting will make it into a standard-conforming
compiler in around 2011. Several other examples of language standards
(BASIC, Z, and Java) show the disadvantage of imposing an international
process on language design with its braking effect. He asked ("as convener
of WG5, I'm allowed to ask myself") if we should break away from
international standards efforts to a more market-oriented process. "I am
afraid that that very process may be killing the language." Rapid
developments in the technical community seem to indicate that a faster
response time is needed. Unfortunately, he doesn't have a clearly better
alternative. Compiler vendors are less willing to add features today,
fearing incompatibility with upcoming standards.
This talk actually provoked some interaction (as it was intended to).
Points from the audience included the need for portability, which is
essentially only possible through standards of some sort. Another problem
is that the process rules allow one person to hold up the entire
process. For example, the German representative to Fortran 90 held up the
standard until it included varying-length strings, which have not been
implemented by any vendor (including German companies). On the flip side,
it's very difficult for a single vendor to introduce a genuinely useful
feature due to the drag in the process. The real concern is the time to get
the standardization done, particularly for new features. "It's really
absurd that we now have part 3 of the standard [conditional compilation]
whose major purpose is to allow you to have a standard way to include
nonstandard features." HPF and VMEbus were mentioned as examples of
standards that were decided in short periods of time by consortia.
John Reid, Rutherford Appleton Laboratory
"Fortran 95 and IEEE Technical Report"
"I'm actually going to give two talks, one about Fortran 95 and one about
exception handling."
Fortran 95 is a minor revision of Fortran 90, with a major Fortran revision
planned for 2000. Compilers for Fortran 95 are beginning to appear; for
example, NAG has one in beta test now. The new features are FORALL (from
HPF), PURE and elemental procedures, the ELSEWHERE construct (similar to
ELSEIF), NULL initialization for pointers, automatic object initialization,
extended specification expressions (including the use of user-defined PURE
functions), CPU_TIME ("very carefully contrived wording so that the vendor
can do whatever they want"), IEEE compliance (signed zero, which mostly
comes up in underflowing computations). Some FORTRAN 77 features were
declared obsolescent, including computed GOTO, statement functions, DATA
statements among executables, assumed-length character functions, CHARACTER
*, and fixed source form. (Note that last one!) Some features were deleted
entirely, including ASSIGN, noninteger DO indices, and the H edit
descriptor. The new features are fairly intuitive, and familiar from
various previous practice and dialects.
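
To make a few of those features concrete, here is a small sketch of mine (not
from the talk) combining NULL initialization of a pointer, an elemental
function, and FORALL:

  module f95_demo
    implicit none
    real, pointer :: work(:) => null()  ! F95: default initialization of a pointer
  contains
    elemental real function square(x)   ! elemental: applies pointwise to arrays
      real, intent(in) :: x
      square = x * x
    end function square
  end module f95_demo

  program demo
    use f95_demo
    implicit none
    real :: a(10), b(10)
    integer :: i
    a = (/ (real(i), i = 1, 10) /)
    forall (i = 1:10) b(i) = square(a(i))  ! FORALL, adopted from HPF
    print *, b
  end program demo
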
Exception handling is a "type 2 technical report", which is a type of
standards document with less bureaucracy. Although officially an extension
to the existing standard (in this case F95), the expectation is that it
will be adopted in the next standard (F2000) unless major discrepancies are
discovered. Exceptions have a long and complex history in Fortran, reaching
back to proposed additions to Fortran 8X; the current (finally approved)
form uses procedures to process floating-point exceptions. The "obvious"
modern alternative, adding a general exception handling mechanism, was
considered and abandoned several times due to lack of consensus on the
nitty-gritty details of the construct (e.g. the effect if an exception were
raised in a called subroutine) and implementation details (e.g. tests
required on some hardware). Due to the strong requirements (direct support
for IEEE when the hardware implements it, while recognizing partial support
on other hardware), they had to use an intrinsic module and integrate IEEE
support into the compiler. Three levels of support are provided:
IEEE_EXCEPTIONS (only overflow and divide-by-zero), IEEE_FEATURES (all IEEE
features, which may be imported separately), and IEEE_CONTROL (manage the
IEEE features). A lot of support is provided for inquiring whether the
hardware/compiler supports various IEEE features. More details are at
https://wg5-fortran.org/N1251-N1300/N1281.pdf (in several formats).
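
As a sketch of how the TR's flag-based style looks in code (using names
roughly as they were later standardized; the details in the 1998 TR may
differ):

  program flag_demo
    use, intrinsic :: ieee_exceptions
    implicit none
    real :: x, y
    logical :: flagged
    call ieee_set_halting_mode(ieee_overflow, .false.)  ! don't abort on overflow
    call ieee_set_flag(ieee_overflow, .false.)          ! clear the flag first
    x = huge(1.0)
    y = x * 2.0                                         ! raises IEEE overflow
    call ieee_get_flag(ieee_overflow, flagged)          ! test afterwards
    if (flagged) print *, 'overflow occurred; y =', y
  end program flag_demo
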
Malcolm Cohen, NAG
"Allocatable Components Technical Report and Fortran 2000"
This is also a technical report from ISO, scheduled for inclusion in the
next suitable standard. The problem being solved is technical limitations
on derived data types in Fortran 90/95; in particular, if you want to have
a variable-sized array field you must use a POINTER, which raises
performance losses (due to potential aliasing and non-unit strides) and
safety problems (memory leaks and dangling pointers). The basic solution is
to allow derived type components (i.e. fields of structures) to be
ALLOCATABLE. There are some details to getting this to work, but the result
is fairly user-friendly. A similar fix allows dummy array arguments to be
ALLOCATABLE, which in turn allows a procedure to allocate an array for use
in its caller or elsewhere. Applying this to function results allows an
array to be returned and automatically deallocated after use.
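
A minimal sketch of the feature (mine, not Cohen's example): the ALLOCATABLE
component gives a variable-sized field without POINTER's aliasing penalty or
leak risk.

  program alloc_comp
    implicit none
    type :: vector
      real, allocatable :: data(:)  ! TR feature: ALLOCATABLE component
    end type vector
    type(vector) :: v
    allocate(v%data(100))           ! sized at run time
    v%data = 0.0
    ! no dangling pointers or memory leaks: v%data cannot alias anything
    ! and is deallocated automatically when v goes away
  end program alloc_comp
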
The second talk (like Reid, he had two talks under one title) was about
Fortran 2000. Actually, he called it "Fortran 2002" in honor of its
expected completion date. The intent is to make F2000 a major upgrade to
the language (as opposed to the intentionally minimal extensions in F95).
Major features going in include the two technical reports just discussed,
improvements to I/O, interval arithmetic, data structuring improvements,
and interoperability. There are 11 minor features "finished" (in draft
form), 4 under active development, and 4 more "waiting in the wings". The
schedule now is to decide final requirements by 2/97 ("that date will not
slip any further"), completing separate feature edits by 11/98, an
integrated document by 1/00 (including resolving interactions between
features), and 4 ballots ending in 11/02. The I/O extensions are for
user-defined derived types (providing better control) and asynchronous I/O
("there are a whole lot of interactions with other parts of the standard").
Interoperability is a major feature, including internationalization (still
at the requirements stage) and C interoperability (now at the specification
stage). Interval arithmetic requires a lot of machinery such as control
over optimization, control of rounding, and better opaque types; many users
have indicated that they need these to do a full interval package. Data
structuring includes parameterized types, pointers to procedures,
initializers and finalizers, type extension (inheritance), and type-bound
procedures (dynamic dispatch and polymorphism). The last two provide the
object-oriented features of F2000, which can basically be summed up as
single inheritance, single dispatch, with run- and compile-time efficiency,
and without sacrificing necessary functionality. Type-bound procedures
extend the type, in much the same way that virtual functions in C++ do. An
important aspect of the design is that all types are statically determined.
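
As a sketch of what single inheritance with type-bound procedures looks like,
using the syntax that eventually appeared in Fortran 2003 (the drafts under
discussion in 1998 may have differed in detail):

  module shapes
    implicit none
    type :: shape
    contains
      procedure :: area                 ! type-bound procedure (dynamic dispatch)
    end type shape
    type, extends(shape) :: circle      ! type extension (single inheritance)
      real :: r = 1.0
    contains
      procedure :: area => circle_area  ! override, like a C++ virtual function
    end type circle
  contains
    real function area(this)
      class(shape), intent(in) :: this
      area = 0.0
    end function area
    real function circle_area(this)
      class(circle), intent(in) :: this
      circle_area = 3.14159 * this%r**2
    end function circle_area
  end module shapes

A variable declared CLASS(shape) can then hold a circle at run time, and a
reference to its area binding dispatches to circle_area.
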
"Are you going to be the first person to implement a Fortran 2000
compiler?" "I couldn't possibly comment." (Malcolm produced reference
implementations of the F90 and F95 languages during those standards
processes.) More information is available at the J3 website - URL wasn't
available, but one can start searching from the WG5 site mentioned above.
Bob Kuhn, Kuck and Associates
"OpenMP Workshop"
I was late to the workshop, having gotten carried away by a hallway
conversation. But it turned out that all I missed was the description of
features and some of the advertising.
I did get there for the description of coming attractions in OpenMP.
Various vendors are working on products, and the OpenMP Architecture Review
Board is creating a validation suite and working on new features. OpenMP is
working on extending to C and C++; a draft document is almost ready for
release. There is a web site at http://www.openmp.org/ with more
information.
Steve talked about the use of OpenMP in NAG. They like OpenMP because it
provides portability and they're a third-party software vendor. Orphaned
directives are important because they allow modularity. (The idea is that
parallelism directives may occur at different levels of the program, and it
works the way you want.) Also, they like the ability to optimize for
single-processor performance easily. (This is hard with other parallel
languages/systems that do a lot of transformations behind the scenes.)
More descriptions of OpenMP applications came from Bob. First he described
some exercises in parallelizing codes (originally from SpecFP) that were
not amenable to automatic parallelization. One optimization was to avoid
barriers and cache pollution by inserting a parallel region outside of a
parallel DO. The REDUCTION directive handled a continued summation. Dynamic
scheduling handled a loop with conditionals (a special case in the first
iteration) efficiently. Conditional compilation handled a loop with only a
few iterations that would otherwise have been inefficient to parallelize.
THREADPRIVATE was needed for common blocks used as temporaries ("passed"
between several subroutines in the same thread, but not used to accumulate
values between threads). One note was that they recommend setting the
default shared/private behavior explicitly in every program, since this is a
critical, often tricky, issue.
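
A minimal sketch of mine (not Kuhn's code) of two of those techniques: a
single parallel region wrapped around several work-shared loops to avoid
repeated fork/join, and a REDUCTION clause for the running sum.

  subroutine scale_and_sum(a, b, n, total)
    implicit none
    integer :: n, i
    real :: a(n), b(n), total
    total = 0.0
  !$omp parallel
  !$omp do
    do i = 1, n
       b(i) = 2.0 * a(i)
    end do
  !$omp do reduction(+:total)
    do i = 1, n                 ! each thread sums privately, then combines
       total = total + b(i)
    end do
  !$omp end parallel            ! one fork/join covers both loops
  end subroutine scale_and_sum
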
Their view of the steps in parallelizing an (existing) application is
• Analyze
• Restructure
• Test
• Improve
• Quality Assurance
There are tools for each step of the process:
• KAP for analyzing and restructuring. People in some fields (he mentioned
CFD) are still using automatic parallelization, so KAP now produces OpenMP
directives. In effect, they're either using the automatic system as a first
pass or using it to improve/clean up initial parallel directives they put
in.
• Guide (from KAI) helps in a loop between restructuring, testing, and
improving. It collects runtime data that can be fed back into KAP or used
by a special runtime system for tuning.
• Digital's OpenMP compiler has a number of environment variables (e.g.
OMP_NUM_THREADS, MP_SPIN_COUNT) to tune the application. It and other OpenMP
implementations also offer optional runtime correctness checks, including
checks for interference between threads.
• Assure (from KAI) and DBX are parallel debuggers that have OpenMP
interfaces. Ladebug (from Digital) is OpenMP-aware also, in the sense that
it can stop at subroutine calls associated with (inserted by) OpenMP
directives. "As you use the symbolic debuggers you pretty quickly figure
out which are the work threads and which are the monitor threads." Assure
catches inconsistencies between threads, checks thread stack sizes for
overflow, and detects race conditions. Assure has been used on real
programs (OVERFLOW, LS-DYNA) and successfully discovered bugs that the
original developers were unaware of.
About this time a long digression popped up on the possibility of
private stack overflows. The bottom line is that an appropriately
aggressive user can break OpenMP.
• GuideView (from KAI) instruments the program with performance monitors.
The case he showed illustrated serious overhead from
over-parallelization (8 threads, 4 processors), visible as increased
waiting and sequential time due to spin locks.
Alistair Mills, SGI
"Fortran Fun"
"I don't know what I'm doing here. Once in a moment of weakness I agreed to
talk about some of the things you can do with Fortran." This was not a
serious talk...
- Principia Fortranica (à la Newton):
- First Law: An object at rest will remain at rest unless acted on by an outside
force, or an object in motion will remain in motion unless acted on by an outside force
- Fortran does not change unless a committee acts on it, or Fortran will
continue forever if there is no intervention
- Second Law: Force = Mass * Acceleration
- Fortran = Mathematical * Accuracy
- Third Law: Every action has an equal and opposite reaction
- Every program for computer type X has an equal program on computer type Y
- The Fortran Sonnet Form:
- 14 lines, 3 groups of 4 lines, and one of 2, first group declares the data,
second and third analyze the data, last finalizes the data, with a rhyming scheme
- I've got to get this slide...
- And a Fortran crossword puzzle, with a prize for the first solution
Panel Discussion
"Advantages and Disadvantages of Fortran 2000"
What communities are driving Fortran standardization now?
Miles Ellis:
"There's a large element of sheer inertia." More seriously, mostly guided
by high performance people. An example is interval arithmetic; numerical
people pushed it and Sun is working hard on it, but it's less obvious that
programmers will really make use of it. John Reid brought up the question
(again) of how useful the international standards process is. It's possible
that interval arithmetic will really be as important as its proponents say
it is, but it's less clear that ISO certification of this procedure is
required for the feature to succeed.
Is Fortran losing its way in adding more complex features, particularly
losing efficiency in adding object-oriented features? Some scientific users
need the OO features now. Metcalf noted that CERN supported his involvement
in Fortran 8x (and 90 and...) based on the need for object-oriented
features. The high-energy community has made an irrevocable move to C++
because it took too long to get objects. In fact, users directly suggested
many advanced features; the hard part is to know how serious some of the
interest is.
Might there be room for officially-supported subsets of Fortran (e.g. ELF
and F)? "Three of us on this table would like to think so, because we'll
sell more books." In some sense, though, it doesn't matter if the subset is
stamped official or not.
Thinking back to the days of F77, when some vendors were downright hostile
to the idea of a new standard... Is there a different attitude today? "Yes"
(not quite simultaneous, but no disagreements). Compilers are appearing at
reasonable rates, and vendors are working in good faith with the standards
committees.
Interestingly, most of the non-vendors on the standards committees are now
from the academic world, rather than "real" (i.e. very big) users like
CERN. Companies and labs can't (or won't) see the financial advantage
there. Portability is not enforced by managers, although it clearly should
be given that hardware is likely to change before the software is finished.
Why, if the users have so much input to the committees, did it take Fortran
90 so long to be accepted? Inertia and portability (people aren't willing
to take the first move to the new standard). Publicity is also a problem;
computing services aren't teaching Fortran (or even supplying it).
Physicists don't know that the new features are now out there. "So the
marketing is very bad?" "And it's being marketed to the services, not the
users."
Is backward compatibility really that important? When they tried to
deprecate COMMON/EQUIVALENCE, they were met with howls of protest. Having
F77 as a subset is a good thing, but what's really needed are some tools to
clean up code and remove obsolescent features. (VAST does this, but isn't
stable enough.)
For programs that have been (could be) written in Fortran 90 and C++, is
there a productivity difference? Don't know of any systematic studies.
What to do about lack of knowledge of F90? "Any organization that is to
survive in the long term must spend some resources in keeping their
employees up to date." For the questioner's organization, software is not a
main-line issue - they're working on physics, and learn new languages
only when they need to. MATLAB is used more than Fortran, because the
engineers are more familiar with it. Other engineering schools now teach
spreadsheets rather than Fortran. Existing users move only gradually to
F90, and modern structured features (e.g. MODULE) tend to be the last they
consider.
Richard Field, Edinburgh University
Keynote - "Fortran in Education"
"I will talk some about education, but not so much as Tony Blair." "Our
university is 413 years old, which I think means that it is even older than
Fortran." Fortran's history is "long and honorable"; the question is how it
will go on. Current searches on the world-wide web show 287,580 documents
on "Fortran", 21,253 on "Fortran 95", and 48,660 on "HPF" ("I don't know quite
what that means"), compared to 6,000 on ALGOL, 666,000 on Pascal (possibly
including discussions on Blaise Pascal), and 4.6 million on UNIX. So there
is still some interest in the language, albeit not so much as some other
subjects. The state of Fortran teaching in Edinburgh is that it is taught
in engineering, but this could change. England emphasizes C and C++; this
is seen with alarm in Scotland, where C is considered unsafe. The US "has
no programming in its undergraduate curriculum". Parallel Fortran (mainly
HPF) is well-represented in education, as several web pages from EPCC
showed. Other educational efforts come from other sources, including NAG's
installations at educational institutions. Conclusions:
• Fortran lives, but for how long?
• HPF "injection" into Fortran world is good, but watch for paradigm shifts.
• Some shifts are good, e.g. the steam engine to the internal combustion engine.
• Constant re-invention: a strength or a weakness?
Several questions from the audience highlighted some (more) hopeful signs
in teaching Fortran, including the many new Fortran 90 texts that are
selling well.
Anthony Colyandro, Visual Numerics
"Hybrid Distributed and Shared Memory Parallelization Tools from Visual
Numerics"
What he was actually talking about was their implementation of Fortran
exception handling in their high-performance distributed network Fortran
library, or "What Visual Numerics Is Doing to Make High Performance
Computing More Accessible". Their metrics for designing a parallel library
are to provide customizable control of master/slave nodes, automatic
scalability to computational resources (by runtime monitoring and dynamic
scheduling), capability for homogeneous and heterogeneous computation (via
MPI), and a comprehensive library. They are particularly proud of the work
they've done on the "art, science, and mystery" of error handling in their
routines, done via a subset implementation of the floating point exception
handling TR. (It has to be a subset, since most compilers don't yet handle
intrinsic modules.) This is done in IMSL-DNFL (IMSL Distributed Network
Fortran Library), an enhanced mathematical/statistical library written in
F90 and MPI. Basically, it's the same material described in the IEEE talk
the day before, with the addition of a few enumerated types. A few examples
of templates for using these exceptions were also given, with the expected
advantages in code clarity. Note that Fortran exceptions are not precise;
that is, some machines may not set the flags immediately due to pipelining
or parallelism. However, one can still test the exception flags later to
get a decent idea of what happened (at least better than the old method of
dying with a short message). More detailed information on what they've done
was available at http://www.vni.com/books/index.html.
Christele Faure, INRIA
"The Odyssee Automatic Differentiation Tool"
Started with a general introduction to automatic differentiation, used to
compute derivatives of functions computed from source code. For those who
haven't seen automatic differentiation before, the gist is to apply the
chain rule repeatedly to (exact) derivatives of the components of the
computation. https://www.mcs.anl.gov/research/projects/autodiff/tech_reports.html gives a list
of available tools for doing this. Her tool (Odyssee) is a source-to-source
translator for Fortran 77, computing tangents and cotangents in forward and
reverse mode respectively. An example of applying Odyssee to two codes
(Thyc-1D and Thyc-3D, both thermohydrodynamics codes) was given. Thyc-1D
was a small (2,000-line) code that required three gradients, while Thyc-3D
was a production system (60,000 lines) requiring only one gradient. They were
using a hybrid forward/reverse mode approach that I didn't quite catch, but
it required writing checkpoints periodically, computing reverse-mode
derivatives at each checkpoint and running in forward mode between the
checkpoints. Choosing the checkpoint frequency was a space-time tradeoff
that they were still working on. Numerical results showed some differences,
perhaps due to orders of evaluation, between forward mode, reverse mode,
and divided differences. More info is available at
http://www.ens.utulsa.edu/~diaz/cs8243/odyssee.html.
John Pryce, Cranfield University
"A Fortran 90 Automatic Differentiation Package AD01"
"I have the mandatory introduction to what automatic differentiation is..."
One unique feature of AD01 is that it is not limited to first and second
derivatives "if you have world enough and time". The advantage of F90 as an
input and target language is that it allows derived types, which AD01 uses
to represent the (function, derivative) doublets. Making appropriate
changes to the declarations, they can avoid remapping function names and
other complex but uninteresting transformations. Similarly, for simple
functions the function source does not change at all (although declarations
and initializations must be added). They support all the obvious binary and
unary functions applied to the AD doublet type through operator
overloading. Sparsity is handled automatically. For backward mode, the
operators build the operation list/computation graph/tape and the extract
operation (invoked at the end) does the computational work. Three
performance tests (Rosenbrock function, Elastic-plastic torsion, and CFD
kernel) show the effects of optimizations. They improved generated code
speed by a factor of 2 to 10 by replacing array assignments by DO loops
(avoiding array temporaries) and better sparse array data structures. Most
interesting was the CFD test; finite differences ran much faster for the
derivatives but the overall system converged slower due to the inaccurate
approximations. AD01 did not run particularly fast, particularly compared
to the author's hand-coded exact derivatives and derivatives generated by
ADIFOR.
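
To show the doublet idea in miniature (my sketch, not AD01 itself): a derived
type holds a (value, derivative) pair, and overloaded operators apply the
chain rule exactly as the computation proceeds.

  module doublets
    implicit none
    type :: doublet
      real :: f    ! function value
      real :: d    ! derivative w.r.t. the chosen independent variable
    end type doublet
    interface operator(*)
      module procedure mul_dd
    end interface
  contains
    function mul_dd(a, b) result(c)
      type(doublet), intent(in) :: a, b
      type(doublet) :: c
      c%f = a%f * b%f
      c%d = a%d * b%f + a%f * b%d   ! product rule, exact
    end function mul_dd
  end module doublets

  program ad_demo
    use doublets
    implicit none
    type(doublet) :: x, y
    x = doublet(3.0, 1.0)   ! seed: dx/dx = 1
    y = x * x               ! y = x**2, so dy/dx = 2x = 6
    print *, y%f, y%d
  end program ad_demo
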
Doug Miles, Portland Group Inc.
"HPF or OpenMP"
"The idea here is that there's supposed to be a question mark, and I'm
going to try to answer the question." PGI has, of course, been involved
with HPF from its inception and now has HPF compilers for Cray, HP, and
Intel; more recently they've been moving into the NT and Linux markets,
including OpenMP. The presentation started with a short description of each
model: HPF's data-parallel model (data distribution and pointwise parallel
constructs) and OpenMP's multithreaded model (fork/join threads which must
synchronize explicitly). Data parallelism can be portable to serial,
shared-memory, and distributed-memory systems; the key decision in writing
the program is to focus on the data and distribute it to processors, while
the compiler takes care of distributing work to match the data and
inserting communication. Multithreading can be applied incrementally to
sections of the program by executing iterative constructs in parallel. The
OpenMP forum compares the models on several axes (supports
data-parallelism, incremental parallelization, performance oriented) and
concludes that OpenMP is most convenient (not surprisingly). Doug adds some
comparisons (use on clusters, memory hierarchy optimizations, eliminates
false sharing) where HPF is more capable than OpenMP, and a few more
(parallel I/O) where MPI is the winner over either HPF or OpenMP. To do a
head-to-head comparison, PGI uses their compilers:
• OpenMP - native auto-parallelization, directives (subset of OpenMP),
• HPF - shared-memory compilation with put/get rather than message-passing
communication
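To make the contrast concrete, here is a minimal sketch of mine (not from the
talk) of the same computation in each model:

  ! HPF: describe the data layout; the compiler maps work to the data
  subroutine hpf_version(a, b, n)
    integer :: n, i
    real :: a(n), b(n)
  !HPF$ DISTRIBUTE (BLOCK) :: a, b
    forall (i = 1:n) a(i) = a(i) + b(i)
  end subroutine hpf_version

  ! OpenMP: fork threads and share out the iterations explicitly
  subroutine omp_version(a, b, n)
    integer :: n, i
    real :: a(n), b(n)
  !$omp parallel do
    do i = 1, n
       a(i) = a(i) + b(i)
    end do
  end subroutine omp_version
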
He was unable to present performance comparisons on matrix multiply because
"I'm still getting some results I don't understand, and I need to talk to
SGI before I present those in public anywhere." Instead, he compared the
FALCON reservoir simulator, which had been previously ported to HPF. On a
Compaq Pro 8000, the raw times were very close - on 4 processors HPF took
9600 sec and a threaded implementation ("not quite OpenMP") took 9900 sec.
His advice was to use OpenMP for legacy applications being ported to
small-size shared memory machines, and where dynamic load balancing would
be important. HPF was advised for running on clusters, building new
scalable applications from scratch, using an existing CM FORTRAN code, or
calling highly-optimized MPI libraries through the EXTRINSIC interface. "In
conclusion, my advice is to use HPF whenever you can." To answer the
question in the title, "HPF or OpenMP? YES."
Cos Ierotheou, University of Greenwich
"CAPTools"
Started by taking a long view of why Fortran is still used (legacy systems
and high efficiency) and whether it would survive. The "survival" bullets
were more interesting:
• Ideally, one should hide as much of the parallelization process as possible,
use a parallel compiler, and generate a high-efficiency executable.
• Practically, one can't hide all aspects of the parallelization process;
control of the execution is essential.
So, today's talk tries to see how much of the parallelization can be done
by a tool, either with or without user intervention. CAPTools is a tool to
do just this; it takes in Fortran ("as dirty as you like") and produces
readable parallel Fortran. They now produce MPI, but plan to produce HPF as
well (as sort of an intermediate step, since their method is essentially
data distribution driven). The structure of the system is pretty much as
one would expect if one were versed in source-to-source restructuring
systems: a set of analysis and transformation passes (dependence, data
distribution, computation partitioning, etc.) running on a program database
with user interaction. The dependence analysis is apparently fairly strong,
including interprocedural value numbering and perhaps other advanced
techniques. For data partitioning, the tool requests initial
distributions (HPF distributions or those from an irregular partitioner)
for a few main arrays and propagates these to others used with the initial
ones. Computation partitioning and communication generation are done
automatically, also using interprocedural techniques. Several tests were
given, including NAS benchmarks and a significant CFD code run on several
machines. Not all the speedups were great, but all were excellent given the
very small time that humans had to put into the parallelizations.
John Reid, Rutherford Laboratory
"F--"
The language has changed its name to Co-Array Fortran, but he kept the
title consistent with the proceedings. The gist of the language, originally
developed by Bob Numrich of Cray/SGI, is an explicit SPMD model, with
identical sets of local variables on all processors ("images"). Variables
declared as co-arrays are accessible through a new set of array subscripts
delimited by []; these essentially address other processors. Processes run
asynchronously unless brought into line by an explicit barrier. Some
examples of what this leads to are
  t = s[p]                 ! broadcast s from p
  x(:)[p] = s[index(:)]    ! gather

  ! redistribution
  iz = this_image(a)
  if (iz <= kz) then
    do ix = 1, kx
      a(ix,:) = b(:,iz)[ix]
    end do
  end if
The point of the last several lines is that complex redistributions (in
this case, a transpose) are relatively easy to express. The implementation
has several rules to ensure that co-arrays (those declared with []) always
have the same address within each image; knowing this, addressing is easy
and allocation is not a difficulty (given that there is a barrier
synchronization at allocation time). Explicit synchronization happens any
time that one image relies on another; this implies that it is much more
efficient to do necessary local operations explicitly and
combine the results separately. Also, intrinsic functions can only apply to
local arrays; therefore, naively written code seems to end up gathering
much global data if called with a co-array section. For procedure calls,
the ordinary rules of F95 apply and all subscripts must be consistent on
all processors; this leads to minimal (no?) new rules on managing generic
interfaces. Derived types cannot have co-array components, but
they can have POINTER components; however, the POINTER can only point to
data in its own image. This is still enough to allow multi-level
parallelism, by declaring arrays of pointers to the local arrays. In
summary:
• Co-Array Fortran provides a very clear way to express data access between
processes
• It is applicable to both shared and distributed memory
• It is simple to implement, although it does require close collaboration
between the levels of the system
• Preliminary results from Cray are very encouraging.
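
As a sketch of the synchronization style (assuming the SYNC_ALL barrier
routine from the Numrich/Reid proposal; the syntax later standardized in
Fortran 2008 differs slightly):

  program exchange
    implicit none
    real :: s[*]             ! a co-array: one copy of s on every image
    real :: t
    s = real(this_image())   ! each image computes its own value
    call sync_all()          ! explicit barrier: everyone's s is now valid
    t = s[1]                 ! safe for any image to read image 1's copy
    print *, this_image(), t
  end program exchange
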
Tony Hey, University of Southampton
"New Challenges for High Performance Computing and for Fortran in the 21st
Century"
Tony established his bona fides by giving an overview of the
state of HPC at Southampton (a T3E, SP2, parallel databases, and digital
libraries); I doubt that anybody seriously questioned his abilities
beforehand, but there was no doubt afterward. But for industry,
"parallelism is not the point; cost effectiveness is what they're looking
for." Of course, there are a lot of tech transfer programs going on that
are catering to this need as well. This led to several observations:
• Challenges for HPC in industry are not the same as "Grand Challenges"
• Industry is keen to exploit existing resources (heterogeneous workstation
networks)
• Clusters have been used successfully for some compute-intensive applications
• You have to include metacomputing in high performance computing
He gave an example of Promenvir (a probabilistic mechanical design
environment) running across 6 sites in several countries (Spain and Italy,
at least) with 50 GFLOP/s aggregate power. The application was antenna
design, where the method was to evaluate alternatives Monte Carlo style on
nondedicated machines. It worked, and that success led to a new project:
probabilistic crash simulation for cars in a "simulated proving ground". It
did, however, raise some interesting issues: security, reliability of the
network, and (most importantly) site licensing for the software. A new
software economic model may well be needed in the future; this time it
wasn't a real issue, because the parent company was there to provide the
software. Other examples included effective deployment of parallel codes,
data exchange, and computational steering in the TOOLSHED project. The key
to this kind of project was embedding the (parallel) code into a design
process (with humans). The HiPSID project for simulation and interactive
design used fairly off-the-shelf solvers along with interactive feedback to
designers. The application there was turbine design, where the bottleneck
comes from the fact that there are many more designers (CAD users) than
analysts ("the wizards"). Automating the analysis is the way to break the
logjam. A new area for supercomputing is data-intensive applications for
metacomputers. Persistent object management is needed here, and it's a
challenge. "Now here's someplace where HPC can make some money." Data
mining of corporate "islands of data" is a big problem for large
organizations. The example was Unilever, who did a patent search for one of
their new inventions, only to discover prior art - a patent held by
Unilever, which had been forgotten/lost in the organization. Of course,
high performance platforms (SMPs and MPPs and DSMs) are being used more and
more for this knowledge discovery, and optimization of these applications
are "every bit as interesting and challenging as optimizing scientific
codes." One example was the Italian Financial Flows Archive, where they
mined the Bank of Italy's records to discover money laundering. Various
visualizations of the data showed interesting trends, such as the
geographical plotting of flows out of various regions ("for example, let's
consider this little island off the south of Italy..."). Another example
was the MEMOIR project, described as akin to Vannevar Bush's "memory
expander" (from his article "As We Now Think") that follows links from one
authority to another. Three challenges for Fortran are therefore:
• Shrinking Market Share, Multiple Versions: Although total numbers may
stay constant or even grow, the overall market will likely be growing
faster. Having multiple versions of the language is a real cost for ISVs,
who cannot carry too many versions.
• Higher-Level Programming Paradigms: For example, MATLAB and Mathematica
are being taught to engineers instead of Fortran. Many other high-level,
domain-specific languages are also competing for Fortran's market share.
• Distributed Object Computing and Networks: "Like it or not, that's where
we are." Distributed computing is the norm, and despite efforts to add
objects to Fortran (e.g. F2000) it will not be the language of choice. "I
assume that in time - court actions and so on - the current defects in Java
will be fixed."
Finally, a few remarks on computing visions of the future:
• Web browser as the operating system
• Higher-level programming paradigms (but are they de-skilling the process
of coding?)
• Re-engineering of old codes (legacy code is becoming increasingly
difficult to maintain)
• Finally, beyond 2010, when chips start running into the CMOS endpoint,
what will you do?
• He's looking at quantum computing
Charles Koelbel
Center for Research on Parallel Computation (CRPC, MS 132)
Rice University, 6100 Main Street, Houston, TX 77005
chk@cs.rice.edu | phone: 713-285-5304 | fax: 713-285-5136