Fortran Futures '98
Subject: Fortran Futures '98
From: Ian D. Chivers (I.CHIVERS@kcl.ac.uk)
Date: Thu 21 May 1998 - 15:19:14 BST
As some of you know, Chuck Koelbel was an invited speaker. He wrote a summary and agreed
that I could forward it to the list. I hope you find it interesting.
On Sun, 17 May 1998 21:36:01 +0100 Chuck Koelbel <chk@cs.rice.edu>
wrote:
Hello, all -
I wanted to thank you again for inviting me to give the keynote at Fortran
Futures '98. It was a great conference, and I enjoyed the trip immensely.
And, as you can tell from the trip report below, I thought the talks were
well worthwhile.
A couple loose ends to tie up:
...
a bit deleted
...
Good luck with Fortran Futures 2000!
Chuck
Fortran Futures '98 - May 14-15, 1998
Notes taken by Chuck Koelbel
Executive Summary
This is a biennial meeting organized by NAG, Ltd. to promote Fortran,
particularly new developments relevant to the language. The program
included many people who have been involved in major developments in the
language, both informal extensions (myself) and formal standards (all of
the other keynoters). In a marked departure from your average keynote, some
of the speakers proposed that Fortran (the subject of the conference) was
losing popularity and importance, that international standards did more
harm than good, and that Fortran needed significant extensions for new
computing environments. Some of the technical contributions were even less
wary of provoking controversy. Overall, though, the mood of the conference
was upbeat about the future of some form of Fortran, if only due to the
wealth and efficiency of libraries written in that language. Some
highlights of the program included:
• Good overviews of current high performance computing applications from
Koelbel (the opening keynote) and Hey (the closing keynote). I don't say
this just out of vanity; I got a lot of genuine compliments about the
American programs, and Tony's talk was a really nice complementary survey of
European projects.
• Informal extensions to Fortran, including Co-Array Fortran, HPF, OpenMP,
and Fortran bindings to MPI. All had excellent talks describing or using
them. The suggestion was made many times that this was the way that the
language should evolve, rather than being bound up in slow-moving standards
committees. On the other hand, the observation was made many other times
that large users were loath to move to a non-standard system for fear of
non-portability.
• Fortran 2000. Several talks previewed features for the language, along
with reports on current standardization progress. The good news is, there
will be significant new features in the next Fortran standard ("which I
hope to see before I die", as one speaker put it); these include
object-oriented features, floating-point exception handling, asynchronous
I/O, and C interoperability. The bad news is that it will be some time
before practical compilers are available ("Fortran 2002" as one speaker
referred to it, and he was just pointing to the finalization of the spec).
It is clear that Fortran will have a place in scientific computing for the
foreseeable future, and not only for legacy code. Libraries are being
expanded, and Fortran 90 interfaces are being added to existing ones. While
Fortran may not be the hottest language for new students, there is little
reason to think that it will fade into the background.
Detailed Notes
Charles Koelbel, Rice University
Opening Keynote - "The Language of the Year 2000: Will it Really be Fortran?"
"There will be very little hard technical information in this talk."
Instead, the idea was to look at the forces driving any language for the
foreseeable future. First, an overview of the state of Fortran in 1998
showed that it is alive and well as a language. Most people think of
FORTRAN 77, which has been enormously successful due to its culture of
highly optimizing compilers and other tools; unfortunately, its public
relations are poor because it is seen as old-fashioned. Fortran 90 and 95
are great improvements and are becoming more accepted due to better user
convenience; Fortran 2000 promises to include object-oriented features as
part of a modern language. Next there was an overview of current hardware,
including workstations (now faster than 1970's supercomputers), parallel
hardware (both tightly-integrated systems and "piles of PCs"), and
distributed computational grids (map courtesy of the National Computational
Science Alliance). The point is that any language will have to take into
account highly parallel systems with very deep memory hierarchies. More
interesting, I hope, was the description of current programs that will
drive language evolution in the near future. Three programs were
highlighted: the Advanced Strategic Computation Initiative program in the
US Department of Energy, the US Department of Defense HPC Modernization
Program, and the US National Science Foundation Partnerships for Advanced
Computational Infrastructure program. "I'll get to the software programs in
a second." The point is that these programs are driving new application
development, and thus indirectly the evolution of computational science.
Two example applications from each were presented, including "traditional"
hard-science applications (e.g. an adaptive mesh code for cosmology from
NCSA) and leading-edge meta-applications (e.g. a simulation of a
100,000-vehicle battle requiring 13 supercomputers at DOD). The bottom line
was that "these are not your father's programs"; they require new features
including handling of large I/O, interoperability between programs and
machines, and optimizations (automatic or manual) for memory hierarchies.
In summary, "reports of the death of Fortran have been greatly
exaggerated", but Fortran will need to continue its evolution to adapt to
new requirements and needs. In particular, Fortran needs to retain
efficiency for scientific codes, provide interoperability with other
languages, and allow linking to new environments.
Sven Hammarling, NAG
"Fortran Library Activities"
"NAG started as a collaborative effort, and that tradition continues to
this day. Much of this talk is intended to highlight that collaboration."
Started with a short history of (numerical) libraries; highlight was the
note that 1998 was the 25th anniversary of the BLAS. The most important
aspect of the libraries was the exploitation of memory hierarchies. He
particularly emphasized linear algebra efforts such as the BLAS Technical Forum
(bringing those Basic Linear Algebra Subprograms into modern software
practice). One of the current directions is a Fortran 95 interface to BLAS
that eliminates many of the sources of overhead in calling BLAS from
Fortran 90 interfaces. LAPACK is another freely-available library that NAG
has been involved in since its inception; their interest is to make it
available so they can use it as a basis for their own work. A Fortran 90
interface is in process there as well, mainly to the driver routines
(instead of all the low-level ones). ScaLAPACK ports LAPACK to distributed
memory environments using message passing. Many new developments are
underway here as well, including out-of-core solvers and an HPF interface.
NAG Fortran Libraries are under active development. The FORTRAN 77 library
is now at Mark 18, going to Mark 19 very soon, including many new routines
for sparse systems (linear programming, eigensolvers, and iterative
solvers). "We've had many requests for solvers for the Black-Scholes
equation." The SMP library builds on the FORTRAN 77 one, even beating
vendor libraries (e.g. SGI) in some cases. The Fortran 90 and Parallel
(MPI) libraries are at lower release numbers, but continue to build on the
lower-level routines.
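
As an illustration of why a Fortran 95 interface cuts calling overhead and
clutter, here is a minimal sketch of my own (not NAG's or the Forum's actual
interface): assumed-shape arrays carry their own sizes and strides, so the
caller no longer passes dimensions and leading-dimension arguments explicitly.

  module blas95_sketch
    implicit none
    interface gemv                      ! one generic name instead of SGEMV/DGEMV
      module procedure sgemv95
    end interface
  contains
    subroutine sgemv95(a, x, y)
      real, intent(in)  :: a(:,:), x(:)
      real, intent(out) :: y(:)
      ! sizes and strides come from the array descriptors, so a real
      ! wrapper could call the F77 SGEMV without the caller passing
      ! M, N, LDA, INCX, INCY
      y = matmul(a, x)                  ! stand-in for the optimized BLAS call
    end subroutine sgemv95
  end module blas95_sketch
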
Miles Ellis, Oxford University
Keynote - "Is There a Role for Standards in the Future of Fortran?"
"I want to pose the question of standards, because it has been an integral
part of Fortran for its entire history." A quick history of Fortran's
history followed, including its inception on the IBM 704, the first
portable Fortran version (FORTRAN IV), the first Fortran standard (FORTRAN
66), and the first international standard (FORTRAN 77, adopted by ISO in
1980). "At that point, things began to stop." However, Fortran 90 was
adopted after a long trip through a tunnel (metaphorically and really, as a
home video showed). 1994 saw additional features; 1997 gave us Fortran 95;
1998 will add conditional compilation. At this point, the question is "Does
anybody care?" (particularly in regard to the technical reports). Work on
Fortran 2000 started with requirements definitions in 1995, with an
expected final specification in 2001 (before the final approval in 2003 or
so); compilers will appear several years after the spec. Moreover, a good
idea that comes out at this meeting will make it into a standard-conforming
compiler in around 2011. Several other examples of language standards
(BASIC, Z, and Java) show the disadvantage of imposing an international
process on language design with its braking effect. He asked ("as convener
of WG5, I'm allowed to ask myself") if we should break away from
international standards efforts to a more market-oriented process. "I am
afraid that that very process may be killing the language." Rapid
developments in the technical community seem to indicate that a faster
response time is needed. Unfortunately, he doesn't have a clearly better
alternative. Compiler vendors are less willing to add features today,
fearing incompatibility with upcoming standards.
This talk actually provoked some interaction (as it was intended to).
Points from the audience included the need for portability, which is
essentially only possible through standards of some sort. Another problem
is that the process rules allow one person to hold up the entire
process. For example, the German representative to Fortran 90 held up the
standard until it included varying-length strings, which have not been
implemented by any vendor (including German companies). On the flip side,
it's very difficult for a single vendor to introduce a genuinely useful
feature due to the drag in the process. The real concern is the time to get
the standardization done, particularly for new features. "It's really
absurd that we now have part 3 of the standard [conditional compilation]
whose major purpose is to allow you to have a standard way to include
nonstandard features." HPF and VMEbus were mentioned as examples of
standards that were decided in short periods of time by consortia.
John Reid, Rutherford Appleton Laboratory
"Fortran 95 and IEEE Technical Report"
"I'm actually going to give two talks, one about Fortran 95 and one about
exception handling."
Fortran 95 is a minor revision of Fortran 90, with a major Fortran revision
planned for 2000. Compilers for Fortran 95 are beginning to appear; for
example, NAG has one in beta test now. The new features are FORALL (from
HPF), PURE and elemental procedures, the ELSEWHERE construct (similar to
ELSEIF), NULL initialization for pointers, automatic object initialization,
extended specification expressions (including the use of user-defined PURE
functions), CPU_TIME ("very carefully contrived wording so that the vendor
can do whatever they want"), IEEE compliance (signed zero, which mostly
comes up in underflowing computations). Some FORTRAN 77 features were
declared obsolescent, including computed GOTO, statement functions, DATA
statements among executables, assumed-length character functions, CHARACTER
*, and fixed source form. (Note that last one!) Some features were deleted
entirely, including ASSIGN, noninteger DO indices, and the H edit
descriptor. The new features are fairly intuitive, and familiar from
various previous practice and dialects.
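
To make a few of those features concrete, here is a small sketch of mine (not
from the talk) combining NULL initialization of a pointer, an elemental
function, and FORALL:

  module f95_demo
    implicit none
    real, pointer :: work(:) => null()  ! F95: default initialization of a pointer
  contains
    elemental real function square(x)   ! elemental: applies pointwise to arrays
      real, intent(in) :: x
      square = x * x
    end function square
  end module f95_demo

  program demo
    use f95_demo
    implicit none
    real :: a(10), b(10)
    integer :: i
    a = (/ (real(i), i = 1, 10) /)
    forall (i = 1:10) b(i) = square(a(i))  ! FORALL, adopted from HPF
    print *, b
  end program demo
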
Exception handling is a "type 2 technical report", which is a type of
standards document with less bureaucracy. Although officially an extension
to the existing standard (in this case F95), the expectation is that it
will be adopted in the next standard (F2000) unless major discrepancies are
discovered. Exceptions have a long and complex history in Fortran, reaching
back to proposed additions to Fortran 8X; the current (finally approved)
form uses procedures to process floating-point exceptions. The "obvious"
modern alternative, adding a general exception handling mechanism, was
considered and abandoned several times due to lack of consensus on the
nitty-gritty details of the construct (e.g. the effect if an exception were
raised in a called subroutine) and implementation details (e.g. tests
required on some hardware). Due to the strong requirements (direct support
for IEEE when the hardware implements it, while recognizing partial support
on other hardware), they had to use an intrinsic module and integrate IEEE
support into the compiler. Three levels of support are provided:
IEEE_EXCEPTIONS (only overflow and divide-by-zero), IEEE_FEATURES (all IEEE
features, which may be imported separately), and IEEE_CONTROL (manage the
IEEE features). A lot of support is provided for inquiring whether the
hardware/compiler supports various IEEE features. More details are at
https://wg5-fortran.org/N1251-N1300/N1281.pdf (in several formats).
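
As a sketch of how the TR's flag-based style looks in code (using names
roughly as they were later standardized; the details in the 1998 TR may
differ):

  program flag_demo
    use, intrinsic :: ieee_exceptions
    implicit none
    real :: x, y
    logical :: flagged
    call ieee_set_halting_mode(ieee_overflow, .false.)  ! don't abort on overflow
    call ieee_set_flag(ieee_overflow, .false.)          ! clear the flag first
    x = huge(1.0)
    y = x * 2.0                                         ! raises IEEE overflow
    call ieee_get_flag(ieee_overflow, flagged)          ! test afterwards
    if (flagged) print *, 'overflow occurred; y =', y
  end program flag_demo
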
Malcolm Cohen, NAG
"Allocatable Components Technical Report and Fortran 2000"
This is also a technical report from ISO, scheduled for inclusion in the
next suitable standard. The problem being solved is technical limitations
on derived data types in Fortran 90/95; in particular, if you want to have
a variable-sized array field you must use a POINTER, which raises
performance losses (due to potential aliasing and non-unit strides) and
safety problems (memory leaks and dangling pointers). The basic solution is
to allow derived type components (i.e. fields of structures) to be
ALLOCATABLE. There are some details to getting this to work, but the result
is fairly user-friendly. A similar fix allows dummy array arguments to be
ALLOCATABLE, which in turn allows a procedure to allocate an array for use
in its caller or elsewhere. Applying this to function results allows an
array to be returned and automatically deallocated after use.
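
A minimal sketch of the feature (mine, not Cohen's example): the ALLOCATABLE
component gives a variable-sized field without POINTER's aliasing penalty or
leak risk.

  program alloc_comp
    implicit none
    type :: vector
      real, allocatable :: data(:)  ! TR feature: ALLOCATABLE component
    end type vector
    type(vector) :: v
    allocate(v%data(100))           ! sized at run time
    v%data = 0.0
    ! no dangling pointers or memory leaks: v%data cannot alias anything
    ! and is deallocated automatically when v goes away
  end program alloc_comp
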
The second talk (like Reid, he had two talks under one title) was about
Fortran 2000. Actually, he called it "Fortran 2002" in honor of its
expected completion date. The intent is to make F2000 a major upgrade to
the language (as opposed to the intentionally minimal extensions in F95).
Major features going in include the two technical reports just discussed,
improvements to I/O, interval arithmetic, data structuring improvements,
and interoperability. There are 11 minor features "finished" (in draft
form), 4 under active development, and 4 more "waiting in the wings". The
schedule now is to decide final requirements by 2/97 ("that date will not
slip any further"), completing separate feature edits by 11/98, an
integrated document by 1/00 (including resolving interactions between
features), and 4 ballots ending in 11/02. The I/O extensions are for
user-defined derived types (providing better control) and asynchronous I/O
("there are a whole lot of interactions with other parts of the standard").
Interoperability is a major feature, including internationalization (still
at the requirements stage) and C interoperability (now at the specification
stage). Interval arithmetic requires a lot of machinery such as control
over optimization, control of rounding, and better opaque types; many users
have indicated that they need these to do a full interval package. Data
structuring includes parameterized types, pointers to procedures,
initializers and finalizers, type extension (inheritance), and type-bound
procedures (dynamic dispatch and polymorphism). The last two provide the
object-oriented features of F2000, which can basically be summed up as
single inheritance, single dispatch, with run- and compile-time efficiency,
and without sacrificing necessary functionality. Type-bound procedures
extend the type, in much the same way that virtual functions in C++ do. An
important aspect of the design is that all types are statically determined.
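
As a sketch of what single inheritance with type-bound procedures looks like,
using the syntax that eventually appeared in Fortran 2003 (the drafts under
discussion in 1998 may have differed in detail):

  module shapes
    implicit none
    type :: shape
    contains
      procedure :: area                 ! type-bound procedure (dynamic dispatch)
    end type shape
    type, extends(shape) :: circle      ! type extension (single inheritance)
      real :: r = 1.0
    contains
      procedure :: area => circle_area  ! override, like a C++ virtual function
    end type circle
  contains
    real function area(this)
      class(shape), intent(in) :: this
      area = 0.0
    end function area
    real function circle_area(this)
      class(circle), intent(in) :: this
      circle_area = 3.14159 * this%r**2
    end function circle_area
  end module shapes

A variable declared CLASS(shape) can then hold a circle at run time, and a
reference to its area binding dispatches to circle_area.
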
"Are you going to be the first person to implement a Fortran 2000
compiler?" "I couldn't possibly comment." (Malcolm produced reference
implementations of the F90 and F95 languages during those standards
processes.) More information is available at the J3 website - URL wasn't
available, but one can start searching from the WG5 site mentioned above.
Bob Kuhn, Kuck and Associates
"OpenMP Workshop"
I was late to the workshop, having gotten carried away by a hallway
conversation. But it turned out that all I missed was the description of
features and some of the advertising.
I did get there for the description of coming attractions in OpenMP.
Various vendors are working on products, and the OpenMP Architecture Review
Board is creating a validation suite and working on new features. OpenMP is
working on extending to C and C++; a draft document is almost ready for
release. There is a web site at http://www.openmp.org/ with more
information.
Steve talked about the use of OpenMP in NAG. They like OpenMP because it
provides portability and they're a third-party software vendor. Orphaned
directives are important because they allow modularity. (The idea is that
parallelism directives may occur at different levels of the program, and it
works the way you want.) Also, they like the ability to optimize for
single-processor performance easily. (This is hard with other parallel
languages/systems that do a lot of transformations behind the scenes.)
More descriptions of OpenMP applications came from Bob. First he described
some exercises in parallelizing codes (originally from SpecFP) that were
not amenable to automatic parallelization. One optimization was to avoid
barriers and cache pollution by inserting a parallel region outside of a
parallel DO. The REDUCTION directive handled a continued summation. Dynamic
scheduling handled a loop with conditionals (a special case in the first
iteration) efficiently. Conditional compilation handled a loop with only a
few iterations that would otherwise have been inefficient to parallelize.
THREADPRIVATE was needed for common blocks used as temporaries ("passed"
between several subroutines in the same thread, but not used to accumulate
values between threads). One note was that they recommend setting the
default shared/private behavior explicitly in every program, since this is a
critical, often tricky, issue.
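
A minimal sketch of mine (not Kuhn's code) of two of those techniques: a
single parallel region wrapped around several work-shared loops to avoid
repeated fork/join, and a REDUCTION clause for the running sum.

  subroutine scale_and_sum(a, b, n, total)
    implicit none
    integer :: n, i
    real :: a(n), b(n), total
    total = 0.0
  !$omp parallel
  !$omp do
    do i = 1, n
       b(i) = 2.0 * a(i)
    end do
  !$omp do reduction(+:total)
    do i = 1, n                 ! each thread sums privately, then combines
       total = total + b(i)
    end do
  !$omp end parallel            ! one fork/join covers both loops
  end subroutine scale_and_sum
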
Their view of the steps in parallelizing an (existing) application is
• Analyze
• Restructure
• Test
• Improve
• Quality Assurance
There are tools for each step of the process:
• KAP for analyzing and restructuring. People in some fields (he mentioned
CFD) are still using automatic parallelization, so KAP now produces OpenMP
directives. In effect, they're either using the automatic system as a first
pass or using it to improve/clean up initial parallel directives they put
in.
• Guide (from KAI) helps in a loop between restructuring, testing, and
improving. It collects runtime data that can be fed back into KAP or used
by a special runtime system for tuning.
• Digital's OpenMP compiler has a number of environment variables (e.g.
OMP_NUM_THREADS, MP_SPIN_COUNT) to tune the application. It and other OpenMP
implementations also offer optional runtime correctness checks, including
checks for interference between threads.
• Assure (from KAI) and DBX are parallel debuggers that have OpenMP
interfaces. Ladebug (from Digital) is OpenMP-aware also, in the sense that
it can stop at subroutine calls associated with (inserted by) OpenMP
directives. "As you use the symbolic debuggers you pretty quickly figure
out which are the work threads and which are the monitor threads." Assure
catches inconsistencies between threads, checks thread stack sizes for
overflow, and detects race conditions. Assure has been used on real
programs (OVERFLOW, LS-DYNA) and successfully discovered bugs that the
original developers were unaware of.
About this time a long digression popped up on the possibility of
private stack overflows. The bottom line is that an appropriately
aggressive user can break OpenMP.
• GuideView (from KAI) instruments the program with performance monitors.
The case he showed illustrated serious overhead from
over-parallelization (8 threads, 4 processors), visible as increased
waiting and sequential time due to spin locks.
Alistair Mills, SGI
"Fortran Fun"
"I don't know what I'm doing here. Once in a moment of weakness I agreed to
talk about some of the things you can do with Fortran." This was not a
serious talk...
- Principia Fortranica (à la Newton):
- First Law: An object at rest will remain at rest unless acted on by an outside
force, or an object in motion will remain in motion unless acted on by an outside force
- Fortran does not change unless a committee acts on it, or Fortran will
continue forever if there is no intervention
- Second Law: Force = Mass * Acceleration
- Fortran = Mathematical * Accuracy
- Third Law: Every action has an equal and opposite reaction
- Every program for computer type X has an equal program on computer type Y
- The Fortran Sonnet Form:
- 14 lines, 3 groups of 4 lines, and one of 2, first group declares the data,
second and third analyze the data, last finalizes the data, with a rhyming scheme
- I've got to get this slide...
- And a Fortran crossword puzzle, with a prize for the first solution
Panel Discussion
"Advantages and Disadvantages of Fortran 2000"
What communities are driving Fortran standardization now?
Miles Ellis:
"There's a large element of sheer inertia." More seriously, mostly guided
by high performance people. An example is interval arithmetic; numerical
people pushed it and Sun is working hard on it, but it's less obvious that
programmers will really make use of it. John Reid brought up the question
(again) of how useful the international standards process is. It's possible
that interval arithmetic will really be as important as its proponents say
it is, but it's less clear that ISO certification of this procedure is
required for the feature to succeed.
Is Fortran losing its way in adding more complex features, particularly
losing efficiency in adding object-oriented features? Some scientific users
need the OO features now. Metcalf noted that CERN supported his involvement
in Fortran 8x (and 90 and...) based on the need for object-oriented
features. The high-energy community has made an irrevocable move to C++
because it took too long to get objects. In fact, users directly suggested
many advanced features; the hard part is to know how serious some of the
interest is.
Might there be room for officially-supported subsets of Fortran (e.g. ELF
and F)? "Three of us on this table would like to think so, because we'll
sell more books." In some sense, though, it doesn't matter if the subset is
stamped official or not.
Thinking back to the days of F77, when some vendors were downright hostile
to the idea of a new standard... Is there a different attitude today? "Yes"
(not quite simultaneous, but no disagreements). Compilers are appearing at
reasonable rates, and vendors are working in good faith with the standards
committees.
Interestingly, most of the non-vendors on the standards committees are now
from the academic world, rather than "real" (i.e. very big) users like
CERN. Companies and labs can't (or won't) see the financial advantage
there. Portability is not enforced by managers, although it clearly should
be given that hardware is likely to change before the software is finished.
Why, if the users have so much input to the committees, did it take Fortran
90 so long to be accepted? Inertia and portability (people aren't willing
to take the first move to the new standard). Publicity is also a problem;
computing services aren't teaching Fortran (or even supplying it).
Physicists don't know that the new features are now out there. "So the
marketing is very bad?" "And it's being marketed to the services, not the
users."
Is backward compatibility really that important? When they tried to
deprecate COMMON/EQUIVALENCE, they were met with howls of protest. Having
F77 as a subset is a good thing, but what's really needed are some tools to
clean up code and remove obsolescent features. (VAST does this, but isn't
stable enough.)
For programs that have been (could be) written in Fortran 90 and C++, is
there a productivity difference? Don't know of any systematic studies.
What to do about lack of knowledge of F90? "Any organization that is to
survive in the long term must spend some resources in keeping their
employees up to date." For the questioner's organization, software is not a
main-line issue - they're working on physics, and learn new languages
only when they need to. MATLAB is used more than Fortran, because the
engineers are more familiar with it. Other engineering schools now teach
spreadsheets rather than Fortran. Existing users move only gradually to
F90, and modern structured features (e.g. MODULE) tend to be the last they
consider.
Richard Field, Edinburgh University
Keynote - "Fortran in Education"
"I will talk some about education, but not so much as Tony Blair." "Our
university is 413 years old, which I think means that it is even older than
Fortran." Fortran's history is "long and honorable"; the question is how it
will go on. Current searches on the world-wide web show 287,580 documents
on "Fortran", 21,253 on "Fortran 95", and 48,660 on "HPF" ("I don't know quite
what that means"), compared to 6,000 on ALGOL, 666,000 on Pascal (possibly
including discussions on Blaise Pascal), and 4.6 million on UNIX. So there
is still some interest in the language, albeit not so much as some other
subjects. The state of Fortran teaching in Edinburgh is that it is taught
in engineering, but this could change. England emphasizes C and C++; this
is seen with alarm in Scotland, where C is considered unsafe. The US "has
no programming in its undergraduate curriculum". Parallel Fortran (mainly
HPF) is well-represented in education, as several web pages from EPCC
showed. Other educational efforts come from other sources, including NAG's
installations at educational institutions. Conclusions:
• Fortran lives, but for how long?
• HPF "injection" into Fortran world is good, but watch for paradigm shifts.
• Some shifts are good, e.g. the steam engine to the internal combustion engine.
• Constant re-invention: a strength or a weakness?
Several questions from the audience highlighted some (more) hopeful signs
in teaching Fortran, including the many new Fortran 90 texts that are
selling well.
Anthony Colyandro, Visual Numerics
"Hybrid Distributed and Shared Memory Parallelization Tools from Visual
Numerics"
What he was actually talking about was their implementation of Fortran
exception handling in their high-performance distributed network Fortran
library, or "What Visual Numerics Is Doing to Make High Performance
Computing More Accessible". Their metrics for designing a parallel library
are to provide customizable control of master/slave nodes, automatic
scalability to computational resources (by runtime monitoring and dynamic
scheduling), capability for homogeneous and heterogeneous computation (via
MPI), and a comprehensive library. They are particularly proud of the work
they've done on the "art, science, and mystery" of error handling in their
routines, done via a subset implementation of the floating point exception
handling TR. (It has to be a subset, since most compilers don't yet handle
intrinsic modules.) This is done in IMSL-DNFL (IMSL Distributed Network
Fortran Library), an enhanced mathematical/statistical library written in
F90 and MPI. Basically, it's the same material described in the IEEE talk
the day before, with the addition of a few enumerated types. A few examples
of templates for using these exceptions were also given, with the expected
advantages in code clarity. Note that Fortran exceptions are not precise;
that is, some machines may not set the flags immediately due to pipelining
or parallelism. However, one can still test the exception flags later to
get a decent idea of what happened (at least better than the old method of
dying with a short message). More detailed information on what they've done
was available at http://www.vni.com/books/index.html.
Christele Faure, INRIA
"The Odyssee Automatic Differentiation Tool"
Started with a general introduction to automatic differentiation, used to
compute derivatives of functions computed from source code. For those who
haven't seen automatic differentiation before, the gist is to apply the
chain rule repeatedly to (exact) derivatives of the components of the
computation. https://www.mcs.anl.gov/research/projects/autodiff/tech_reports.html gives a list
of available tools for doing this. Her tool (Odyssee) is a source-to-source
translator for Fortran 77, computing tangents and cotangents in forward and
reverse mode respectively. An example of applying Odyssee to two codes
(Thyc-1D and Thyc-3D, both thermohydrodynamics codes) was given. Thyc-1D
was a small (2,000-line) code that required three gradients, while Thyc-3D
was a production system (60,000 lines) requiring only one gradient. They were
using a hybrid forward/reverse mode approach that I didn't quite catch, but
it required writing checkpoints periodically, computing reverse-mode
derivatives at each checkpoint and running in forward mode between the
checkpoints. Choosing the checkpoint frequency was a space-time tradeoff
that they were still working on. Numerical results showed some differences,
perhaps due to orders of evaluation, between forward mode, reverse mode,
and divided differences. More info is available at
http://www.ens.utulsa.edu/~diaz/cs8243/odyssee.html.
John Pryce, Cranfield University
"A Fortran 90 Automatic Differentiation Package AD01"
"I have the mandatory introduction to what automatic differentiation is..."
One unique feature of AD01 is that it is not limited to first and second
derivatives "if you have world enough and time". The advantage of F90 as an
input and target language is that it allows derived types, which AD01 uses
to represent the (function, derivative) doublets. Making appropriate
changes to the declarations, they can avoid remapping function names and
other complex but uninteresting transformations. Similarly, for simple
functions the function source does not change at all (although declarations
and initializations must be added). They support all the obvious binary and
unary functions applied to the AD doublet type through operator
overloading. Sparsity is handled automatically. For backward mode, the
operators build the operation list/computation graph/tape and the extract
operation (invoked at the end) does the computational work. Three
performance tests (Rosenbrock function, Elastic-plastic torsion, and CFD
kernel) show the effects of optimizations. They improved generated code
speed by a factor of 2 to 10 by replacing array assignments by DO loops
(avoiding array temporaries) and better sparse array data structures. Most
interesting was the CFD test; finite differences ran much faster for the
derivatives but the overall system converged slower due to the inaccurate
approximations. AD01 did not run particularly fast, particularly compared
to the author's hand-coded exact derivatives and derivatives generated by
ADIFOR.
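
To show the doublet idea in miniature (my sketch, not AD01 itself): a derived
type holds a (value, derivative) pair, and overloaded operators apply the
chain rule exactly as the computation proceeds.

  module doublets
    implicit none
    type :: doublet
      real :: f    ! function value
      real :: d    ! derivative w.r.t. the chosen independent variable
    end type doublet
    interface operator(*)
      module procedure mul_dd
    end interface
  contains
    function mul_dd(a, b) result(c)
      type(doublet), intent(in) :: a, b
      type(doublet) :: c
      c%f = a%f * b%f
      c%d = a%d * b%f + a%f * b%d   ! product rule, exact
    end function mul_dd
  end module doublets

  program ad_demo
    use doublets
    implicit none
    type(doublet) :: x, y
    x = doublet(3.0, 1.0)   ! seed: dx/dx = 1
    y = x * x               ! y = x**2, so dy/dx = 2x = 6
    print *, y%f, y%d
  end program ad_demo
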
Doug Miles, Portland Group Inc.
"HPF or OpenMP"
"The idea here is that there's supposed to be a question mark, and I'm
going to try to answer the question." PGI has, of course, been involved
with HPF from its inception and now has HPF compilers for Cray, HP, and
Intel; more recently they've been moving into the NT and Linux markets,
including OpenMP. The presentation started with a short description of each
model: HPF's data-parallel model (data distribution and pointwise parallel
constructs) and OpenMP's multithreaded model (fork/join threads which must
synchronize explicitly). Data parallelism can be portable to serial,
shared-memory, and distributed-memory systems; the key decision in writing
the program is to focus on the data and distribute it to processors, while
the compiler takes care of distributing work to match the data and
inserting communication. Multithreading can be applied incrementally to
sections of the program by executing iterative constructs in parallel. The
OpenMP forum compares the models on several axes (supports
data-parallelism, incremental parallelization, performance oriented) and
concludes that OpenMP is most convenient (not surprisingly). Doug adds some
comparisons (use on clusters, memory hierarchy optimizations, eliminates
false sharing) where HPF is more capable than OpenMP, and a few more
(parallel I/O) where MPI is the winner over either HPF or OpenMP. To do a
head-to-head comparison, PGI uses their compilers:
• OpenMP - native auto-parallelization, directives (subset of OpenMP),
• HPF - shared-memory compilation with put/get rather than message-passing
communication
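To make the contrast concrete, here is a minimal sketch of mine (not from the
talk) of the same computation in each model:

  ! HPF: describe the data layout; the compiler maps work to the data
  subroutine hpf_version(a, b, n)
    integer :: n, i
    real :: a(n), b(n)
  !HPF$ DISTRIBUTE (BLOCK) :: a, b
    forall (i = 1:n) a(i) = a(i) + b(i)
  end subroutine hpf_version

  ! OpenMP: fork threads and share out the iterations explicitly
  subroutine omp_version(a, b, n)
    integer :: n, i
    real :: a(n), b(n)
  !$omp parallel do
    do i = 1, n
       a(i) = a(i) + b(i)
    end do
  end subroutine omp_version
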
He was unable to present performance comparisons on matrix multiply because
"I'm still getting some results I don't understand, and I need to talk to
SGI before I present those in public anywhere." Instead, he compared the
FALCON reservoir simulator, which had been previously ported to HPF. On a
Compaq Pro 8000, the raw times were very close - on 4 processors HPF took
9600 sec and a threaded implementation ("not quite OpenMP") took 9900 sec.
His advice was to use OpenMP for legacy applications being ported to
small-size shared memory machines, and where dynamic load balancing would
be important. HPF was advised for running on clusters, building new
scalable applications from scratch, using an existing CM FORTRAN code, or
calling highly-optimized MPI libraries through the EXTRINSIC interface. "In
conclusion, my advice is to use HPF whenever you can." To answer the
question in the title, "HPF or OpenMP? YES."
Cos Ierotheou, University of Greenwich
"CAPTools"
Started by taking a long view of why Fortran is still used (legacy systems
and high efficiency) and whether it would survive. The "survival" bullets
were more interesting:
• Ideally, one should hide as much of the parallelization process as possible,
use a parallel compiler, and generate a high-efficiency executable.
• Practically, one can't hide all aspects of the parallelization process;
control of the execution is essential.
So, today's talk tries to see how much of the parallelization can be done
by a tool, either with or without user intervention. CAPTools is a tool to
do just this; it takes in Fortran ("as dirty as you like") and produces
readable parallel Fortran. They now produce MPI, but plan to produce HPF as
well (as sort of an intermediate step, since their method is essentially
data distribution driven). The structure of the system is pretty much as
one would expect if one were versed in source-to-source restructuring
systems: a set of analysis and transformation passes (dependence, data
distribution, computation partitioning, etc.) running on a program database
with user interaction. The dependence analysis is apparently fairly strong,
including interprocedural value numbering and perhaps other advanced
techniques. For data partitioning, the tool requests initial
distributions (HPF distributions or those from an irregular partitioner)
for a few main arrays and propagates these to others used with the initial
ones. Computation partitioning and communication generation are done
automatically, also using interprocedural techniques. Several tests were
given, including NAS benchmarks and a significant CFD code run on several
machines. Not all the speedups were great, but all were excellent given the
very small time that humans had to put into the parallelizations.
John Reid, Rutherford Laboratory
"F--"
The language has changed its name to Co-Array Fortran, but he kept the
title consistent with the proceedings. The gist of the language, originally
developed by Bob Numrich of Cray/SGI, is an explicit SPMD model, with
identical sets of local variables on all processors ("images"). Variables
declared as co-arrays are accessible through a new set of array subscripts
delimited by []; these essentially address other processors. Processes run
asynchronously unless brought into line by an explicit barrier. Some
examples of what this leads to are
  t = s[p]                 ! broadcast s from p
  x(:)[p] = s[index(:)]    ! gather

  ! redistribution
  iz = this_image(a)
  if (iz <= kz) then
    do ix = 1, kx
      a(ix,:) = b(:,iz)[ix]
    end do
  end if
The point of the last several lines is that complex redistributions (in
this case, a transpose) are relatively easy to express. The implementation
has several rules to ensure that co-arrays (those declared with []) always
have the same address within each image; knowing this, addressing is easy
and allocation is not a difficulty (given that there is a barrier
synchronization at allocation time). Explicit synchronization happens any
time that one image relies on another; this implies that it is much more
efficient to do necessary local operations explicitly and
combine the results separately. Also, intrinsic functions can only apply to
local arrays; therefore, naively written code seems to end up gathering
much global data if called with a co-array section. For procedure calls,
the ordinary rules of F95 apply and all subscripts must be consistent on
all processors; this leads to minimal (no?) new rules on managing generic
interfaces. Derived types cannot have co-array components, but
they can have POINTER components; however, the POINTER can only point to
data in its own image. This is still enough to allow multi-level
parallelism, by declaring arrays of pointers to the local arrays. In
summary:
• Co-Array Fortran provides a very clear way to express data access between
processes
• It is applicable to both shared and distributed memory
• It is simple to implement, although it does require close collaboration
between the levels of the system
• Preliminary results from Cray are very encouraging.
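
As a sketch of the synchronization style (assuming the SYNC_ALL barrier
routine from the Numrich/Reid proposal; the syntax later standardized in
Fortran 2008 differs slightly):

  program exchange
    implicit none
    real :: s[*]             ! a co-array: one copy of s on every image
    real :: t
    s = real(this_image())   ! each image computes its own value
    call sync_all()          ! explicit barrier: everyone's s is now valid
    t = s[1]                 ! safe for any image to read image 1's copy
    print *, this_image(), t
  end program exchange
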
Tony Hey, University of Southampton
"New Challenges for High Performance Computing and for Fortran in the 21st
Century"
Tony established his bona fides by giving an overview of the
state of HPC at Southampton (a T3E, SP2, parallel databases, and digital
libraries); I doubt that anybody seriously questioned his abilities
beforehand, but there was no doubt afterward. But for industry,
"parallelism is not the point; cost effectiveness is what they're looking
for." Of course, there are a lot of tech transfer programs going on that
are catering to this need as well. This led to several observations:
• Challenges for HPC in industry are not the same as "Grand Challenges"
• Industry is keen to exploit existing resources (heterogeneous workstation
networks)
• Clusters have been used successfully for some compute-intensive applications
• You have to include metacomputing in high performance computing
He gave an example of Promenvir (a probabilistic mechanical design
environment) running across 6 sites in several countries (Spain and Italy,
at least) with 50 GFLOP/s aggregate power. The application was antenna
design, where the method was to evaluate alternatives Monte Carlo style on
nondedicated machines. It worked, and that success led to a new project:
probabilistic crash simulation for cars in a "simulated proving ground". It
did, however, raise some interesting issues: security, reliability of the
network, and (most importantly) site licensing for the software. A new
software economic model may well be needed in the future; this time it
wasn't a real issue, because the parent company was there to provide the
software. Other examples included effective deployment of parallel codes,
data exchange, and computational steering in the TOOLSHED project. The key
to this kind of project was embedding the (parallel) code into a design
process (with humans). The HiPSID project for simulation and interactive
design used fairly off-the-shelf solvers along with interactive feedback to
designers. The application there was turbine design, where the bottleneck
comes from the fact that there are many more designers (CAD users) than
analysts ("the wizards"). Automating the analysis is the way to break the
logjam. A new area for supercomputing is data-intensive applications for
metacomputers. Persistent object management is needed here, and it's a
challenge. "Now here's someplace where HPC can make some money." Data
mining of corporate "islands of data" is a big problem for large
organizations. The example was Unilever, who did a patent search for one of
their new inventions, only to discover prior art - a patent held by
Unilever, which had been forgotten/lost in the organization. Of course,
high performance platforms (SMPs and MPPs and DSMs) are being used more and
more for this knowledge discovery, and optimization of these applications
are "every bit as interesting and challenging as optimizing scientific
codes." One example was the Italian Financial Flows Archive, where they
mined the Bank of Italy's records to discover money laundering. Various
visualizations of the data showed interesting trends, such as the
geographical plotting of flows out of various regions ("for example, let's
consider this little island off the south of Italy..."). Another example
was the MEMOIR project, described as akin to Vannevar Bush's "memory
expander" (from his article "As We Now Think") that follows links from one
authority to another. Three challenges for Fortran are therefore:
• Shrinking Market Share, Multiple Versions: Although total numbers may
stay constant or even grow, the overall market will likely be growing
faster. Having multiple versions of the language is a real cost for ISVs,
who cannot carry too many versions.
• Higher-Level Programming Paradigms: For example, MATLAB and Mathematica
are being taught to engineers instead of Fortran. Many other high-level,
domain-specific languages are also competing for Fortran's market share.
• Distributed Object Computing and Networks: "Like it or not, that's where
we are." Distributed computing is the norm, and despite efforts to add
objects to Fortran (e.g. F2000) it will not be the language of choice. "I
assume that in time - court actions and so on - the current defects in Java
will be fixed."
Finally, a few remarks on computing visions of the future:
• Web browser as the operating system
• Higher-level programming paradigms (but are they de-skilling the process
of coding?)
• Re-engineering of old codes (legacy code is becoming increasingly
difficult to maintain)
• Finally, beyond 2010, when chips start running into the CMOS endpoint,
what will you do?
• He's looking at quantum computing
Charles Koelbel
Center for Research on Parallel Computation (CRPC, MS 132)
Rice University, 6100 Main Street, Houston, TX 77005
chk@cs.rice.edu | phone: 713-285-5304 | fax: 713-285-5136