Fortran Specialist Group
John Reid, JKR Associates
Most compilers hold Fortran 77 arrays (explicit-shape and assumed-size) in contiguous memory, and do the same for allocatable arrays. Some array sections, such as a(1:10:2), cannot be contiguous. An assumed-shape dummy array can accept such a section directly (there is usually a descriptor that holds the extent and stride for each dimension), but copying is needed if the section is passed to an explicit-shape or assumed-size dummy array.
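To make the distinction concrete, here is a minimal sketch that passes the same strided section to both kinds of dummy argument. The names scale_shape, scale_size and demo_procs are invented for this illustration. Whether a compiler really makes a temporary copy for the second call is an implementation matter; the sketch shows only that both calls are legal and give the same result.

```fortran
module demo_procs
  implicit none
contains
  ! Assumed-shape dummy: the descriptor carries extent and stride,
  ! so a strided section can be used in place.
  subroutine scale_shape(x)
    real, dimension(:), intent(inout) :: x
    x = 2.0*x
  end subroutine scale_shape
end module demo_procs

! External procedure with an assumed-size dummy: it expects the
! elements to be adjacent in memory.
subroutine scale_size(x, n)
  implicit none
  integer, intent(in) :: n
  real, dimension(*), intent(inout) :: x
  x(1:n) = 2.0*x(1:n)
end subroutine scale_size

program section_demo
  use demo_procs
  implicit none
  external scale_size
  real :: a(10)
  integer :: i
  a = (/ (real(i), i = 1, 10) /)
  call scale_shape(a(1:10:2))    ! stride 2 passed through the descriptor
  call scale_size(a(1:10:2), 5)  ! typically copy-in before, copy-out after
  write(*,'(10f6.1)') a
end program section_demo
```

After both calls the odd-numbered elements of a are four times their initial values and the even-numbered elements are unchanged.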
Copying may be needed for a pointer actual argument, too, since a pointer array may have a section as its target.
It may be possible to tell at compile time whether an array section or pointer is contiguous, but in most cases a run-time check is needed.
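As an illustration of what such a run-time check decides, the sketch below uses the is_contiguous intrinsic function. This intrinsic was introduced in a later standard (Fortran 2008) and was not available in the compilers discussed here, so it is shown only to illustrate the idea, not as what any of those compilers does internally. The names store and p are invented for the example.

```fortran
program contiguity_check
  implicit none
  real, target :: store(10)
  real, pointer :: p(:)
  store = 0.0
  ! A pointer whose target is a strided section: not contiguous.
  p => store(1:10:2)
  write(*,*) 'stride-2 target contiguous? ', is_contiguous(p)
  ! The same pointer with a unit-stride target: contiguous.
  p => store(3:7)
  write(*,*) 'unit-stride target contiguous? ', is_contiguous(p)
end program contiguity_check
```

The same pointer p gives a different answer depending on its current target, which is why the question cannot in general be settled at compile time.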
A colleague, Jennifer Scott, hit the problem in 1994. Her code uses reverse communication to accept the rows of a matrix one at a time. She placed all the data for each problem in a structure. The matrix itself varies in size, so it has to be held in a pointer array component. This array is always contiguous, but the compilers made a copy on every entry and the speed of the code was drastically reduced. Ever since, I have been trying to persuade vendors to avoid such unnecessary copying.
As a simple test, I have used these two subroutines
    subroutine assumed_shape(a,b,i,j)
      real, dimension(:) :: a,b
      integer i,j
      a(i) = b(j)
    end subroutine assumed_shape

    subroutine assumed_size(a,b,i,j)
      real, dimension(*) :: a,b
      integer i,j
      a(i) = b(j)
    end subroutine assumed_size
and passed them contiguous actual arrays of size 10**6. My latest code is appended.
My hope is that the time will be small in every case. This is now so on Suns for the Sun, NAG and Fujitsu compilers. The other compilers to which I have access are the Compaq compiler on an Alpha, the IBM compiler on an RS/6000, and the Salford and NASoftware compilers on a PC. None of these do well, as the following table shows.
Case (time in seconds) | Compaq Alpha | IBM RS/6000 | Salford PC | NASoftware PC |
---|---|---|---|---|
Pointer - assumed-shape | 0.00 | 0.0 | 0.3 | 0.0 |
Pointer section - assumed-shape | 0.00 | 0.0 | 0.3 | 0.0 |
Pointer - assumed-size | 0.00 | 0.0 | 0.3 | 0.2 |
Pointer section - assumed-size | 0.05 | 1.0 | 0.3 | 0.2 |
Allocatable section - assumed-size | 0.03 | 1.0 | 0.2 | 0.0 |
Explicit section - assumed-size | 0.03 | 0.9 | 0.0 | 0.0 |
Pointer component - assumed-size | 0.05 | 0.0 | 0.0 | 0.2 |
Please note that these machines have very different capabilities
(the Alpha is new and fast, the IBM is very old). What is important
is whether the time is nonzero to the precision that I have displayed.
For the moment, Jennifer has avoided the penalty on the Compaq compiler by using a local pointer to point to her pointer component.
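Jennifer's actual code is not shown here, so the following is only a sketch of what such a workaround might look like, reusing the type t and the assumed_size subroutine from the test program. The local pointers pc and pd and the module name workaround_types are invented for the illustration.

```fortran
module workaround_types
  implicit none
  type t
    real, pointer :: c(:)
  end type t
end module workaround_types

subroutine assumed_size(a,b,i,j)
  implicit none
  real, dimension(*) :: a,b
  integer i,j
  a(i) = b(j)
end subroutine assumed_size

program workaround
  use workaround_types
  implicit none
  external assumed_size
  integer, parameter :: n = 1000000
  type (t) :: c, d
  real, dimension(:), pointer :: pc, pd  ! hypothetical local pointers
  allocate (c%c(n), d%c(n))
  c%c = 1
  d%c = 2
  ! Direct form, as timed in the table: call assumed_size(c%c,d%c,2,1)
  ! Workaround: point local pointers at the components and pass those.
  pc => c%c
  pd => d%c
  call assumed_size(pc, pd, 2, 1)
  write(*,*) c%c(2)   ! element 2 of c%c now holds the value of d%c(1)
  deallocate (c%c, d%c)
end program workaround
```

This turns the "Pointer component - assumed-size" case into the "Pointer - assumed-size" case, for which the table shows no measurable time on the Compaq compiler.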
    module procs
      implicit none
      type t
        real, pointer :: c(:)
      end type t
    contains
      subroutine assumed_shape(a,b,i,j)
        real, dimension(:) :: a,b
        integer i,j
        a(i) = b(j)
      end subroutine assumed_shape
    end module procs

    subroutine assumed_size(a,b,i,j)
      real, dimension(*) :: a,b
      integer i,j
      a(i) = b(j)
    end subroutine assumed_size

    program main
      use procs
      type (t) c,d
      integer,parameter :: n=1000000
      real, dimension(:), pointer :: a,b
      real, dimension(:), allocatable :: e,f
      real, dimension(n) :: g,h
      integer i,count1,count2,rate
      allocate (a(n),b(n),c%c(n),d%c(n))
      allocate (e(n),f(n))
      a = 1
      b = 1
      c%c = 1
      d%c = 1
      ! Define the remaining arrays too, so that no undefined value is read
      e = 1
      f = 1
      g = 1
      h = 1
      ! Each case is timed twice (i = 1 and i = 2)
      do i = 1,2
        write(*,*)
        ! Pointer array passed to an assumed-shape dummy
        call system_clock(count1,rate)
        call assumed_shape(a,b,i,i+1)
        call system_clock(count2,rate)
        write(*,'(a,f9.6)')'Time taken (pointer/assumed-shape) ',&
          (count2-count1)/real(rate)
        ! Section of a pointer array passed to an assumed-shape dummy
        call system_clock(count1,rate)
        call assumed_shape(a(i:),b(i:),i,i+1)
        call system_clock(count2,rate)
        write(*,'(a,f9.6)')'Time taken (pointer-section/assumed-shape) ',&
          (count2-count1)/real(rate)
        ! Pointer array passed to an assumed-size dummy
        call system_clock(count1,rate)
        call assumed_size(a,b,i,i+1)
        call system_clock(count2,rate)
        write(*,'(a,f9.6)')'Time taken (pointer/assumed-size) ',&
          (count2-count1)/real(rate)
        ! Section of a pointer array passed to an assumed-size dummy
        call system_clock(count1,rate)
        call assumed_size(a(i:),b(i:),i,i+1)
        call system_clock(count2,rate)
        write(*,'(a,f9.6)')'Time taken (pointer-section/assumed-size) ',&
          (count2-count1)/real(rate)
        ! Section of an allocatable array passed to an assumed-size dummy
        call system_clock(count1,rate)
        call assumed_size(e(i:),f(i:),i,i+1)
        call system_clock(count2,rate)
        write(*,'(a,f9.6)')'Time taken (allocatable-section/assumed-size) ',&
          (count2-count1)/real(rate)
        ! Section of an explicit-shape array passed to an assumed-size dummy
        call system_clock(count1,rate)
        call assumed_size(g(i:),h(i:),i,i+1)
        call system_clock(count2,rate)
        write(*,'(a,f9.6)')'Time taken (explicit-section/assumed-size) ',&
          (count2-count1)/real(rate)
        ! Pointer component of a structure passed to an assumed-size dummy
        call system_clock(count1,rate)
        call assumed_size(c%c,d%c,i+1,i)
        call system_clock(count2,rate)
        write(*,'(a,f9.6)')'Time taken (pointer-component/assumed-size) ',&
          (count2-count1)/real(rate)
      end do
    end program main