Unnecessary Copying Of Pointer Arrays

John Reid, JKR Associates

Most compilers hold Fortran 77 arrays (explicit-shape and assumed-size) in contiguous memory, and they hold allocatable arrays in contiguous memory too. Some array sections, such as a(1:10:2), cannot be in contiguous memory. An assumed-shape dummy array can accept such a section directly (there is usually a descriptor that holds the extent and stride for each dimension), but copying is needed if such a section is passed to an explicit-shape or assumed-size dummy array.

Copying may be needed for a pointer actual argument, too, since a pointer array may have a section as its target.
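The following is a minimal sketch of the situations described above, not part of the original test. The names demo_procs, scale_shape, scale_size and section_demo are hypothetical. It passes a strided section and a pointer to a strided section to both kinds of dummy argument. The results are the same either way; the difference is only that the assumed-size calls normally require the compiler to copy the data in and out.

```fortran
module demo_procs
   implicit none
contains
   ! Assumed-shape dummy: a strided section can be described directly
   ! by the descriptor, so no copy is required.
   subroutine scale_shape(x)
      real, dimension(:), intent(inout) :: x
      x = 2.0*x
   end subroutine scale_shape

   ! Assumed-size dummy: the procedure expects contiguous storage, so a
   ! strided actual argument normally forces copy-in/copy-out.
   subroutine scale_size(x, n)
      integer, intent(in) :: n
      real, dimension(*), intent(inout) :: x
      x(1:n) = 2.0*x(1:n)
   end subroutine scale_size
end module demo_procs

program section_demo
   use demo_procs
   implicit none
   real, target :: a(10)
   real, pointer :: p(:)
   integer :: i

   a = (/ (real(i), i = 1, 10) /)

   call scale_shape(a(1:10:2))     ! strided section, no copy required
   call scale_size(a(1:10:2), 5)   ! strided section, copy expected

   p => a(2:10:2)                  ! pointer whose target is a section
   call scale_size(p, 5)           ! copy expected here too

   write(*,'(10f6.1)') a
end program section_demo
```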

It may be possible to tell at compile time whether an array section or pointer is contiguous, but in most cases a run-time check is needed.
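As an illustration of what such a run-time check distinguishes, the sketch below uses the is_contiguous intrinsic. This is an assumption on my part about the reader's compiler: the intrinsic was added in Fortran 2008, so it was not available to the compilers discussed here. The program name contiguity_check is hypothetical.

```fortran
program contiguity_check
   implicit none
   real, target :: a(10)
   real, pointer :: p(:)

   a = 0.0

   p => a                ! target is the whole array
   write(*,'(a,l2)') 'whole array:    ', is_contiguous(p)

   p => a(1:10:2)        ! target is a strided section
   write(*,'(a,l2)') 'strided section:', is_contiguous(p)
end program contiguity_check
```

The first pointer association is contiguous and the second is not, which is exactly the information a compiler needs before deciding whether a copy can be skipped.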

A colleague, Jennifer Scott, hit the problem in 1994. Her code uses reverse communication to accept the rows of a matrix one at a time. She placed all the data for each problem in a structure. The matrix itself varies in size and, because Fortran 90 and 95 do not allow allocatable components, it has to be held in a pointer array component. This array is always contiguous, but the compilers made a copy on every entry and the speed of the code was drastically reduced. Ever since, I have been trying to persuade vendors to avoid such unnecessary copying.

As a simple test, I have used these two subroutines

   subroutine assumed_shape(a,b,i,j)
      real, dimension(:) :: a,b
      integer i,j
      a(i) = b(j)
   end subroutine assumed_shape

   subroutine assumed_size(a,b,i,j)
      real, dimension(*) :: a,b
      integer i,j
      a(i) = b(j)
   end subroutine assumed_size

and passed them contiguous actual arrays of size 10**6. My latest code is appended.

My hope is that the time will be small in every case. This is now so on SUNs for the SUN, NAG and Fujitsu compilers. The other compilers to which I have access are the Compaq compiler on an Alpha, the IBM compiler on an R/S 6000, and the Salford and NASoftware compilers on a PC. None of these does well, as the following table shows.

                                     Compaq   IBM        Salford   NASoftware
                                     Alpha    R/S 6000   PC        PC
Pointer - assumed-shape              0.00     0.0        0.3       0.0
Pointer section - assumed-shape      0.00     0.0        0.3       0.0
Pointer - assumed-size               0.00     0.0        0.3       0.2
Pointer section - assumed-size       0.05     1.0        0.3       0.2
Allocatable section - assumed-size   0.03     1.0        0.2       0.0
Explicit section - assumed-size      0.03     0.9        0.0       0.0
Pointer component - assumed-size     0.05     0.0        0.0       0.2


Please note that these machines have very different capabilities (the Alpha is new and fast, the IBM is very old). What is important is whether the time is nonzero to the precision that I have displayed.

For the moment, Jennifer has avoided the penalty on the Compaq compiler by using a local pointer to point to her pointer component.
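A minimal sketch of that workaround might look as follows. It is not Jennifer's actual code: the module name workaround_types, the program name workaround and the local pointers pc and pd are hypothetical, and whether the trick avoids the copy on any compiler other than the Compaq one is not something this sketch can show.

```fortran
module workaround_types
   implicit none
   type t
      real, pointer :: c(:)
   end type t
end module workaround_types

subroutine assumed_size(a,b,i,j)
   real, dimension(*) :: a,b
   integer i,j
   a(i) = b(j)
end subroutine assumed_size

program workaround
   use workaround_types
   implicit none
   type (t) :: c,d
   real, dimension(:), pointer :: pc,pd   ! hypothetical local pointers

   allocate (c%c(10),d%c(10))
   c%c = 1
   d%c = 2

   ! Point local pointers at the pointer components and pass those,
   ! rather than passing c%c and d%c directly.
   pc => c%c
   pd => d%c
   call assumed_size(pc,pd,1,2)

   write(*,'(f4.1)') c%c(1)   ! the component is updated through pc
end program workaround
```

The local pointers share their targets with the components, so the assignment made inside assumed_size is visible through c%c afterwards.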


   module procs
   implicit none
      type t
         real, pointer :: c(:)
      end type t
   contains
      subroutine assumed_shape(a,b,i,j)
         real, dimension(:) :: a,b
         integer i,j
         a(i) = b(j)
      end subroutine assumed_shape
   end module procs
   
      subroutine assumed_size(a,b,i,j)
         real, dimension(*) :: a,b
         integer i,j
         a(i) = b(j)
      end subroutine assumed_size
   
   program main
      use procs
      type (t) c,d
      integer,parameter :: n=1000000
      real, dimension(:), pointer :: a,b
      real, dimension(:), allocatable :: e,f
      real, dimension(n) :: g,h
      integer i,count1,count2,rate
      allocate (a(n),b(n),c%c(n),d%c(n))
      allocate (e(n),f(n))
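      a = 1
      b = 1
      c%c = 1
      d%c = 1
      ! Define the remaining arrays too, so that no call reads an undefined value
      e = 1
      f = 1
      g = 1
      h = 1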
      do i = 1,2
         write(*,*)
   
         call system_clock(count1,rate)
         call assumed_shape(a,b,i,i+1)
         call system_clock(count2,rate)
         write(*,'(a,f9.6)')'Time taken (pointer/assumed-shape)              ',&
                  (count2-count1)/real(rate)
   
         call system_clock(count1,rate)
         call assumed_shape(a(i:),b(i:),i,i+1)
         call system_clock(count2,rate)
         write(*,'(a,f9.6)')'Time taken (pointer-section/assumed-shape)      ',&
                  (count2-count1)/real(rate)
   
         call system_clock(count1,rate)
         call assumed_size(a,b,i,i+1)
         call system_clock(count2,rate)
         write(*,'(a,f9.6)')'Time taken (pointer/assumed-size)               ',&
                  (count2-count1)/real(rate)
   
         call system_clock(count1,rate)
         call assumed_size(a(i:),b(i:),i,i+1)
         call system_clock(count2,rate)
         write(*,'(a,f9.6)')'Time taken (pointer-section/assumed-size)       ',&
                  (count2-count1)/real(rate)
   
         call system_clock(count1,rate)
         call assumed_size(e(i:),f(i:),i,i+1)
         call system_clock(count2,rate)
         write(*,'(a,f9.6)')'Time taken (allocatable-section/assumed-size)   ',&
                  (count2-count1)/real(rate)
   
         call system_clock(count1,rate)
         call assumed_size(g(i:),h(i:),i,i+1)
         call system_clock(count2,rate)
         write(*,'(a,f9.6)')'Time taken (explicit-section/assumed-size)      ',&
                  (count2-count1)/real(rate)
   
         call system_clock(count1,rate)
         call assumed_size(c%c,d%c,i+1,i)
         call system_clock(count2,rate)
         write(*,'(a,f9.6)')'Time taken (pointer-component/assumed-size)     ',&
                  (count2-count1)/real(rate)
      end do
   
   end program main
   
