ARPACK/PARPACK Checkpointing Capabilities
-----------------------------------------

Hernan G. Arango
Institute of Marine and Coastal Sciences
Rutgers University
August 30, 2005

Checkpointing is a necessary capability for applications that use the
ARPACK library to solve very large (and usually expensive) eigenproblems.
There is no such capability in the released ARPACK (Version 2.4) library,
although the contribution area contains a couple of routines modified
for this task.  Their strategy was to add extra arguments to the reverse
communication driver DSAUPD.  The restart is then triggered by setting the
IDO flag to -2.

This is not an optimal way to restart the ARPACK library because it is
not supported by all the drivers and does not guarantee the same results.
I have not been able to get a restarted application to produce the same
results, in the same number of iterations, as an uninterrupted
computation.  The reason is that ARPACK keeps many internal variables
between calls by declaring them with the Fortran "save" statement.  This
suggests that an application restarted from a checkpointing file does not
recover all of this internal state and is therefore not solving exactly
the same subspace problem.

An alternative checkpointing strategy is proposed that maintains the
integrity of the current ARPACK library.  There are no extra arguments
to the IRAM (Implicitly Restarted Arnoldi Method) routines.  The
behavior of an uninterrupted ARPACK run is unchanged, and the user
gets identical results.  The new strategy is as follows:

1)  Move all variables declared with the "save" statement to common
    blocks in several include files:

    i_aupd.h   private variables used by _AUPD routines
    idaup2.h   private variables used by _AUP2 double precision routines
    isaup2.h   private variables used by _AUP2 single precision routines
    idaitr.h   private variables used by _AITR double precision routines
    isaitr.h   private variables used by _AITR single precision routines

    The declarations of these variables were removed from the ARPACK
    version 2.4 symmetric and non-symmetric routines and put in the above
    include files.  I am only interested in the symmetric and non-symmetric
    routines, but a similar strategy can be used in the other routines.

    A single include file is not possible because the same variable names
    are repeated with different meanings and passed as arguments to other
    routines.  Renaming the variables and arguments is not desirable
    because it would require a lot of testing and could break the current
    capability.

    For example, idaitr.h becomes

c
c     %---------------------------------------------------%
c     | Private variables used by  _AITR double precision |
c     | routines are saved in common blocks to facilitate |
c     | checkpointing.  All  these  variables  need to be |
c     | saved and recovered during checkpointing restart. |
c     %---------------------------------------------------%
c
      logical
     &           orth1, orth2, rstart, step3, step4
      common /ldaitr/
     &           orth1, orth2, rstart, step3, step4

      integer
     &           ierr, ipj, irj, ivj, iter, itry, j, msglvl
      common /idaitr/
     &           ierr, ipj, irj, ivj, iter, itry, j, msglvl

      double precision
     &           betaj, ovfl, rnorm1, safmin, smlnum, ulp, unfl, wnorm
      common /rdaitr/
     &           betaj, ovfl, rnorm1, safmin, smlnum, ulp, unfl, wnorm


    Notice that logical, integer and floating-point variables are
    separated into different common blocks.  This is very important on
    various computer architectures, where mixing data types of different
    sizes in one common block can cause alignment problems.

2) As in previous checkpointing attempts, the IDO flag with a value of
   -2 is used to trigger restart from a checkpointing file.

3) The only executable change made to the ARPACK routines is to add
   an extra condition to the initialization of several routines, from

      if (ido .eq. 0) then

   to

      if ((ido .eq. 0) .or. (ido .eq. -2)) then


   Also, in the same IF block an extra conditional is added so that
   several variables are initialized only during a cold start and NOT
   during a restart.  On restart, all these values are recovered from
   the checkpointing file.
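
   The initialization pattern described in this step can be sketched as
   follows.  This is only an illustration, not the actual ARPACK code:
   the routine demo_init, the common blocks /idemo/ and /rdemo/, and the
   variables iter and rnorm are hypothetical stand-ins for an ARPACK
   routine and its internal state.

```fortran
! Minimal sketch of the cold-start/restart initialization pattern.
! demo_init, /idemo/, /rdemo/, iter and rnorm are hypothetical names;
! only the tests on IDO follow the description in the text.
subroutine demo_init(ido)
  implicit none
  integer, intent(in) :: ido
  integer :: iter
  double precision :: rnorm
  common /idemo/ iter
  common /rdemo/ rnorm

  if ((ido .eq. 0) .or. (ido .eq. -2)) then
     ! Work required on both cold start and restart would go here.
     if (ido .eq. 0) then
        ! Cold start only: initialize the internal state.
        iter  = 0
        rnorm = 0.0d0
     end if
     ! On restart (ido = -2) the values already present in the
     ! common blocks, recovered from the checkpoint, are kept.
  end if
end subroutine demo_init

program demo
  implicit none
  integer :: iter
  double precision :: rnorm
  common /idemo/ iter
  common /rdemo/ rnorm

  ! Cold start: the routine initializes the internal state.
  call demo_init(0)
  print *, 'cold start: iter =', iter, ' rnorm =', rnorm

  ! Pretend these values were read from a checkpointing file.
  iter  = 17
  rnorm = 2.5d0

  ! Restart: the recovered values must survive the call.
  call demo_init(-2)
  print *, 'restart:    iter =', iter, ' rnorm =', rnorm
end program demo
```

   In this sketch the second call leaves iter and rnorm at the values
   assigned before it, which is the behavior a restart relies on.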

4) The user now has access to all the internal parameters of ARPACK
   and can fine tune their values for a wide variety of restart
   applications.

   Since there are repeated variable names in the above common blocks,
   the user needs to address the common blocks in a compact way.  For
   example, in my ARPACK application module I added the following
   statements:

      integer  :: iaitr(8), iaup2(8), iaupd(20)
      logical  :: laitr(5), laup2(5)
      real(r8) :: raitr(8), raup2(2)
!
      common /i_aupd/ iaupd
#ifdef DOUBLE_PRECISION
      common /idaitr/ iaitr
      common /ldaitr/ laitr
      common /rdaitr/ raitr
      common /idaup2/ iaup2
      common /ldaup2/ laup2
      common /rdaup2/ raup2
#else
      common /isaitr/ iaitr
      common /lsaitr/ laitr
      common /rsaitr/ raitr
      common /isaup2/ iaup2
      common /lsaup2/ laup2
      common /rsaup2/ raup2
#endif

    This compact form facilitates the writing of the checkpointing file.
    My checkpointing file is a NetCDF file for both serial and
    parallel applications.  The user also needs to save all the scalars
    and arrays passed as arguments to any of the _AUPD routines.
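
    As a rough illustration of how the compact arrays simplify the
    checkpointing I/O, the sketch below saves and recovers the _AITR
    double precision common blocks.  It makes several assumptions: it
    uses plain Fortran unformatted I/O instead of NetCDF, the file name
    arpack_ckpt.bin and the unit number are arbitrary, and the values
    stored in the arrays are placeholders.  A real checkpoint would also
    include the _AUP2 and _AUPD blocks and the arguments of the _AUPD
    routine.

```fortran
! Minimal sketch: write the compact common-block arrays to a file and
! read them back.  Unformatted I/O replaces NetCDF here (assumption);
! the file name, unit number and array values are placeholders.
program ckpt_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
  integer, parameter :: u = 10

  integer  :: iaitr(8)
  logical  :: laitr(5)
  real(r8) :: raitr(8)

  common /idaitr/ iaitr
  common /ldaitr/ laitr
  common /rdaitr/ raitr

  ! Placeholder state standing in for an interrupted computation.
  iaitr = 3
  laitr = .true.
  raitr = 1.5_r8

  ! Write the checkpoint: one record per common block.
  open (unit=u, file='arpack_ckpt.bin', form='unformatted',            &
        status='replace', action='write')
  write (u) iaitr
  write (u) laitr
  write (u) raitr
  close (u)

  ! Wipe the state to mimic a new process starting up.
  iaitr = 0
  laitr = .false.
  raitr = 0.0_r8

  ! Recover the state before calling the _AUPD routine with ido = -2.
  open (unit=u, file='arpack_ckpt.bin', form='unformatted',            &
        status='old', action='read')
  read (u) iaitr
  read (u) laitr
  read (u) raitr
  close (u)

  if (all(iaitr .eq. 3) .and. all(laitr) .and.                         &
      all(raitr .eq. 1.5_r8)) then
     print *, 'checkpoint state recovered'
  else
     print *, 'checkpoint mismatch'
  end if
end program ckpt_demo
```

    Because each common block is addressed as a single array, each one
    takes one write and one read statement, regardless of how many
    private variables it holds.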

5)  The following routines were modified for checkpointing:

    symmetric:

    ARPACK/SRC/dsaupd.f            ARPACK/PARPACK/SRC/MPI/pdsaupd.f
    ARPACK/SRC/ssaupd.f            ARPACK/PARPACK/SRC/MPI/pssaupd.f

    ARPACK/SRC/dsaup2.f            ARPACK/PARPACK/SRC/MPI/pdsaup2.f
    ARPACK/SRC/ssaup2.f            ARPACK/PARPACK/SRC/MPI/pssaup2.f

    ARPACK/SRC/dsaitr.f            ARPACK/PARPACK/SRC/MPI/pdsaitr.f
    ARPACK/SRC/ssaitr.f            ARPACK/PARPACK/SRC/MPI/pssaitr.f

    non-symmetric:

    ARPACK/SRC/dnaupd.f            ARPACK/PARPACK/SRC/MPI/pdnaupd.f
    ARPACK/SRC/snaupd.f            ARPACK/PARPACK/SRC/MPI/psnaupd.f

    ARPACK/SRC/dnaup2.f            ARPACK/PARPACK/SRC/MPI/pdnaup2.f
    ARPACK/SRC/snaup2.f            ARPACK/PARPACK/SRC/MPI/psnaup2.f

    ARPACK/SRC/dnaitr.f            ARPACK/PARPACK/SRC/MPI/pdnaitr.f
    ARPACK/SRC/snaitr.f            ARPACK/PARPACK/SRC/MPI/psnaitr.f

    The modified files are included in the following tar file:

        restart.tar.gz