ASM AFD does not mount disk group after server reboot.

Today, after rebooting a freshly installed server running Oracle with AFD, I found that the disk groups were not being mounted. Below I describe the checks I performed and the steps I took as a temporary fix.

ASM version:

[grid@srv-ora01 ~]$ asmcmd -V
asmcmd version 19.26.0.0.0

Kernel version:

[root@srv-ora01 ~]# uname -a
Linux srv-ora01 5.15.0-306.177.4.1.el8uek.x86_64 #2 SMP Sat Mar 22 03:27:18 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux

Multipath configuration:

[root@srv-ora01 ~]# multipath -ll
data01 (360050763808107366800000000000003) dm-4 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=36 status=active
|- 8:0:0:0 sdc 8:32 active ready running
|- 9:0:0:0 sdg 8:96 active ready running
`- 10:0:0:0 sdo 8:224 active ready running
data02 (360050763808107366800000000000004) dm-5 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=23 status=active
|- 8:0:0:1 sdd 8:48 active ready running
|- 9:0:0:1 sdi 8:128 active ready running
`- 10:0:0:1 sdp 8:240 active ready running
data03 (360050763808107366800000000000007) dm-7 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=36 status=active
|- 8:0:0:2 sde 8:64 active ready running
|- 9:0:0:2 sdk 8:160 active ready running
`- 10:0:0:2 sdq 65:0 active ready running
data04 (360050763808107366800000000000008) dm-6 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua'

I confirmed that the AFD, iSCSI, and multipathd services were active:

[root@srv-ora01 ~]# systemctl list-units --type=service
  UNIT                                           LOAD   ACTIVE SUB     DESCRIPTION
  afd.service                                    loaded active exited  LSB: Start and Stop ASM Filter driver #<============= AFD
  iscsi-shutdown.service                         loaded active exited  Logout off all iSCSI sessions on shutdown #<========= ISCSI
  iscsi.service                                  loaded active exited  Login and scanning of iSCSI devices #<========= ISCSI
  iscsid.service                                 loaded active running Open-iSCSI  #<========= ISCSI
  kdump.service                                  loaded active exited  Crash recovery kernel arming
  kmod-static-nodes.service                      loaded active exited  Create list of required static device nodes for the current kernel
  lvm2-monitor.service                           loaded active exited  Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
  lvm2-pvscan@8:2.service                        loaded active exited  LVM event activation on device 8:2
  multipathd.service                             loaded active running Device-Mapper Multipath Device Controller  #<========= Multipath
  NetworkManager-dispatcher.service              loaded active running Network Manager Script Dispatcher Service
 NetworkManager-wait-online.service             loaded failed failed  Network Manager Wait Online
  NetworkManager.service                         loaded active running Network Manager
  nis-domainname.service                         loaded active exited  Read and set NIS domainname from /etc/sysconfig/network
  ohasd.service                                  loaded active exited  LSB: Start and Stop Oracle High Availability Service
  oracle-ohasd.service                           loaded active running Oracle High Availability Services

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

I checked the status of multipathd and iscsi:

[root@srv-ora01 ~]# systemctl status multipathd
 multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2025-05-20 09:47:22 -03; 1h 54min ago
 Main PID: 1086 (multipathd)
   Status: "up"
    Tasks: 7
   Memory: 15.9M
   CGroup: /system.slice/multipathd.service
           └─1086 /sbin/multipathd -d -s

May 20 09:48:28 srv-ora01 multipathd[1086]: data02: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: data02: load table [0 4294967296 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:48 1 8:128 1 8:240 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: data04: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: data04: load table [0 4294967296 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:176 1 8:80 1 65:16 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: data03: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: data03: load table [0 4294967296 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:64 1 8:160 1 65:0 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: fra01: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: fra01: load table [0 314572800 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:112 1 8:192 1 65:32 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: fra02: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: fra02: load table [0 314572800 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:144 1 8:208 1 65:48 1]
[root@srv-ora01 ~]#
[root@srv-ora01 ~]#
[root@srv-ora01 ~]# systemctl status iscsi
 iscsi.service - Login and scanning of iSCSI devices
   Loaded: loaded (/usr/lib/systemd/system/iscsi.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2025-05-20 09:50:26 -03; 1h 52min ago
     Docs: man:iscsiadm(8)
           man:iscsid(8)
  Process: 2571 ExecStart=/usr/sbin/iscsiadm -m node --loginall=automatic (code=exited, status=8)
 Main PID: 2571 (code=exited, status=8)
    Tasks: 0 (limit: 819984)
   Memory: 0B
   CGroup: /system.slice/iscsi.service

I listed the dependencies of afd.service and confirmed that multipathd is among them:

[root@srv-ora01 /]# systemctl list-dependencies afd.service
afd.service
 ├─system.slice
 └─sysinit.target
   ├─dev-hugepages.mount
   ├─dev-mqueue.mount
   ├─dracut-shutdown.service
   ├─import-state.service
   ├─iscsi-onboot.service #<========== iscsi
   ├─kmod-static-nodes.service
   ├─ldconfig.service
   ├─loadmodules.service
   ├─lvm2-lvmpolld.socket
   ├─lvm2-monitor.service
   ├─multipathd.service  #<========== multipathd
   ├─nis-domainname.service
   ├─plymouth-read-write.service
   ├─plymouth-start.service
   ├─proc-sys-fs-binfmt_misc.automount
   ├─selinux-autorelabel-mark.service
   ├─sys-fs-fuse-connections.mount
   ├─sys-kernel-config.mount
   ├─sys-kernel-debug.mount
   ├─systemd-ask-password-console.path
   ├─systemd-binfmt.service
   ├─systemd-firstboot.service
   ├─systemd-hwdb-update.service
   ├─systemd-journal-catalog-update.service
   ├─systemd-journal-flush.service
   ├─systemd-journald.service
   ├─systemd-machine-id-commit.service
   ├─systemd-modules-load.service
   ├─systemd-pstore.service
   ├─systemd-random-seed.service
   ├─systemd-sysctl.service
   ├─systemd-sysusers.service
   ├─systemd-tmpfiles-setup-dev.service
   ├─systemd-tmpfiles-setup.service
   ├─systemd-udev-trigger.service
   ├─systemd-udevd.service
   ├─systemd-update-done.service
   ├─systemd-update-utmp.service
   ├─cryptsetup.target
   ├─local-fs.target
    ├─-.mount
    ├─boot.mount
    ├─systemd-fstab-generator-reload-targets.service
    ├─systemd-remount-fs.service
    └─u01.mount
   └─swap.target
     └─dev-mapper-ol\x2dswap.swap

I checked the status of the AFD service after the restart, and this is where I found the problem: /bin/chown: cannot access '/dev/oracleafd/disks/*': No such file or directory.

[root@srv-ora01 ~]# systemctl status afd.service
 afd.service - LSB: Start and Stop ASM Filter driver
   Loaded: loaded (/etc/rc.d/init.d/afd; generated)
   Active: active (exited) since Tue 2025-05-20 21:15:56 -03; 10h ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0 (limit: 819984)
   Memory: 0B
   CGroup: /system.slice/afd.service

May 20 21:15:55 srv-ora01 afd[2382]: AFD-643: Validating AFD installation files for operating system.
May 20 21:15:55 srv-ora01 afd[2393]: AFD-9393: Verifying ASM administrator setup.
May 20 21:15:55 srv-ora01 afd[2404]: AFD-637: Loading installed AFD drivers.
May 20 21:15:55 srv-ora01 afd[2412]: AFD-9154: Loading 'oracleafd.ko' driver.
May 20 21:15:55 srv-ora01 afd[2481]: AFD-649: Verifying AFD devices.
May 20 21:15:55 srv-ora01 afd[2489]: AFD-9156: Detecting control device '/dev/oracleafd/admin'.
May 20 21:15:56 srv-ora01 afd[2519]: AFD-9294: updating file /etc/sysconfig/oracledrivers.conf
May 20 21:15:56 srv-ora01 afd[2541]: AFD-9322: completed
May 20 21:15:56 srv-ora01 afd[2564]: /bin/chown: cannot access '/dev/oracleafd/disks/*': No such file or directory #<========= AFD error: no AFD disks were found.
May 20 21:15:56 srv-ora01 systemd[1]: Started LSB: Start and Stop ASM Filter driver.
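
The chown failure means the glob /dev/oracleafd/disks/* matched nothing when the init script ran. A minimal sketch of the same check, runnable by hand after boot, could look like this. The function name afd_disk_count is hypothetical, and the default path is the one from the log line above.

```shell
#!/bin/sh
# Hypothetical helper: count the entries under the AFD disks directory.
# Prints 0 when the directory is empty or does not exist.
afd_disk_count() {
  dir="${1:-/dev/oracleafd/disks}"
  if [ -d "$dir" ]; then
    # find prints one line per entry directly under $dir.
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l
  else
    echo 0
  fi
}

# Usage (as root, after boot):
#   afd_disk_count    # 0 corresponds to the condition behind the chown error
```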

The problem occurred because CRS started before multipath had made the disks available to ASM. This situation has been known and documented for a long time, especially in environments that use ASMLIB. I have previously published a post about the same problem with ASMLIB: https://www.cesardba.com.br/asm-nao-monta-o-disk-group-apos-o-reboot-do-servidor/
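
One way to see the ordering on a given boot is to put the multipathd and afd journal entries on a single timeline. The sketch below assumes the journal still holds the current boot. The name boot_race_lines is hypothetical, and the patterns are taken from the log lines shown earlier in this post.

```shell
#!/bin/sh
# Hypothetical filter: keep only the journal lines relevant to the race,
# i.e. multipathd loading its maps and afd verifying or chown-ing its disks.
boot_race_lines() {
  grep -E 'multipathd\[[0-9]+\]: .*load table|afd\[[0-9]+\]: (AFD-649|/bin/chown)'
}

# Usage (as root): merge both units from the current boot, in time order.
#   journalctl -b -u multipathd -u afd --no-pager | boot_race_lines
```

If the afd lines appear before the multipathd "load table" lines, AFD looked for its disks before the multipath maps existed, which would be consistent with the chown error.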

After identifying the problem, I opened an SR with Oracle to formally record the incident. In parallel, I analyzed the AFD initialization script and found that it has a function called afd_scandisk(), which is responsible for scanning the disks. I decided to insert a 60-second sleep inside this function, to give multipath enough time to finish and make the disks available. In theory, ASM/AFD would then recognize the disks correctly and proceed to mount the disk groups.

[root@srv-ora01 ~]# systemctl status afd.service
 afd.service - LSB: Start and Stop ASM Filter driver
   Loaded: loaded (/etc/rc.d/init.d/afd; generated) #<========================= File used to initialize the AFD service.
   Active: active (exited) since Tue 2025-05-20 21:15:56 -03; 10h ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0 (limit: 819984)
   Memory: 0B
   CGroup: /system.slice/afd.service
   

#Adjustment made to the afd_scandisk() function, sleep 60.
[root@srv-ora01 ~]# cat /etc/rc.d/init.d/afd 
afd_scandisk()
{
  sleep 60 #<=========================== ADJUSTED SLEEP 60 =====================
  if [ ! -r $AFDBOOT ]
  then
    CMD="/sbin/afdboot -scandisk"
  else
    CMD="$AFDBOOT -scandisk"
  fi
  $LOGINFO "Discovering AFD disks ($CMD)"

 # Run the command as root to see all devices
 $CMD
 if [ "$?" != "0" ]
 then
   $LOGMSG "Failed to scan AFD devices"
 else
   $LOGINFO "AFD scandisk done."
 fi
}

#Reload daemon and restart srv-ora01.
[root@srv-ora01 ~]# systemctl daemon-reload
[root@srv-ora01 ~]# init 6

[root@srv-ora01 ~]# systemctl status afd.service
 afd.service - LSB: Start and Stop ASM Filter driver
   Loaded: loaded (/etc/rc.d/init.d/afd; generated)
   Active: active (exited) since Wed 2025-05-21 07:39:35 -03; 10s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2341 ExecStart=/etc/rc.d/init.d/afd start (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 819984)
   Memory: 0B
   CGroup: /system.slice/afd.service

May 21 07:38:34 srv-ora01 afd[2373]: AFD-641: Checking for existing AFD installation.
May 21 07:38:34 srv-ora01 afd[2389]: AFD-643: Validating AFD installation files for operating system.
May 21 07:38:34 srv-ora01 afd[2399]: AFD-9393: Verifying ASM administrator setup.
May 21 07:38:34 srv-ora01 afd[2411]: AFD-637: Loading installed AFD drivers.
May 21 07:38:34 srv-ora01 afd[2419]: AFD-9154: Loading 'oracleafd.ko' driver.
May 21 07:38:35 srv-ora01 afd[2488]: AFD-649: Verifying AFD devices. #AFD disks available for use after fix and restart.
May 21 07:38:35 srv-ora01 afd[2496]: AFD-9156: Detecting control device '/dev/oracleafd/admin'.
May 21 07:38:35 srv-ora01 afd[2526]: AFD-9294: updating file /etc/sysconfig/oracledrivers.conf
May 21 07:38:35 srv-ora01 afd[2546]: AFD-9322: completed
May 21 07:39:35 srv-ora01 systemd[1]: Started LSB: Start and Stop ASM Filter driver.

#AFD disks available for use after fix and restart.
[root@srv-ora01 ~]# ls -lart /dev/oracleafd/disks/*
-rw-rw-r-- 1 grid asmadmin 21 May 21 07:39 /dev/oracleafd/disks/IMAGEM01
-rw-rw-r-- 1 grid asmadmin 18 May 21 07:39 /dev/oracleafd/disks/FRA02
-rw-rw-r-- 1 grid asmadmin 18 May 21 07:39 /dev/oracleafd/disks/FRA01
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA04
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA03
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA02
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA01

After adjusting the function, I rebooted the server and confirmed that the problem was solved. In the SR opened with Oracle, I asked whether this solution would be officially supported. The engineer in charge confirmed that the change is supported by Oracle and that I could proceed with it.
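
A fixed sleep 60 always adds a full minute to the boot and still assumes that multipath settles within that time. As a variation on the same idea, the wait can be bounded but end as soon as the devices appear. The sketch below is not part of Oracle's init script and was not reviewed in the SR. The name wait_for_paths is hypothetical, and the /dev/mapper paths in the usage comment are an assumption based on the multipath aliases shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: wait up to TIMEOUT seconds for every listed path to exist.
# Returns 0 as soon as all paths are present, 1 if the timeout expires first.
wait_for_paths() {
  timeout="$1"
  shift
  elapsed=0
  while :; do
    missing=0
    for p in "$@"; do
      [ -e "$p" ] || missing=1
    done
    [ "$missing" -eq 0 ] && return 0
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Usage sketch, in place of the fixed sleep (paths are assumptions):
#   wait_for_paths 60 /dev/mapper/data01 /dev/mapper/data02 \
#     || echo "multipath maps not ready after 60s" >&2
```

The 60-second ceiling matches the original workaround, so the worst case is the same while a boot where the maps appear quickly is not delayed.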

Do not make this change without approval from Oracle Support.
