Today, while rebooting a freshly installed server running Oracle with AFD, I noticed that the diskgroups were not being mounted correctly. Below I describe the checks I performed and the steps taken as a temporary fix.
ASM version:
[grid@srv-ora01 ~]$ asmcmd -V
asmcmd version 19.26.0.0.0
Kernel version:
[root@srv-ora01 ~]# uname -a
Linux srv-ora01 5.15.0-306.177.4.1.el8uek.x86_64 #2 SMP Sat Mar 22 03:27:18 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux
Multipath configured:
[root@srv-ora01 ~]# multipath -ll
data01 (360050763808107366800000000000003) dm-4 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=36 status=active
|- 8:0:0:0 sdc 8:32 active ready running
|- 9:0:0:0 sdg 8:96 active ready running
`- 10:0:0:0 sdo 8:224 active ready running
data02 (360050763808107366800000000000004) dm-5 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=23 status=active
|- 8:0:0:1 sdd 8:48 active ready running
|- 9:0:0:1 sdi 8:128 active ready running
`- 10:0:0:1 sdp 8:240 active ready running
data03 (360050763808107366800000000000007) dm-7 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=36 status=active
|- 8:0:0:2 sde 8:64 active ready running
|- 9:0:0:2 sdk 8:160 active ready running
`- 10:0:0:2 sdq 65:0 active ready running
data04 (360050763808107366800000000000008) dm-6 IBM,2145
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua'
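As a side note, each multipath device at this point should already carry its AFD label. A quick, illustrative check is to ask AFD to read the label straight from a mapper device (device names as above):
[grid@srv-ora01 ~]$ asmcmd afd_lslbl /dev/mapper/data01   #prints the AFD label written on the device, if any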
I confirmed that the AFD, iSCSI, and multipathd services were active:
[root@srv-ora01 ~]# systemctl list-units --type=service
UNIT LOAD ACTIVE SUB DESCRIPTION
afd.service loaded active exited LSB: Start and Stop ASM Filter driver #<============= AFD
iscsi-shutdown.service loaded active exited Logout off all iSCSI sessions on shutdown #<========= ISCSI
iscsi.service loaded active exited Login and scanning of iSCSI devices #<========= ISCSI
iscsid.service loaded active running Open-iSCSI #<========= ISCSI
kdump.service loaded active exited Crash recovery kernel arming
kmod-static-nodes.service loaded active exited Create list of required static device nodes for the current kernel
lvm2-monitor.service loaded active exited Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
lvm2-pvscan@8:2.service loaded active exited LVM event activation on device 8:2
multipathd.service loaded active running Device-Mapper Multipath Device Controller #<========= Multipath
NetworkManager-dispatcher.service loaded active running Network Manager Script Dispatcher Service
● NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online
NetworkManager.service loaded active running Network Manager
nis-domainname.service loaded active exited Read and set NIS domainname from /etc/sysconfig/network
ohasd.service loaded active exited LSB: Start and Stop Oracle High Availability Service
oracle-ohasd.service loaded active running Oracle High Availability Services
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
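To narrow that listing down to just the services of interest, a simple filter does the job (illustrative):
[root@srv-ora01 ~]# systemctl list-units --type=service | grep -Ei 'afd|iscsi|multipath'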
I checked the status of multipathd:
[root@srv-ora01 ~]# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2025-05-20 09:47:22 -03; 1h 54min ago
Main PID: 1086 (multipathd)
Status: "up"
Tasks: 7
Memory: 15.9M
CGroup: /system.slice/multipathd.service
└─1086 /sbin/multipathd -d -s
May 20 09:48:28 srv-ora01 multipathd[1086]: data02: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: data02: load table [0 4294967296 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:48 1 8:128 1 8:240 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: data04: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: data04: load table [0 4294967296 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:176 1 8:80 1 65:16 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: data03: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: data03: load table [0 4294967296 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:64 1 8:160 1 65:0 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: fra01: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: fra01: load table [0 314572800 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:112 1 8:192 1 65:32 1]
May 20 09:48:28 srv-ora01 multipathd[1086]: fra02: performing delayed actions
May 20 09:48:28 srv-ora01 multipathd[1086]: fra02: load table [0 314572800 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 3 1 8:144 1 8:208 1 65:48 1]
I also checked the status of the iscsi service:
[root@srv-ora01 ~]# systemctl status iscsi
● iscsi.service - Login and scanning of iSCSI devices
Loaded: loaded (/usr/lib/systemd/system/iscsi.service; enabled; vendor preset: disabled)
Active: active (exited) since Tue 2025-05-20 09:50:26 -03; 1h 52min ago
Docs: man:iscsiadm(8)
man:iscsid(8)
Process: 2571 ExecStart=/usr/sbin/iscsiadm -m node --loginall=automatic (code=exited, status=8)
Main PID: 2571 (code=exited, status=8)
Tasks: 0 (limit: 819984)
Memory: 0B
CGroup: /system.slice/iscsi.service
I listed the dependencies of afd.service and confirmed that multipathd is among them:
[root@srv-ora01 /]# systemctl list-dependencies afd.service
afd.service
● ├─system.slice
● └─sysinit.target
● ├─dev-hugepages.mount
● ├─dev-mqueue.mount
● ├─dracut-shutdown.service
● ├─import-state.service
● ├─iscsi-onboot.service #<========== iscsi
● ├─kmod-static-nodes.service
● ├─ldconfig.service
● ├─loadmodules.service
● ├─lvm2-lvmpolld.socket
● ├─lvm2-monitor.service
● ├─multipathd.service #<========== multipathd
● ├─nis-domainname.service
● ├─plymouth-read-write.service
● ├─plymouth-start.service
● ├─proc-sys-fs-binfmt_misc.automount
● ├─selinux-autorelabel-mark.service
● ├─sys-fs-fuse-connections.mount
● ├─sys-kernel-config.mount
● ├─sys-kernel-debug.mount
● ├─systemd-ask-password-console.path
● ├─systemd-binfmt.service
● ├─systemd-firstboot.service
● ├─systemd-hwdb-update.service
● ├─systemd-journal-catalog-update.service
● ├─systemd-journal-flush.service
● ├─systemd-journald.service
● ├─systemd-machine-id-commit.service
● ├─systemd-modules-load.service
● ├─systemd-pstore.service
● ├─systemd-random-seed.service
● ├─systemd-sysctl.service
● ├─systemd-sysusers.service
● ├─systemd-tmpfiles-setup-dev.service
● ├─systemd-tmpfiles-setup.service
● ├─systemd-udev-trigger.service
● ├─systemd-udevd.service
● ├─systemd-update-done.service
● ├─systemd-update-utmp.service
● ├─cryptsetup.target
● ├─local-fs.target
● │ ├─-.mount
● │ ├─boot.mount
● │ ├─systemd-fstab-generator-reload-targets.service
● │ ├─systemd-remount-fs.service
● │ └─u01.mount
● └─swap.target
● └─dev-mapper-ol\x2dswap.swap
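The tree above shows which units afd.service pulls in. The ordering constraints themselves can also be read straight from the generated unit (illustrative):
[root@srv-ora01 ~]# systemctl show afd.service -p After -p Wants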
I checked the status of the AFD service after the restart, and this is where I found the problem: /bin/chown: cannot access '/dev/oracleafd/disks/*': No such file or directory.
[root@srv-ora01 ~]# systemctl status afd.service
● afd.service - LSB: Start and Stop ASM Filter driver
Loaded: loaded (/etc/rc.d/init.d/afd; generated)
Active: active (exited) since Tue 2025-05-20 21:15:56 -03; 10h ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 819984)
Memory: 0B
CGroup: /system.slice/afd.service
May 20 21:15:55 srv-ora01 afd[2382]: AFD-643: Validating AFD installation files for operating system.
May 20 21:15:55 srv-ora01 afd[2393]: AFD-9393: Verifying ASM administrator setup.
May 20 21:15:55 srv-ora01 afd[2404]: AFD-637: Loading installed AFD drivers.
May 20 21:15:55 srv-ora01 afd[2412]: AFD-9154: Loading 'oracleafd.ko' driver.
May 20 21:15:55 srv-ora01 afd[2481]: AFD-649: Verifying AFD devices.
May 20 21:15:55 srv-ora01 afd[2489]: AFD-9156: Detecting control device '/dev/oracleafd/admin'.
May 20 21:15:56 srv-ora01 afd[2519]: AFD-9294: updating file /etc/sysconfig/oracledrivers.conf
May 20 21:15:56 srv-ora01 afd[2541]: AFD-9322: completed
May 20 21:15:56 srv-ora01 afd[2564]: /bin/chown: cannot access '/dev/oracleafd/disks/*': No such file or directory #<========= AFD error: the disks could not be found.
May 20 21:15:56 srv-ora01 systemd[1]: Started LSB: Start and Stop ASM Filter driver.
The problem occurred because CRS started before multipath had made the disks available for ASM. This situation has been known and documented for a long time, especially in environments using ASMLIB. I have in fact published about this same problem with ASMLIB before: https://www.cesardba.com.br/asm-nao-monta-o-disk-group-apos-o-reboot-do-servidor/
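For anyone chasing the same symptom, a few illustrative checks make the failure pattern obvious after such a reboot: the driver is loaded but no labeled disks were discovered, and a manual rescan once multipath is up should bring them back:
[grid@srv-ora01 ~]$ asmcmd afd_state     #confirms the AFD driver is loaded and filtering
[grid@srv-ora01 ~]$ asmcmd afd_lsdsk     #lists the labeled disks; returns nothing when the scan ran too early
[root@srv-ora01 ~]# /sbin/afdboot -scandisk     #manual rescan, the same command the init script runs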
After identifying the problem, I opened an SR with Oracle to formally report the issue. In parallel, I analyzed the AFD init script and found a function called afd_scandisk(), responsible for scanning the disks. I decided to insert a 60-second sleep inside this function, to give multipath enough time to finish and make the disks available. With that, in theory, ASM/AFD would properly recognize the disks and proceed with mounting the disk groups.
[root@srv-ora01 ~]# systemctl status afd.service
● afd.service - LSB: Start and Stop ASM Filter driver
Loaded: loaded (/etc/rc.d/init.d/afd; generated) #<========================= File used to start the AFD service.
Active: active (exited) since Tue 2025-05-20 21:15:56 -03; 10h ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 819984)
Memory: 0B
CGroup: /system.slice/afd.service
#Adjustment made to the afd_scandisk() function, sleep 60.
[root@srv-ora01 ~]# cat /etc/rc.d/init.d/afd
afd_scandisk()
{
sleep 60 #<=========================== ADJUSTED SLEEP 60 =====================
if [ ! -r $AFDBOOT ]
then
CMD="/sbin/afdboot -scandisk"
else
CMD="$AFDBOOT -scandisk"
fi
$LOGINFO "Discovering AFD disks ($CMD)"
# Run the command as root to see all devices
$CMD
if [ "$?" != "0" ]
then
$LOGMSG "Failed to scan AFD devices"
else
$LOGINFO "AFD scandisk done."
fi
}
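A fixed sleep is simple, but it adds a full minute to every boot even when multipath is already up. A hypothetical variant (illustrative only, not what I applied here, and like any change to this script it would need Oracle's approval) would be to poll for the multipath devices and continue as soon as they appear:
afd_scandisk()
{
# Hypothetical: wait up to 60s for a sentinel multipath device,
# checking every 5 seconds (adjust /dev/mapper/data01 to your environment)
for i in $(seq 1 12)
do
[ -b /dev/mapper/data01 ] && break
sleep 5
done
# ... rest of the original function unchanged ...
}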
#Reload the daemon and restart srv-ora01.
[root@srv-ora01 ~]# systemctl daemon-reload
[root@srv-ora01 ~]# init 6
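Note that daemon-reload is what picks up the change: afd.service is not a native unit, it is generated from /etc/rc.d/init.d/afd by systemd-sysv-generator (hence the 'generated' flag on the Loaded: line). The generated unit can be inspected with (illustrative):
[root@srv-ora01 ~]# systemctl cat afd.service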
[root@srv-ora01 ~]# systemctl status afd.service
● afd.service - LSB: Start and Stop ASM Filter driver
Loaded: loaded (/etc/rc.d/init.d/afd; generated)
Active: active (exited) since Wed 2025-05-21 07:39:35 -03; 10s ago
Docs: man:systemd-sysv-generator(8)
Process: 2341 ExecStart=/etc/rc.d/init.d/afd start (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 819984)
Memory: 0B
CGroup: /system.slice/afd.service
May 21 07:38:34 srv-ora01 afd[2373]: AFD-641: Checking for existing AFD installation.
May 21 07:38:34 srv-ora01 afd[2389]: AFD-643: Validating AFD installation files for operating system.
May 21 07:38:34 srv-ora01 afd[2399]: AFD-9393: Verifying ASM administrator setup.
May 21 07:38:34 srv-ora01 afd[2411]: AFD-637: Loading installed AFD drivers.
May 21 07:38:34 srv-ora01 afd[2419]: AFD-9154: Loading 'oracleafd.ko' driver.
May 21 07:38:35 srv-ora01 afd[2488]: AFD-649: Verifying AFD devices. #<========= No chown error this time: the AFD disks were found.
May 21 07:38:35 srv-ora01 afd[2496]: AFD-9156: Detecting control device '/dev/oracleafd/admin'.
May 21 07:38:35 srv-ora01 afd[2526]: AFD-9294: updating file /etc/sysconfig/oracledrivers.conf
May 21 07:38:35 srv-ora01 afd[2546]: AFD-9322: completed
May 21 07:39:35 srv-ora01 systemd[1]: Started LSB: Start and Stop ASM Filter driver.
#Note the 60-second gap between 'AFD-9322: completed' (07:38:35) and the Started message (07:39:35): that is the added sleep. AFD disks available for use after the fix and restart:
[root@srv-ora01 ~]# ls -lart /dev/oracleafd/disks/*
-rw-rw-r-- 1 grid asmadmin 21 May 21 07:39 /dev/oracleafd/disks/IMAGEM01
-rw-rw-r-- 1 grid asmadmin 18 May 21 07:39 /dev/oracleafd/disks/FRA02
-rw-rw-r-- 1 grid asmadmin 18 May 21 07:39 /dev/oracleafd/disks/FRA01
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA04
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA03
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA02
-rw-rw-r-- 1 grid asmadmin 19 May 21 07:39 /dev/oracleafd/disks/DATA01
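To close the loop, the disk groups themselves can be checked from ASM (illustrative); lsdg should now report every group as MOUNTED:
[grid@srv-ora01 ~]$ asmcmd lsdg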
After making the adjustment to the function, I rebooted the server and confirmed that the problem was solved. In the SR opened with Oracle, I asked whether this solution is officially supported. The assigned engineer confirmed that the change is supported by Oracle and that I could proceed with it.
Note: do not change the AFD init script without approval from Oracle Support.