Scope of the document
Personally I’m not fond of ASMlib. In the past I was always able to perform my RAC installations without using it.
However, I recently installed Oracle Clusterware on Red Hat. With very little time at my disposal I wasn’t able to find a good way to avoid it. Time was ticking, so I simply went for the well-documented solution and installed ASMlib.
Only recently I read of an elegant solution on the excellent blog of Martin Bach:
http://martincarstenbach.wordpress.com/2010/11/16/configuration-device-m…
The multipath feature of the Linux device mapper could have helped me!
Yesterday I was back working on the RAC I previously installed and decided to switch to this solution.
But how to do it without reinstalling the whole cluster?
I found a way and decided to document it in an article.
What’s the purpose of ASMlib?
Well, it is supposed to simplify administration… but, in my opinion, the real purpose of this component is to present a shared disk under an identical name on all the nodes of your cluster.
Doesn’t linux do it automagically?
No, it doesn’t, and neither do the other Unixes.
A shared disk can show up as /dev/sdd on one node and as /dev/sdf on another.
So you need a way to name the disks identically on all cluster nodes (ASM requires it).
Luckily every LUN has a unique identifier called the WWID. This is the “key” to solving our problem.
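To make the naming problem concrete, here is a minimal sketch that compares two nodes. It assumes you have saved one "device WWID" pair per line on each node; the function name compare_wwid_maps and the file names node1.map and node2.map are hypothetical, chosen only for this illustration. It prints every LUN that both nodes see under a different device name.

```shell
#!/bin/sh
# compare_wwid_maps FILE_NODE1 FILE_NODE2
# Each file holds one "device wwid" pair per line, e.g. "sdd 36006048c...".
# Prints the WWIDs that are visible on both nodes under different names.
compare_wwid_maps() {
    awk '
        FNR == NR { dev[$2] = $1; next }      # first file: remember wwid -> device
        ($2 in dev) && dev[$2] != $1 {        # second file: same wwid, other name
            printf "%s: %s on node1, %s on node2\n", $2, dev[$2], $1
        }
    ' "$1" "$2"
}
```

Called as compare_wwid_maps node1.map node2.map, an empty output would mean that the two nodes already agree on every device name.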
How to do it?
ASMlib is one way, but ASMlib is not available on all operating systems.
For example, on HP-UX I had to create pseudo devices with mknod to be sure that every shared disk was named correctly on every cluster node.
Linux had ways to do it via the device mapper or raw devices, such as filling in the old /etc/sysconfig/rawdevices or the recently deprecated /etc/udev/permissions.d/50-udev.permissions.
Bach’s article showed me another way: using multipath.
If you worked with linux and SANs you probably don’t need any explanation.
If your Unix machine is attached to a SAN, you may see the same disk under several names. That’s because every path to that disk is seen as a different disk.
multipathd helps you deal with path redundancy by giving you a pseudo device that represents all the paths.
SO?
Now: I was working on a system attached to iSCSI storage through a single path, so I had no need for multipath.
However, thinking back to my ASMlib installation, I realized that I could use the “alias creation” feature of multipathd to create pseudo devices with the same name on every node, which solves my problem.
Why bother?
- Because I don’t want an additional layer on my machines,
- because this additional layer could break,
- because a sysadmin who isn’t aware of this layer (yes, there is a documentation but still…) could upgrade the kernel and break my cluster,
- because I trust the device mapper more,
- because the device mapper is more flexible.
But why this article?
There are HOWTOs on installing a new system without ASMlib, but I couldn’t find a paper on how to switch a system that already uses ASMlib to one without it.
How to do it?
The ASM metadata is stored on the disks themselves, so you simply have to present the newly named disks to ASM to get your disk groups back, and with them your cluster and the DBs on top of it.
Hands on!!
Time to start.
The basic components of this “migration” are:
- the /etc/multipath.conf
- the asm_diskstring parameter in your ASM.
As stated above any LUN has a unique identifier called WWID.
With the command
/sbin/scsi_id -g -u -s
you can discover this number.
For example on one of my nodes:
[root@node1 ~]# scsi_id -g -u -s /block/sdd
36006048c63f248cbb9c04ece213872af
In my /dev filesystem I find:
[root@node1 ~]# ll /dev/disk/by-id/scsi-36006048c63f248cbb9c04ece213872af
lrwxrwxrwx 1 root root 9 Jul 26 21:20 /dev/disk/by-id/scsi-36006048c63f248cbb9c04ece213872af -> ../../sdd
We can use this number to create more readable special devices via multipathd.
Here is an excerpt from my /etc/multipath.conf. I prefer to have all blacklisted except for the devices I really need:
defaults {
    udev_dir /dev
    polling_interval 10
    selector "round-robin 0"
    path_grouping_policy multibus
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout /bin/true
    path_checker readsector0
    rr_min_io 100
    rr_weight priorities
    failback immediate
    no_path_retry fail
    user_friendly_names no
}
blacklist {
    devnode "*"
}
blacklist_exceptions {
    devnode "^sd[d-z]"
}
multipaths {
    multipath {
        wwid 36006048c63f248cbb9c04ece213872af
        alias ora_registry_mirror_raw_1g
        path_grouping_policy failover
        uid 551
        gid 504
        mode 660
    }
    multipath {
        wwid 36006048c655df44cc1d7640365033b7c
        alias ora_crs02_raw_1g
        path_grouping_policy failover
        uid 551
        gid 504
        mode 660
    }
}
Find all the WWIDs on your system with:
for i in `cat /proc/partitions | awk '{print $4}' | grep sd`; do echo "### $i: `scsi_id -g -u -s /block/$i`"; done
Insert all the entries you need into your multipath.conf.
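If you have many LUNs, typing the stanzas by hand gets tedious. The following is only a sketch of how they could be generated: the function name gen_multipath_stanzas and the input format (one "WWID alias" pair per line) are my assumptions for illustration, and the default uid, gid and mode are simply the values of the excerpt above, so they have to be replaced with those of your own Oracle owner.

```shell
#!/bin/sh
# gen_multipath_stanzas [UID] [GID] [MODE]
# Reads "WWID ALIAS" pairs on stdin and prints one multipath { } stanza each.
# The defaults are the uid/gid/mode of the excerpt above; they are specific
# to that system, so pass the values of your own Oracle owner.
gen_multipath_stanzas() {
    owner_uid="${1:-551}"
    owner_gid="${2:-504}"
    dev_mode="${3:-660}"
    while read -r wwid name; do
        case "$wwid" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        printf 'multipath {\n'
        printf '    wwid %s\n' "$wwid"
        printf '    alias %s\n' "$name"
        printf '    path_grouping_policy failover\n'
        printf '    uid %s\n' "$owner_uid"
        printf '    gid %s\n' "$owner_gid"
        printf '    mode %s\n' "$dev_mode"
        printf '}\n'
    done
}
```

For example, gen_multipath_stanzas < wwid_alias.txt (a hypothetical file) prints the stanzas on standard output, ready to be pasted between the braces of the multipaths section after you have reviewed them.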
Enable the multipath:
chkconfig multipathd on
Disable the ASMlib:
/etc/init.d/oracleasm disable
chkconfig oracleasm off
Connect to the ASM and change the parameter with:
ALTER SYSTEM SET asm_diskstring='ORCL:*','/dev/mapper/ora*' SCOPE=BOTH SID='*';
Initially you have to keep the old discovery path of ASMlib (ORCL) otherwise you are going to get:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-15014: path '/dev/mapper/ora*' is not in the discovery set
Now you can restart your nodes and the cluster should start without the ASMlib!
If you want to do it without rebooting your nodes you need to:
- stop ASMlib (/etc/init.d/oracleasm stop),
- stop your cluster (crsctl stop cluster and crsctl stop crs),
- restart multipathd (service multipathd restart),
- check that the ownership and permissions are correct on the newly created devices in /dev/mapper (ll /dev/mapper/ora*),
- start your cluster (crsctl start cluster and crsctl start crs).
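The permission check in the list above can also be scripted instead of reading the ll output by eye. This is a minimal sketch, not part of the procedure I actually ran: the function name check_dev_perms is hypothetical, and it relies on GNU stat as shipped with Linux.

```shell
#!/bin/sh
# check_dev_perms UID GID MODE DEVICE...
# Prints every device whose owner, group or mode differs from the expected
# values and returns non-zero if at least one mismatch was found.
# Relies on GNU stat (-c), which is what Linux distributions ship.
check_dev_perms() {
    want="$1 $2 $3"
    shift 3
    rc=0
    for dev in "$@"; do
        # -L follows symlinks, in case the /dev/mapper entries are links
        have=$(stat -L -c '%u %g %a' "$dev") || { rc=1; continue; }
        if [ "$have" != "$want" ]; then
            echo "$dev: expected '$want', found '$have'"
            rc=1
        fi
    done
    return "$rc"
}
```

With the values of my multipath.conf the call would be check_dev_perms 551 504 660 /dev/mapper/ora*; no output and a zero exit status mean that all devices match.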
Personally I wasn’t able to do it without a reboot, since I got this error in /var/log/messages:
Jul 26 17:31:17 node1 kernel: device-mapper: table: 253:6: multipath: error getting device
Jul 26 17:31:17 node1 kernel: device-mapper: ioctl: error adding target to table
This obscure and mysterious error simply means that your device is already opened by another application (probably Oracle).
A reboot solves the problem, since the device mapper is started before Oracle.