Thursday, August 29, 2013

How to Build RHEL NFS HA Cluster

This guide explains how to build a RHEL (CentOS) HA cluster and export a GFS2 filesystem over NFS.


  • Part I: Build Virtual Servers for HA Cluster
  • Part II: Configure Cluster Nodes
  • Part III: Create Cluster and Fence
  • Part IV: Verify Cluster Node Status on all nodes
  • Part V: Configure LVM, IP, and NFS
  • Part VI: Configure NFS Server
  • Appendix A: Cluster Tests
  • Appendix B: Common Cluster Commands

Part I: Build Virtual Servers for HA Cluster

1.       Build the cluster manager virtual server with the following configuration:
a.       Name: cmgr0
b.      2 CPU
c.       4 GB RAM
d.      30 GB HDD

2.       Build two cluster nodes for services with the base kickstart from Satellite:
a.       1 CPU
b.      4 GB RAM
c.       30 GB HDD (1st HDD, scsi0:1)
d.      50 GB HDD (shared, 2nd HDD, scsi1:0)
                  i.      See Part II for the shared storage configuration

3.       Run yum update on all servers (the cluster manager and both node servers)
a.       # yum update

4.       Add the server-ha channel and resilient storage channels
a.       # rhn-channel -a -c rhel-x86_64-server-ha-6
b.      # rhn-channel -a -c rhel-x86_64-server-rs-6

5.       Install the High Availability Management Group packages on the Cluster Manager
a.       # yum groupinstall  "High Availability Management"

6.       Start luci and enable it so that it starts on reboot.
a.       # service luci start
b.      # chkconfig luci on

7.       Browse to luci at https://hostname:8084, log in as root, and verify that the console works.

Part II: Configure Cluster Nodes

·         Build each node using Satellite and the base channel, as described above
·         Edit the VMware settings for each node and perform the following:
a.       Add a 2nd SCSI controller (LSI Logic) in VMware and set SCSI Bus Sharing to Physical
b.      Add a 2nd HDD of 50 GB, thick provisioned eager zeroed, on the new controller (scsi1:0). This is the shared storage

·         Install the High Availability group
a.       # yum groupinstall "High Availability"

·         Set the ricci password and enable ricci at boot
a.       # chkconfig ricci on
b.      # passwd ricci

·         Check the status of ricci and start it if it is not running
a.       # service ricci status
b.      # service ricci start

·         Get the UUID of both nodes. Run from the cluster nodes.
a.       First, create a script called 'getpw' and make it executable (chmod a+x). Add this line to the script:
·   echo "yourpassword"

·         Run this command and save the output for creating the fence in Part III.
a.       # fence_vmware_soap --ip -z --action list -l user1 -S /getpw | grep clust_node
·   clust_nodeb,b25f71dd-55db-38cb-88b4-bb5eb5a87b65
·   clust_nodea,c25f71cc-55ac-49bb-88c4-aa5eb5a87b74
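
When the UUIDs are copied into the fence instance form later, it can help to pull out one node's UUID by name. The following is a minimal sketch, not part of the fence agents: uuid_for_node is a hypothetical helper, and it assumes input made of "name,uuid" lines like the two shown above. The sample data stands in for real fence_vmware_soap output.

```shell
#!/bin/sh
# Hypothetical helper (not part of the fence agents): print the UUID
# for one VM name, given "name,uuid" lines on standard input.
uuid_for_node() {
    awk -F, -v name="$1" '$1 == name { print $2 }'
}

# Sample lines copied from the listing above; on a real node this
# would be the output of the fence_vmware_soap list action.
listing='clust_nodeb,b25f71dd-55db-38cb-88b4-bb5eb5a87b65
clust_nodea,c25f71cc-55ac-49bb-88c4-aa5eb5a87b74'

printf '%s\n' "$listing" | uuid_for_node clust_nodea
```

With the sample data this prints the UUID listed for clust_nodea. An unknown name prints nothing.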

Part III: Create Cluster and Fence

1.       Log in to the luci web console (High Availability Management console) as root
·         https://cmgr0:8084

2.       Click Create and fill in the following information. Use the same password that was set for the ricci account on the cluster nodes.
·         Cluster Name: 'cluster1'
·         Check 'Use Same Passwords'
·         Add both cluster nodes to the cluster and enter the password
                  i.      clust_nodea
                  ii.      clust_nodeb
·         Check 'Use Locally Installed Packages'
·         Check 'Enable Shared Storage Support'
·         Click Create Cluster to create the new cluster.

3.    Verify the /etc/cluster/cluster.conf file on both nodes A and B
·         # cd /etc/cluster/
·         # grep cluster1 cluster.conf
                  i.    <cluster config_version="111" name="cluster1">
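
Both nodes should report the same config_version, so it can be useful to print just that number on each node and compare. This is a rough sketch under stated assumptions: config_version is a hypothetical helper name, and the temporary file only holds the single line shown above so the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: print the config_version attribute of the
# <cluster> element in the given cluster.conf file.
config_version() {
    sed -n 's/.*<cluster [^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$1"
}

# Demonstration on a temporary file holding the line shown above;
# on a node, pass /etc/cluster/cluster.conf instead.
sample=$(mktemp)
printf '%s\n' '<cluster config_version="111" name="cluster1">' > "$sample"
config_version "$sample"
rm -f "$sample"
```

If the numbers differ between nodes, the configuration has not propagated to one of them.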

4.       Next, set up fencing by configuring an account on vCenter (VC) for fencing.

5.       Log in to luci, select your cluster, and then select the Fence Devices tab. Select Add.

6.       Fill out the form exactly as shown:
·         Fence type: VMware (SOAP Interface)
·         Name: nfs fence (just needs to be descriptive)
·         IP Address or Hostname:
·         IP Port (optional): blank
·         Login: CLUST_NODEHA (service account)
·         Password: P@sswurd
·         Leave the rest as they are or blank
                  i.      Click Submit to create the fence device

7.       Go back to cluster1 in luci and select your first node by clicking on its name. At the bottom of the screen for your node, select Add Fence Method. Give it a name, such as clust_nodeb-fence; the name doesn't matter unless you plan to use multiple methods. Submit your change.

8.       Select Add Fence Instance, which appears inside the method box, and fill it out exactly as described below:
·         Select the fencing device you configured above (vmware fencing/nfs fence)
·         VM Name: Leave blank
·         VM UUID: b25f71dd-55db-38cb-88b4-bb5eb5a87b65 (the UUID copied earlier for this node; in the listing from Part II this value belongs to clust_nodeb)
·         Use SSL: Check it on. Fencing will not work without this checked.
·         Repeat for the other node, using its own UUID

Part IV: Verify Cluster Node Status on all nodes

1.       Run this command on all nodes to verify status
o   # ccs_tool lsnode
Cluster name: cluster1, config_version: 12
Nodename                        Votes Nodeid Fencetype
                                    1    1    vmwarefence
                                    1    2    vmwarefence

2.       Verify the following services are running on cluster nodes
o   # service cman status
§  cluster is running.
o   # service rgmanager status
§  rgmanager (pid  2371) is running...

3.       Check the status of the cluster with these two commands:
o   # clustat

Cluster Status for cluster1 @ Tue Mar  5 16:10:31 2013
Member Status: Quorate

 Member Name                                           ID   Status
 ------ ----                                           ---- ------
                                                          1 Online, Local
                                                          2 Online

o   # cman_tool status

Version: 6.2.0
Config Version: 17
Cluster Name: cluster1
Cluster Id: 53623
Cluster Member: Yes
Cluster Generation: 40
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 8
Flags: 2node
Ports Bound: 0 177
Node name:
Node ID: 1
Multicast addresses:
Node addresses:
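
The vote counts in the cman_tool status output can be compared in a script instead of by eye. The sketch below is only illustrative: status_field is a hypothetical helper, the sample holds four fields copied from the output above, and the comparison assumes that "Total votes" counts the votes of the members currently present.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Key: value" field
# from cman_tool status output supplied on standard input.
status_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample fields copied from the output above; on a node, pipe
# "cman_tool status" into the function instead.
sample='Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1'

total=$(printf '%s\n' "$sample" | status_field "Total votes")
quorum=$(printf '%s\n' "$sample" | status_field "Quorum")

# Assumption: the cluster has quorum when the votes present reach
# the Quorum value.
if [ "$total" -ge "$quorum" ]; then
    echo "votes present ($total) meet quorum ($quorum)"
else
    echo "votes present ($total) are below quorum ($quorum)"
fi
```

The same helper can read any other field, for example status_field "Nodes" to confirm that both nodes have joined.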

Part V: Configure LVM, IP, and NFS

1.       Install gfs2-utils and lvm2-cluster (which provides clvmd) on cluster nodes A and B
a.       # yum install gfs2-utils
b.      # yum install lvm2-cluster
c.       # chkconfig clvmd on

2.       Enable clustering on LVM
a.       # lvmconf --enable-cluster

3.       Create physical volume on /dev/sdb
a.       # fdisk /dev/sdb (n for a new partition, p, 1, accept the default start and end, t, 8e, w; then run partprobe)
b.      # pvcreate /dev/sdb1

4.       Create the volume group and logical volume on a single node.
a.       # vgcreate -c y vg_gfs2 /dev/sdb1
  Clustered volume group "vg_gfs2" successfully created
b.      [root@clust_nodea ~]# lvcreate -n lv_home -L 5G vg_gfs2
  Logical volume "lv_home" created

5.       Restart clvmd on each node to sync up the new LV
a.       # service clvmd restart

6.       Create the GFS2 filesystem on the new logical volume lv_home
·         # mkfs -t gfs2 -p lock_dlm -t cluster1:gHome -j 4 /dev/mapper/vg_gfs2-lv_home
This will destroy any data on /dev/mapper/vg_gfs2-lv_home.
It appears to contain: symbolic link to `../dm-8'

Are you sure you want to proceed? [y/n] y

Device:                    /dev/mapper/vg_gfs2-lv_home
Blocksize:                 4096
Device Size                5.00 GB (1310720 blocks)
Filesystem Size:           5.00 GB (1310718 blocks)
Journals:                  4
Resource Groups:           20
Locking Protocol:          "lock_dlm"
Lock Table:                "cluster1:gHome"
UUID:                      b25f71dd-55db-38cb-88b4-bb5eb5a87b65
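
The lock table passed with the second -t has the form clustername:fsname, and the cluster part has to match the cluster name in cluster.conf (cluster1 here). A small sketch of building that argument is shown below. lock_table is a hypothetical helper, and the 1 to 16 character limit on the file system name is taken from the GFS2 documentation as I understand it, so treat it as an assumption. The script prints the mkfs command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: build the lock table argument from the
# cluster name and a file system name, rejecting file system names
# outside the assumed 1-16 character range.
lock_table() {
    cluster="$1"
    fsname="$2"
    if [ "${#fsname}" -lt 1 ] || [ "${#fsname}" -gt 16 ]; then
        echo "file system name must be 1 to 16 characters" >&2
        return 1
    fi
    printf '%s:%s\n' "$cluster" "$fsname"
}

# Print (do not run) the mkfs command for the volume created above.
table=$(lock_table cluster1 gHome) &&
    echo "mkfs -t gfs2 -p lock_dlm -t $table -j 4 /dev/mapper/vg_gfs2-lv_home"
```

Printing the command first gives a chance to check the cluster name before the filesystem is created, since mkfs destroys any data on the device.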

7.       Create new directory in /export for home   (on all nodes)
a.       # cd /export && mkdir home

8.       Add the mount entry for the new filesystem to /etc/fstab
a.       /dev/mapper/vg_gfs2-lv_home /export/home         gfs2    defaults,noatime,nodiratime  0 0
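
A quick way to confirm the entry on each node is to look up the filesystem type recorded for the mount point. This is a minimal sketch: fstype_of is a hypothetical helper that reads fstab-format lines, and the sample entry is the one from this step.

```shell
#!/bin/sh
# Hypothetical helper: print the file system type recorded for a
# mount point in fstab-format input on standard input.
fstype_of() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $3 }'
}

# Entry copied from the step above; on a node, redirect /etc/fstab
# into the function instead.
entry='/dev/mapper/vg_gfs2-lv_home /export/home         gfs2    defaults,noatime,nodiratime  0 0'

printf '%s\n' "$entry" | fstype_of /export/home
```

On a node the call would be fstype_of /export/home < /etc/fstab. No output means the entry is missing or commented out.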

9.       Mount the shared storage and then view it (on all nodes)
a.       # mount /dev/mapper/vg_gfs2-lv_home /export/home

b.      # mount -v | grep /export

10.   Test by writing a file on node A, then check for it on node B (once the filesystem is mounted there)
a.       # touch /export/home/test-file

Part VI: Configure NFS Server

1.       Log in to the High Availability Management console and go to cluster1

2.       Click on the Resources tab, choose Add, then select IP Address

3.       Fill out the following field, leave the rest at the defaults, and click Submit
a.       IP Address: enter the assigned VIP

4.       Click Add again and select the file system type from the dropdown menu. Fill in the file system details and click Submit
a.       Name: cluster1-ha
b.      FS Type: ext4
c.       Mount point: /export2
d.      Device: /dev/mapper/rhelvg2-exportLV

5.       Click Add again, select NFSv3 Export from the menu, and click Submit
a.       Type: NFSv3 Export
b.      Name: hanfs

6.       Click Add again, select NFS Client from the menu, add the information, then click Submit
a.       Type: NFS Client
b.      Options: rw,async,no_root_squash

7.       Click on Service Groups, then click Add. Fill out the properties and click Submit
a.       Name:  ha-nfs-service
b.      Click on Add Resource and select the IP Address
c.       Add a Child Resource and Select GFS2  - resource
d.      Add a Child Resource and select NFS export
e.      Add a Child Resource and select NFS Client

8.       Repeat steps 7 c-e to add a GFS2 resource for another share.
a.       Add a Child Resource to the existing IP Address and select the GFS2 resource
b.      Add a Child Resource and select NFS export
c.       Add a Child Resource and select NFS Client

9.       Start the service from the Service Groups tab by checking the service and clicking Start

10.   Check services
a.       # ccs -h clust_nodea --lsservices

NOTE: luci's self-managed certificate needs to be reconfigured once DNS is set up. luci printed the following message:
Adding following auto-detected host IDs (IP addresses/domain names), corresponding to `cmgr0' address, to the configuration of self-managed certificate `/var/lib/luci/etc/cacert.config' (you can change them by editing `/var/lib/luci/etc/cacert.config', removing the generated certificate `/var/lib/luci/certs/host.pem' and restarting luci):
        (none suitable found, you can still do it manually as mentioned above)

Appendix A: Cluster Tests

Test 1
·         Test by disconnecting the network in vCenter for one of the nodes.

Appendix B: Common Cluster Commands

Cluster status
·         # clustat
·         # clusvcadm

Cluster Various
·         List Cluster Services
a.       #  ccs -h clust_nodeb --lsservices
service: name=ha-nfs-service, recovery=relocate
  ip: ref=
    clusterfs: ref=gHome
      nfsexport: ref=hanfsExport
        nfsclient: ref=nfsClient
    clusterfs: ref=gLogs
      nfsexport: ref=hanfsExport
        nfsclient: ref=nfsClient
  ip: sleeptime=10, address=
  clusterfs: name=gHome, device=/dev/mapper/vg_gfs2-lv_home, mountpoint=/export/home, fsid=8370, fstype=gfs2
  nfsexport: name=hanfsExport
  nfsclient: name=nfsClient, target=, options=rw,async,no_root_squash
  clusterfs: name=gLogs, device=/dev/mapper/vg_gfs2-lv_logs, mountpoint=/export/logs, fsid=59966, fstype=gfs2
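
For scripting, the service names can be pulled out of this listing. The sketch below assumes the "service: name=..., ..." line format shown above; service_names is a hypothetical helper, and the sample holds two lines copied from the listing.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every service found in
# "ccs --lsservices" output supplied on standard input.
service_names() {
    sed -n 's/^service: name=\([^,]*\),.*/\1/p'
}

# Lines copied from the listing above; on the manager, pipe
# "ccs -h clust_nodeb --lsservices" into the function instead.
sample='service: name=ha-nfs-service, recovery=relocate
  clusterfs: name=gHome, device=/dev/mapper/vg_gfs2-lv_home, mountpoint=/export/home, fsid=8370, fstype=gfs2'

printf '%s\n' "$sample" | service_names
```

Only lines that start with "service:" are matched, so the indented resource lines are ignored.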

·         Print cluster configuration
a.       # ccs -h clust_nodeb --getconf

·         Get cluster version
a.       # ccs -h clust_nodeb --getversion

·         List Fence Devices
a.       # ccs -h clust_nodeb --lsfencedev

·         List Fence Instances
a.       # ccs -h clust_nodeb --lsfenceinst