Sunday, June 9, 2019

Enjoy and Experiment with RAC Features



Introduction:

The goal of this section is to explore some of the Oracle Clusterware features such as node fencing and callouts, as well as working with services and policy-managed databases. To learn more about the cluster lab used in this article, please have a look at my previous RACattack on Red Hat 7 blog post >> Deploying Oracle RAC Database 12c Lab on Red Hat Enterprise Linux 7


     Contents:

  1. Configure Oracle Client
  2. Node Fencing
  3. Clusterware Callouts
  4. Service Failover
  5. Policy Managed Database

    1. Configure Oracle Client

    To connect from a client outside the cluster VMs (VirtualBox), we will need to download and install Oracle's Basic Instant Client.

    1. Download Oracle's Basic (English-only) Instant Client and Oracle's Instant Client SQLPlus package from Oracle's website :  
    (instantclient-basiclite-windows.x64-12*zip and instantclient-sqlplus-windows.x64-12*zip)


    2. Each archive contains a folder named "instantclient_12_1". Extract this folder (from both archives) into your drive (e.g., C:\).

    3. Open Windows Network Settings (Control Panel -> Network and Internet -> Network and Sharing Center) and select Change adapter settings.
    - Double-click on the VirtualBox Host-Only Network adapter.



    4. Click on Properties, then select Internet Protocol Version 4 (TCP/IPv4) and click Properties.
    5. Select Use the following DNS server addresses and enter the addresses of the two virtual machines:
      • Preferred DNS server: 192.168.78.51
      • Alternate DNS server: 192.168.78.52
      
    6. Click the Advanced... button and select the DNS tab.
    • Check the Append these DNS suffixes (in order) option.
    • Click Add..., enter evilcorp.com in the Domain suffix field that appears, and click Add.

      
    7. Check both Register this connection's addresses in DNS and Use this connection's DNS suffix in DNS registration.
    • Click OK and/or Close until you exit the network configuration. (If you prefer the command line, a netsh alternative for the DNS servers is sketched right after this step.)
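    For the DNS server part only, the same settings can be applied from an elevated command prompt with netsh. This is just a sketch: the adapter name below must match exactly what appears under Change adapter settings, and the DNS suffix from step 6 is still easiest to set through the GUI.

    C:\> netsh interface ip set dns name="VirtualBox Host-Only Network" static 192.168.78.51
    C:\> netsh interface ip add dns name="VirtualBox Host-Only Network" 192.168.78.52 index=2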

    8. Open a Command prompt and check that the DNS resolution is working properly on your laptop:
    C:\Users\Ironman> nslookup - 192.168.78.51
    Default Server:  london1.evilcorp.com
    Address:  192.168.78.51
    
    > london-cluster-scan
    Server:  london1.evilcorp.com
    Address:  192.168.78.51
    
    Name:    london-cluster-scan.evilcorp.com
    Addresses:192.168.78.253
              192.168.78.252
              192.168.78.251
    9. Test your Instant Client installation by connecting to the database.
        • DB access:
    C:\instantclient_12_1> sqlplus system/racattack@//london-cluster-scan/RAC_DB.evilcorp.com
    
    SQL*Plus: Release 12.2.0.1.0 Production on Mon Jul 2 18:54:02 2018
    Copyright (c) 1982, 2016, Oracle.  All rights reserved.
    
    Connected to:
    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
    With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
    Advanced Analytics and Real Application Testing options
    
    SQL> select instance_name from v$instance;
    
    INSTANCE_NAME
    ----------------
    RACDB1
        • Pluggable DB access:
    C:\instantclient_12_1> sqlplus sys/racattack@//london-cluster-scan/RAC_DB.evilcorp.com as sysdba

    SQL> alter pluggable database PDB open;
    SQL> alter pluggable database PDB save state;
    SQL> connect system/racattack@//london-cluster-scan/PDB.evilcorp.com
    Connected.
    10. Create a TNSNAMES file, copy the TNS RAC entries from either node (PDB entry included), and run a tnsping.
    C:\instantclient_12_1> notepad C:\instantclient_12_1\tnsnames.ora
     
     RAC =
       (DESCRIPTION =
         (ADDRESS = (PROTOCOL = TCP)(HOST = london-cluster-scan.evilcorp.com)(PORT = 1521))
         (CONNECT_DATA = (SERVER = DEDICATED)
           (SERVICE_NAME = RAC_DB.evilcorp.com)))

     PDB =
       (DESCRIPTION =
         (ADDRESS = (PROTOCOL = TCP)(HOST = london-cluster-scan.evilcorp.com)(PORT = 1521))
         (CONNECT_DATA = (SERVER = DEDICATED)
           (SERVICE_NAME = PDB.evilcorp.com)))
              • Test
            C:\instantclient_12_1> set TNS_ADMIN=c:\instantclient_12_1
            C:\instantclient_12_1> sqlplus system/racattack@RAC
            SQL*Plus: Release 12.2.0.1.0 Production on Mon Jul 2 19:51:55 2018
            
            Copyright (c) 1982, 2016, Oracle.  All rights reserved.
            Last Successful login time: Mon Jul 02 2018 19:18:54 -04:00
            
            Connected to:
            Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
            With the Partitioning, Real Application Clusters, Automatic Storage Management,OLAP,Advanced Analytics and Real Application Testing options
            SQL> connect system/racattack@PDB
            Connected.

             2. Node Fencing                                                              Go to Top

                 Fencing is handled at the node level by rebooting the non-responsive or failed node.  

              1. Monitor the Clusterware log files on both nodes. On each node, start a new window and run the following commands (log files and their locations have changed in 12.1.0.2):
              [grid@<node_name>]$ tail -f /u01/app/grid/diag/crs/`hostname -s`/crs/trace/crsd.trc
              [grid@<node_name>]$ tail -f /u01/app/grid/diag/crs/`hostname -s`/crs/trace/ocssd.trc
              -- Or
              [grid@<node_name>]$ tail -f /u01/app/grid/diag/crs/`hostname -s`/crs/trace/alert.log
              2. We will simulate "unplugging" the network interface by taking one of the private network interfaces down. On the london2 node, turn the private network interface down by running the following as root (an iproute2 alternative is shown just below).

              [root@london2 ~]# ifconfig eth1 down
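              On RHEL 7 the legacy net-tools package (which provides ifconfig) may not be installed; the same effect can be achieved with iproute2:

              [root@london2 ~]# ip link set eth1 down

              And to bring the interface back up once the eviction has been observed:

              [root@london2 ~]# ip link set eth1 up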

                3. We can now watch the log files we started monitoring in step 1 and track any errors. We will eventually observe that one of the nodes reboots itself.

                .......
                London1
                .......

                2018-07-03 00:36:32.665 [OCSSD(2736)]CRS-1610: Network communication with node london2 (2) missing for 90% of timeout interval. 
                Removal of this node from cluster in 2.680 seconds
                2018-07-03 00:36:40.122 [OCSSD(2736)]CRS-1625: Node london2, number 2, was shut down
                2018-07-03 00:36:40.122 [OCSSD(2736)]CRS-1607: Node london2 is being evicted in cluster incarnation 425999537;
                2018-07-03 00:36:50.122 [OCSSD(6016)]CRS-1601: CSSD Reconfiguration complete. Active nodes are london1 .
                2018-07-03 00:36:51.122 [OCTSSD(8024)]CRS-2407: The new Cluster Time Synchronization Service reference node is host london1.
                2018-07-03 00:36:51.2 [CRSD(9680)]CRS-5504: Node down event reported for node 'london2'.
                2018-07-03 00:37:10.122 [CRSD(9680)]CRS-2773:
                Server 'london2' has been removed from pool 'Generic'.
                2018-07-03 00:37:10.122 [CRSD(21675)]CRS-2772: Server 'london2' has been assigned to pool 'ora.RAC_DB'.
                2018-07-03 00:37:40.122 [OCSSD(6016)]CRS-1601:
                CSSD Reconfiguration complete. Active nodes are london1 london2 .
                2018-07-03 00:38:40.122 [CRSD(9680)]CRS-2772: Server 'london2' has been assigned to pool 'Generic'.

                .........
                london2 - after node reboot
                .........

                [grid@london2 ~]$ nmcli dev status
                Unknown parameter: eth0

                DEVICE  TYPE      STATE      CONNECTION
                ------- -------- ----------- -----------
                eth0    ethernet  connected  eth0
                eth1    ethernet  connected  eth1
                eth2    ethernet  connected  eth2

                [grid@london2 ~]$ srvctl status database -d rac_db
                Instance RACDB1 is not running on node london1
                Instance RACDB2 is running on node london2
                [grid@london1 ~]$ srvctl status nodeapps
                VIP london1-vip.evilcorp.com is running on node: london1
                VIP london2-vip.evilcorp.com is running on node: london2
                Network is running on node: london1
                Network is running on node: london2
                ONS daemon is running on node: london1
                ONS daemon is running on node: london2
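                To pull just the eviction-related messages back out of the alert log afterwards, a simple grep along these lines does the job (message numbers taken from the output above; path as used in step 1):

                [grid@london1 ~]$ grep -E 'CRS-1601|CRS-1607|CRS-1610|CRS-1625' /u01/app/grid/diag/crs/`hostname -s`/crs/trace/alert.log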

                3. Clusterware Callouts                                                    Go to Top

                Description: This feature allows Oracle Clusterware to fire a script (or a whole directory full of them) to perform any task required when a cluster-wide event happens (for example, a database goes up, down, or becomes unresponsive).
                For this exercise, we will configure a few FAN callout scripts on each node and then generate various cluster events to see how each member triggers the callout script.

                1. On an up-and-running cluster, open a shell prompt (as oracle) on each node and navigate to $GRID_HOME/racg/usrco. Create our callout script called london_callout.sh using vi (a slightly richer variant is sketched right after this step). The content should look like this:
                #!/bin/ksh
                umask 022
                FAN_LOGFILE=/tmp/`hostname`_uptime.log
                echo $* "reported="`date` >> $FAN_LOGFILE &  
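                Callout scripts receive the whole FAN event payload as command-line arguments, so they can branch on fields such as status=. Purely as an illustration (the script name, log path, and field matched here are my own choices, not part of the lab), a variant that records only node-down events could look like this:

                #!/bin/ksh
                umask 022
                FAN_LOGFILE=/tmp/`hostname`_nodedown.log
                # log the full payload only when the event carries status=nodedown
                for ARG in $*
                do
                  case $ARG in
                    status=nodedown) echo $* "reported="`date` >> $FAN_LOGFILE ;;
                  esac
                done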
                
                2. Make sure permissions on the file are set to 755 using the following command
                [oracle@<node_name>]$ chmod 755 /u01/grid/oracle/product/12.1.0.2/grid/racg/usrco/london_callout.sh 
                
                3. Monitor the Clusterware log files on each node. On each node, start a new window and run the following command:
                [oracle@<node_name>]$ tail -f /u01/app/grid/diag/crs/`hostname -s`/crs/trace/alert.log

                4. Next, we need to trigger an event that will cause the callout to fire. One such event is a node shutdown. Shut down the clusterware on node london2.
                [root@london2 ~]# crsctl stop crs
                CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'london2'
                CRS-2673: Attempting to stop 'ora.crsd' on 'london2'
                CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'london2'
                CRS-2673: Attempting to stop 'ora.rac.db' on 'london2'
                CRS-2673: Attempting to stop 'ora.DATA.SHARED.advm' on 'london2'
                CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'london2'
                ...
                CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'london2' has completed
                CRS-4133: Oracle High Availability Services has been stopped
                5. Following the shutdown, watch the log files you began monitoring in step 3 above. Because we set long timeouts on our test cluster, you might have to wait a few minutes before you see anything.
                • expected result: Both members run the clusterware callout script.  
                You should see these messages in the /tmp/*.log files indicating that the node is down:
                NODE VERSION=1.0 host=london2 status=nodedown reason=public_nw_down incarn=0 timestamp=2018-07-08 16:21:40 timezone=-04:00 vip_ips=192.168.78.62 reported=Sun Jul 8 16:21:40 EDT 2018
                
                INSTANCE VERSION=1.0 service=rac_db.evilcorp.com database=rac_db instance=RACDB2 host=london2 status=down reason=USER timestamp=2018-07-08 16:21:41 timezone=-04:00 
                db_domain=evilcorp.com reported=Sun Jul 8 16:21:40 EDT 2018
                Note: I also saw the database service shut down on london1 and then come back up, but I suspect this was due to memory starvation on my VMs.
                NODE VERSION=1.0 host=london2 status=nodedown reason=member_leave incarn=426435207 timestamp=2018-07-08 16:38:13 timezone=-04:00 reported=Sun Jul 8 16:38:15 EDT 2018
                NODE VERSION=1.0 host=london2 incarn=426435207 status=down reason=member_leave timestamp=08-Jul-2018 16:38:13 reported=Sun Jul 8 16:38:15 EDT 2018
                6. Restart the clusterware.
                [root@london2 bin]# crsctl start crs
                • expected result: Only the restarted node (london2) runs the clusterware callout script.

                [root@london2 usrco]#  cat  /tmp/`hostname`_uptime.log
                INSTANCE VERSION=1.0 service=rac_db.evilcorp.com database=rac_db instance=RACDB2 host=london2 status=up reason=BOOT timestamp=2018-07-08 17:35:16 timezone=-04:00 db_domain=evilcorp.com reported=Sun Jul 8 17:35:17 EDT 2018

                7. Let's trigger another event: an unexpected database instance shutdown. Only the node london2 will run the callout script.
                [oracle@london2 ~]$ sqlplus "/ as sysdba"
                Connected to:
                Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
                SQL>  shutdown abort;
                
                [grid@london2 ~]$ cat /tmp/`hostname`_uptime.log
                
                INSTANCE VERSION=1.0 service=rac_db.evilcorp.com database=rac_db instance=RACDB2 host=london2 status=down reason=USER timestamp=2018-07-08 17:55:37 timezone=-04:00 db_domain=evilcorp.com reported=Sun Jul 8 17:55:37 EDT 2018
                
                • Other possible tests: powering off one of the virtual machines, disabling a network interface, etc. (a quick cleanup of the callout artifacts is sketched below).
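                Once you are done experimenting, the callout script and its log file can simply be removed again (paths as used in steps 1 and 2; adjust them if your Grid home differs, and remove the extra _nodedown.log if you created the variant sketched earlier):

                [oracle@london2 ~]$ rm /u01/grid/oracle/product/12.1.0.2/grid/racg/usrco/london_callout.sh
                [oracle@london2 ~]$ rm /tmp/`hostname`_uptime.log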

                  4. Service Failover                                                    Go to Top

                  1. Log in to london1 as the oracle user. Create a new service svctest with RACDB1 as a preferred instance and RACDB2 as an available instance. This means that it will normally run on the RACDB1 instance but will fail over to the RACDB2 instance if RACDB1 becomes unavailable. If you haven't created a container database, omit the -pdb parameter (a client-side TNS entry for the new service is sketched right after the commands below).
                  [oracle@london1]$ srvctl add service -d RAC_DB -pdb PDB -s svctest -r RACDB1 -a RACDB2 -P BASIC
                  [oracle@london1]$ srvctl start service -d RAC_DB -s svctest
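                  If you want to reach the new service from the Windows client configured in section 1, an entry along these lines can be added to the client tnsnames.ora (a sketch, mirroring the RAC and PDB entries shown earlier; the lsnrctl output below confirms the service registers as svctest.evilcorp.com):

                  SVCTEST =
                    (DESCRIPTION =
                      (ADDRESS = (PROTOCOL = TCP)(HOST = london-cluster-scan.evilcorp.com)(PORT = 1521))
                      (CONNECT_DATA = (SERVER = DEDICATED)
                        (SERVICE_NAME = svctest.evilcorp.com)))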
                  2. Examine where the service is running by checking lsnrctl on both nodes and looking at the SERVICE_NAMES init parameter on both nodes.
                  [oracle@london1]$ srvctl status service -d RAC_DB -s svctest
                  Service svctest is running on instance(s) RACDB1
                  
                  [oracle@london1]$ lsnrctl services
                  ...
                  Service "svctest.evilcorp.com" has 1 instance(s).
                    Instance "RACDB1", status READY, has 1 handler(s) for this service...
                      Handler(s):
                        "DEDICATED" established:1 refused:0 state:ready
                           LOCAL SERVER
                  
                  [oracle@london2]$ lsnrctl services
                   ---- svctest not found in the list
                  SQL> select name, pdb, inst_id from gv$services;
                  
                  NAME                               PDB                              INST_ID
                  ---------------------------------- ------------------------------ ---------
                  svctest                            PDB                                  1
                  RAC_DB.evilcorp.com                CDB$ROOT                             1
                  pdb.evilcorp.com                   PDB                                  1
                  pdb.evilcorp.com                   PDB                                  2
                  RAC_DB.evilcorp.com                CDB$ROOT                             2
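                  Optionally, you can also confirm the preferred/available definition the service was created with (output omitted; it lists the preferred and available instances among other attributes):
                  [oracle@london1]$ srvctl config service -d RAC_DB -s svctest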
                  
                  3. Use SHUTDOWN ABORT to kill the instance where service svctest is running.
                  SQL> Show user
                  USER is "SYS"
                  SQL> select instance_name from v$instance;
                   
                  INSTANCE_NAME
                  ----------------
                  RACDB1
                   
                  SQL> shutdown abort;
                  ORACLE instance shut down.
                  SQL>
                  
                  4. Wait a few moments and then repeat step 2. The service stops running.
                  [oracle@london1]$ srvctl status service -d RAC_DB -s svctest
                  Service svctest is not running.
                  
                  5. Restart the instance that you killed.
                  [oracle@london1]$ srvctl status database -d RAC_DB
                  Instance RACDB1 is not running on node london1
                  Instance RACDB2 is running on node london2
                  [oracle@london1]$ srvctl start instance -d RAC_DB -i RACDB1
                  [oracle@london1]$ srvctl status database -d RAC_DB
                  Instance RACDB1 is running on node london1
                  Instance RACDB2 is running on node london2
                  
                  6. Repeat step 2. The service is now running on RACDB1 again.
                  [oracle@london1]$ srvctl status service -d RAC_DB -s svctest
                  Service svctest is running on instance(s) RACDB1
                  
                  7. Manually fail over the service and confirm where it is now running. Note that this does not disconnect any current sessions.
                  [oracle@london1]$ srvctl relocate service -d RAC_DB -s svctest -i RACDB1 -t RACDB2
                  [oracle@london1]$ srvctl status service -d RAC_DB -s svctest
                  Service svctest is running on instance(s) RACDB2
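                  To see the relocation from a client's point of view, you can connect through the service itself (using EZConnect, as in section 1) and check which instance the session landed on; a new connection made after the relocation should land on the instance the service moved to:

                  C:\instantclient_12_1> sqlplus system/racattack@//london-cluster-scan/svctest.evilcorp.com
                  SQL> select instance_name from v$instance;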

                  5. Policy Managed Database                                         Go to Top

                  Overview: Two deployment models exist for designing Oracle RAC and Oracle RAC One Node databases:

                  • Administrator-managed: the only model available prior to Oracle 11.2. It requires a static configuration of database instances to run on specific nodes, as well as database services to run on specific instances of a given database, using the preferred and available designations.
                  • Policy-managed: relies on server pools, where database services run within a server pool either as singleton (active on only one instance in the server pool) or uniform (active on all instances in the server pool).

                  Server pools:

                  Server pools are logical partitions of the physical cluster, composed of groups of servers. Each pool has three main attributes: IMPORTANCE (0-1000), MIN_SIZE (default 0), and MAX_SIZE (-1 meaning cluster-wide).

                  Note: I wish I had more nodes in my cluster to provide a more exhaustive preview of the feature, but alas my laptop's memory is already struggling. My examples will therefore rely on a two-node cluster.

                  1. First, let's create a new server pool using either crsctl or srvctl

                  # crsctl add serverpool srvpl01 -attr "MIN_SIZE=0,MAX_SIZE=2,IMPORTANCE=0,SERVER_NAMES=london1 london2"
                    -- OR
                  # srvctl add srvpool -serverpool srvpl01 -min 1 -max 4 -servers london1,london2

                  2. Assign the RAC_DB Database to our new serverpool

                  - Stop the database
                  # srvctl stop database -d RAC_DB

                  - Modify and assign it to a serverpool
                  # srvctl modify database -d RAC_DB -serverpool srvpl01 

                  - Restart the database for the modification to take effect
                  # srvctl start database -d RAC_DB

                  3. Verify our new configuration

                  - Check the configuration of the newly created serverpool
                  # srvctl  config serverpool -serverpool srvpl01
                  Server pool name: srvpl01
                  Importance: 0, Min: 0, Max: 4
                  Category:
                  Candidate server names: london1,london2

                  # crsctl status serverpool ora.srvpl01 -f
                  NAME=ora.srvpl01
                  IMPORTANCE=0
                  MIN_SIZE=0
                  MAX_SIZE=4
                  SERVER_NAMES=london1 london2
                  PARENT_POOLS=
                  EXCLUSIVE_POOLS=
                  ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
                  SERVER_CATEGORY=
                  ACTIVE_SERVERS=london1 london2
                  - verify the association of the database to the serverpool

                  [root@london2 ~]# srvctl config database -db RAC_DB
                  Database unique name: RAC_DB
                  Database name: RAC_DB
                  Oracle home: /u01/app/oracle/product/12.1.0.2/db_1
                  Oracle user: oracle
                  Spfile: +DATA/RAC_DB/PARAMETERFILE/spfile.275.975538943
                  Password file: +DATA/RAC_DB/PASSWORD/pwdrac_db.284.975537953
                  Domain: evilcorp.com
                  Start options: open
                  Stop options: immediate
                  Database role: PRIMARY
                  Management policy: AUTOMATIC
                  Server pools: srvpl01
                  Disk Groups: FRA,DATA
                  Mount point paths:
                  Services: svctest
                  Type: RAC
                  ... Database is policy managed
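                  To see which servers the instances actually ended up on (and the new underscore-style instance names described below), the status commands can be used as well; output is omitted here since it depends on your pool sizing:

                  # srvctl status database -d RAC_DB
                  # crsctl status server -f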

                  Post-conversion changes:

                  - Instance names change slightly after switching to policy management: the instance number is preceded by an underscore (racdb1 => racdb_1), which requires you to update the TNS entries accordingly.
                  If the password file is stored on a file system, you will have to rename it as well:

                  # mv orapwDBNAME1 orapwDBNAME_1

                  - OEM: If the database was registered with OEM Cloud Control 12c, replace the existing instance names with the new ones (racdb_1).

                  Summary and advantages of policy-managed databases:

                  - Resources are defined in advance to cope with workload needs
                  - Breaks the explicit association between services, instances, and nodes
                  - Sufficient instances are automatically started to satisfy the service-to-server-pool association defined by the cardinality (UNIFORM or SINGLETON) instead of the usual admin-managed options (preferred/available); see the sketch just after this list
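                  As a quick illustration of the cardinality point, a service on a policy-managed database is declared against the server pool rather than against specific instances: UNIFORM runs it on every instance in the pool, SINGLETON on exactly one. The example below is only a sketch (the service name rpt_svc is made up):

                  [oracle@london1]$ srvctl add service -db RAC_DB -service rpt_svc -serverpool srvpl01 -cardinality UNIFORM
                  [oracle@london1]$ srvctl start service -db RAC_DB -service rpt_svc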

                   Further examples: The figure below shows yet another use case, where policy-managed databases share two server pools with different priorities, allowing consolidation and CPU caging.
                  If you want to learn more about it, check out the excellent post from Ludovico Caldara.

                                   

                                                                                                                                                                                         Go to Top>>
