Monday, November 14, 2022

ExaC@C DB state failed in OCI console while up & running in reality (fix)



Intro

Exadata Cloud@Customer (ExaC@C) brings the best of both worlds: on-premises data sovereignty meets the innovation and capabilities of the cloud. Thanks to the control plane network that links the ExaC@C servers to OCI, users can create and manage resources through the Console or any API-based cloud tooling (Terraform, OCI CLI, SDKs, etc.). Everything you do on the ExaC@C is synchronized with OCI through that layer.


Issue of the day

I’ll describe a small glitch that sometimes affects a database resource. It has no impact on the database itself, which keeps working just fine on the ExaC@C. However, as you can see below, the database is marked as FAILED while it is actually up and running (and accessible).

+-------------+-----------+------------------------------------+-----------+
| Unique-Name | charset   | id                                 | state     |
+-------------+-----------+------------------------------------+-----------+
| MYCDB1_DOM  | AL32UTF8  | ocid1.database.oc1.ca-toronto-1.xxa|  FAILED   |
+-------------+-----------+------------------------------------+-----------+


State

We need to be mindful of what the state column really means. It’s quite self-explanatory after a deployment attempt, but for an existing database the state usually reflects whether the database resource is up or down. In our case, however, OCI could no longer detect the resource, hence the FAILED state.
But before delving into it, let’s review how ExaC@C database resources are seen and registered on the OCI side.


Database registration in ExaCC


DB registration allows you to perform admin tasks on the ExaC@C database through the OCI Console and cloud tooling.
Each database created in Exadata Cloud@Customer using the API or Console is automatically registered in OCI.
There are a few exceptions, where OCI allows for a manual registration:
   - A database that you manually created on Exadata Cloud@Customer using DBCA
   - An existing database that you migrated from another platform to Exadata Cloud@Customer
  This is done through the dbaascli registerdb command; read more in Registering a Database.

Files created after registration
Each registered database gets a cloud registration file (<DBNAME>.ini) located under the directory below.

$ ll /var/opt/oracle/creg/*ini
MYCDB1.ini
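If you manage several databases on the same VM cluster, a quick way to see which ones have a registration file is to strip the .ini suffix from the file names. The helper below is a minimal, hypothetical sketch (the function name is mine, not an Oracle tool), assuming the creg path shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DB names that have a cloud registration file.
# Argument 1 (optional): creg directory, defaults to the path used in this post.
list_registered_dbs() {
  local creg_dir="${1:-/var/opt/oracle/creg}"
  local f
  for f in "$creg_dir"/*.ini; do
    # Without nullglob, an unmatched glob stays literal; skip it.
    [ -e "$f" ] || continue
    # MYCDB1.ini -> MYCDB1 (backups such as MYCDB1.ini.old do not match *.ini)
    basename "$f" .ini
  done
}
```

Run it without arguments to use the default path, or pass another directory to inspect a copy of the files.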


Troubleshooting 

I first tried the workaround described in the following note:
Doc ID 2764524.1 EXACS DBs Show Wrong State (Failed) on OCI Webconsole

Cause: DBs registered in CRS with dbname in lowercase (dborcl) instead of uppercase (DBORCL).
Suggested solution: create a symbolic link to the creg DB .ini file so that its name matches the case of the DB name registered in CRS.
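For reference, that workaround boils down to an ln -s in the creg directory. The sketch below is hypothetical (my own helper, not taken from the note) and assumes, as in the note's example, that the .ini file carries the uppercase name while CRS uses the lowercase one; check the note itself before applying it:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Doc ID 2764524.1 workaround: add a lowercase
# symlink next to the uppercase <DBNAME>.ini so both spellings resolve.
# Argument 1: DB name as used in the .ini file name (e.g. DBORCL)
# Argument 2 (optional): creg directory
link_creg_lowercase() {
  local db_name="$1"
  local creg_dir="${2:-/var/opt/oracle/creg}"
  local lower
  lower=$(printf '%s' "$db_name" | tr '[:upper:]' '[:lower:]')

  [ -f "$creg_dir/$db_name.ini" ] || { echo "missing $creg_dir/$db_name.ini" >&2; return 1; }
  # Nothing to do if the name is already lowercase or the link already exists.
  [ "$lower" = "$db_name" ] && return 0
  [ -e "$creg_dir/$lower.ini" ] && return 0

  # Relative target: the link points at the file in the same directory.
  ln -s "$db_name.ini" "$creg_dir/$lower.ini"
}
```

Run it with a user that can write to the creg directory, e.g. link_creg_lowercase DBORCL.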

Outcome: this didn’t fix my problem, so I opened an SR to get to the bottom of it.


Diagnosis

This required help from Support, as they have better visibility into the control plane resource metadata. Looking at the cloud registration file content, we can see that it contains DB information usually found in CRS, plus a few parameters from the spfile.

$ more /var/opt/oracle/creg/MYCDB1.ini

#################################################################
# This file is automatically generated by database as a service #
#################################################################
acfs_vol_dir=/var/opt/oracle/dbaas_acfs
acfs_vol_sizegb=10
agentdbid=83112625-52d2-4b39-b987-1b0d7d2d70cb
aloc=/var/opt/oracle/ocde/assistants
archlog=yes
bkup_asm_spfile=+DATA1/MYCDB1_DOM/spfilemycdb1.ora

Agent resource id
Notice the agentdbid in the .ini registration file. This agent resource ID is the ID that the control plane layer uses to identify and interact with the database.
agentdbid=83112625-52d2-4b39-b987-1b0d7d2d70cb

In addition to the registration file, the agent ID is also written to a .rec file under /var/opt/oracle/dbaas_acfs/<DBNAME>:

$ more /var/opt/oracle/dbaas_acfs/MYCDB1/83112625-52d2-4b39-b98xx.rec
{
   "agentdbid" : "83112625-52d2-4b39-b987-1b0d7d2d70cb" }
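Since the same ID lives in two places, a quick local sanity check is to read agentdbid from the .ini file and confirm that a .rec file with that name exists and contains the same value. This is a minimal sketch with hypothetical helper names, assuming the .rec file is named after the agentdbid (as in the paths used later in this post). Note that it only proves the two local copies agree; it cannot tell you whether they match what the control plane expects, since that value is not visible to end users:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a local consistency check of the agent ID.

# Print the value of KEY from a key=value file such as <DBNAME>.ini (first match).
ini_value() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Check that <DBNAME>.ini and <acfs>/<DBNAME>/<agentdbid>.rec carry the same ID.
# Returns 0 if they agree, 1 if no matching .rec file, 2 if the .ini has no agentdbid.
check_agentdbid() {
  local db="$1"
  local creg_dir="${2:-/var/opt/oracle/creg}"
  local acfs_dir="${3:-/var/opt/oracle/dbaas_acfs}"
  local ini_id rec

  ini_id=$(ini_value "$creg_dir/$db.ini" agentdbid)
  if [ -z "$ini_id" ]; then
    echo "no agentdbid found in $creg_dir/$db.ini" >&2
    return 2
  fi
  echo "ini agentdbid: $ini_id"

  rec="$acfs_dir/$db/$ini_id.rec"
  # The .rec file should exist under that name and contain the quoted ID.
  if [ -f "$rec" ] && grep -q "\"$ini_id\"" "$rec"; then
    echo "rec file matches: $rec"
    return 0
  fi
  echo "no matching .rec file for $ini_id under $acfs_dir/$db" >&2
  return 1
}
```

As the oracle user, check_agentdbid MYCDB1 runs the check against the default paths.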


Root cause

According to OCI Support, the Agent Resource ID seen in the control plane console was somehow different from the agentdbid in the corresponding .ini file.


Solution

Take note of the agent ID communicated by the support engineer and replace the ID in both the .ini and the .rec files.

  • Take a backup of the {DBNAME}.ini file of the affected database(s) on all DB nodes

sudo su - oracle
$ cd /var/opt/oracle/creg
$ cp /var/opt/oracle/creg/MYCDB1.ini /var/opt/oracle/creg/MYCDB1.ini.old

  • Replace the agentdbid value in the {DBNAME}.ini file with the Agent Resource ID seen in the support console.

# Replace the agentdbid= value with 47098321-43d1-4b44-b997-1b0d5d1d90cb

$ vi /var/opt/oracle/creg/MYCDB1.ini

  • Remove the old .rec file with the wrong resource ID and replace it with a new .rec file containing the right ID

$ rm /var/opt/oracle/dbaas_acfs/MYCDB1/83112625-52d2-4b39-b987-1b0d7d2d70cb.rec

$ vi /var/opt/oracle/dbaas_acfs/MYCDB1/47098321-43d1-4b44-b997-1b0d5d1d90cb.rec

{
   "agentdbid" : "47098321-43d1-4b44-b997-1b0d5d1d90cb" }

  • After the change, wait an hour or so for the control plane to sync, then verify the DB state:

+-------------+-----------+------------------------------------+-----------+
| Unique-Name | charset   | id                                 | state     |
+-------------+-----------+------------------------------------+-----------+
| MYCDB1_DOM  | AL32UTF8  | ocid1.database.oc1.ca-toronto-1.xxa| AVAILABLE |
+-------------+-----------+------------------------------------+-----------+
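If you have to repeat the fix on several nodes, the manual steps above can be scripted. The function below is a hypothetical sketch, not an Oracle-supplied tool: it assumes GNU sed (as on Oracle Linux), the default paths shown in this post, and that it runs as the oracle user so the files keep their ownership. The new ID must still come from Oracle Support:

```shell
#!/usr/bin/env bash
# Hypothetical helper scripting the manual fix above (not an Oracle tool).
# Arguments: DB name, new agent ID from Oracle Support, optional creg and acfs dirs.
replace_agentdbid() {
  local db="$1" new_id="$2"
  local creg_dir="${3:-/var/opt/oracle/creg}"
  local acfs_dir="${4:-/var/opt/oracle/dbaas_acfs}"
  local ini="$creg_dir/$db.ini"
  local old_id

  # Accept only hex digits and dashes, which also keeps the sed expression safe.
  case "$new_id" in
    ''|*[!0-9a-fA-F-]*) echo "invalid agent ID: $new_id" >&2; return 1 ;;
  esac

  old_id=$(sed -n 's/^agentdbid=//p' "$ini" | head -n 1)
  [ -n "$old_id" ] || { echo "agentdbid not found in $ini" >&2; return 1; }

  # 1. Back up the registration file, keeping any earlier backup untouched.
  [ -e "$ini.old" ] || cp -p "$ini" "$ini.old" || return 1

  # 2. Replace the agentdbid value in place (GNU sed -i).
  sed -i "s/^agentdbid=.*/agentdbid=${new_id}/" "$ini" || return 1

  # 3. Swap the .rec file: remove the one named after the old ID, write the new one.
  rm -f "$acfs_dir/$db/$old_id.rec"
  printf '{\n   "agentdbid" : "%s" }\n' "$new_id" > "$acfs_dir/$db/$new_id.rec"
}
```

For example, replace_agentdbid MYCDB1 47098321-43d1-4b44-b997-1b0d5d1d90cb on each DB node. If, as its name suggests, dbaas_acfs is a shared ACFS volume, step 3 simply rewrites the same .rec file on the other nodes.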

 

Can we spot the actual agent ID in OCI?

As an end user, you can't see the agent resource ID in your console; unfortunately, it is internal control plane metadata. This means you will have to open an SR each time an issue like this happens. However, I have opened an enhancement request to make the control plane agent ID visible to end users.



Conclusion

  • A FAILED database state in the OCI Console doesn’t always mean the resource is down.
  • A database migrated from another platform could possibly lead to this behaviour.
  • As of now, there is no way for you to know the agent resource ID that the control plane sees.
  • Hopefully, visibility into control plane metadata such as the agent resource ID will come in a future release.
  • Until then, this workaround can still help those who face such behaviour.

        Thank you for reading

Friday, October 14, 2022

Top picks for Cloud Native & DB sessions at Oracle Cloud World '22

Blabla-free post: these are just some of my (personal) top Oracle sessions that I will have the chance to attend next week. Some of you may agree that there is no way to make a list that fits everyone, but for those who are into OCI and database migrations/upgrades, this list might help, especially if you haven’t had the time to browse the 3-day agenda :). Why not save you some time so you can triple-check your packing and make sure you’re all set for the trip ;).
Njoy! And good luck to my ACE friends (Rene, Simo, Franck).


Acronyms you should know:
Session Format: In Person/On Demand or Both

Top Session Types:

  • HOL: Hands-on Lab
  • LRN: Learning Session
  • TUT: Tutorial
  • PAN: Panel Session
  • LIT: Lightning Session (~20 min)
  • DIV: Deep Dive Session
  • CME: Community Meet the Expert Session

My top list


Day 1


OCI platform

☆[TUT4114] OCI Tutorial—Deploying Reference Architectures

Time: Tuesday, Oct 18 11:00 AM - 1:00 PM PDT                         ROOM: Marco Polo 803, The Venetian, Level 1

Other Date: Thursday AM [TUT4859]

☆[HOL4318] Oracle Cloud Infrastructure Identity and Access Management Identity Domains

Time: Tuesday, Oct 18 1:00 PM - 2:30 PM PDT                         ROOM: Titian 2302, The Venetian, Level 2

Other Date: Wednesday PM [LRN3874]


☆[LRN3708] Running a Containerized, PostgreSQL-Compatible Database in the Cloud

Time: Tuesday, Oct 18 12:15 PM - 1:00 PM PDT                         ROOM: Summit 215, Caesars Forum


Database

☆[LRN1507] Database Upgrade and Migration Best Practices—Meet the Experts

Time: Tuesday, Oct 18 11:00 AM - 11:45 AM PDT                 ROOM: Murano 3304, The Venetian, Level 3

☆[HOL3999] Hitchhiker's Guide for Upgrading to Oracle Database 19c

Time: Tuesday, Oct 18 4:00 PM - 5:30 PM PDT                 ROOM: Bellini 2002, The Venetian, Level 2
Other Date: Wednesday AM

☆[LRN3672] Cloud, Databases, and Automation

Time: Tuesday, Oct 18 11:00 AM - 11:45 AM PDT                 ROOM: Summit 215, Caesars Forum


☆[LIT4101] Upgrade to Oracle Database 19c

Time: Tuesday, Oct 18 4:00 PM - 4:20 PM PDT                 Lounge: Ascend, CloudWorld Hub, The Venetian



Day 2


OCI platform

☆[DIV4532] Oracle Cloud Infrastructure Networking Deep Dive

Time: Wednesday, Oct 19 9:00 AM - 11:00 AM PDT              ROOM: Marco Polo 805, The Venetian, Level 1

☆[TUT4112] OCI Tutorial—OCI Networking and Security Quick Start

Time: Wednesday, Oct 19 9:00 AM - 11:00 AM PDT              ROOM: Marco Polo 805, The Venetian, Level 1

                                        

☆[HOL4306] Introduction to Oracle Resource Manager and Terraform

Time: Wednesday, Oct 19 9:00 AM - 10:30 AM PDT              ROOM: Marco Polo 801, The Venetian, Level 1

☆[TUT4110] OCI Tutorial—OCI Technical Quick Start

Time: Wednesday, Oct 19 11:00 AM - 1:30 PM PDT              ROOM: Marco Polo 801, The Venetian, Level 1

Other Date: Tuesday AM [11:00]


☆[LRN3849] Getting Started with Oracle Cloud Infrastructure Storage Services

Time: Tuesday, Oct 18 1:00 PM - 2:30 PM PDT                      ROOM: Titian 2302, The Venetian, Level 2

Other Date: Wednesday PM [LRN3874]


☆[LRN3848] Getting Started with Oracle Cloud Infrastructure Security Services

Time: Wednesday, Oct 19   1:15 PM - 2:00 PM PDT            ROOM: Lido 3005, The Venetian, Level 3


☆[LRN3874] Getting Started with OCI Identity and Access Management

Time: Wednesday, Oct 19   5:00 PM - 5:45 PM PDT            ROOM: Lido 3005, The Venetian, Level 3

Other Date: Tuesday PM [HOL4318]


Database

☆[LRN3503] Cloud Database Migrations the Easy Way

Time: Wednesday, Oct 19 1:15 PM - 2:00 PM PDT                 ROOM: Murano 3203, The Venetian, Level 3

☆[HOL3999] Hitchhiker's Guide for Upgrading to Oracle Database 19c

Time: Wednesday, Oct 19 09:00 AM - 11:45 AM PDT                 ROOM: Titian 2201A, The Venetian, Level 2
Other Date: Tuesday PM


Day 3

OCI platform

☆[TUT4116] OCI Tutorial—You're in the Cloud, Now What?

Time: Thursday, Oct 20 9:00 AM - 11:00 AM PDT              ROOM: Marco Polo 803, The Venetian, Level 1

☆[TUT4860] Oracle Cloud Infrastructure Tutorial—Putting OCI Best Practices into Practice

Time: Thursday, Oct 20 11:30 AM - 1:30 PM PDT              ROOM: Marco Polo 802, The Venetian, Level 1

☆[DIV4534] Oracle Cloud Infrastructure DevOps Deep Dive

Time: Thursday, Oct 20 11:30 AM - 1:30 PM PDT              ROOM: Marco Polo 805, The Venetian, Level 1

☆[LRN3640] Automate Infrastructure as Code and Configuration Management

Time: Thursday, Oct 20 11:30 AM - 12:15 PM PDT              ROOM: Summit 213, Caesars Forum



Database

☆[HOL4093] No Slide Zone—Database Patching Insights, Parts 1 and 2

Time: Thursday, Oct 20 9:00-10:30 AM and 11:00 AM-12:30 PM PDT       ROOM: Bellini 2001A, The Venetian, Level 2

☆[LRN3500] AutoUpgrade 2.0: Internals and New Features

Time: Thursday, Oct 20 1:15 PM - 2:00 PM PDT                 ROOM: Murano 3202, The Venetian, Level 3


☆[LRN3135] Migrating 200+ PDBs from On-Premises Exadata to ExaCC Using ZDM

Time: Thursday, Oct 20 11:30 AM - 12:15 PM PDT                 ROOM: San Polo 3506, The Venetian, Level 3


☆[LRN3501] Oracle Data Pump Deep Dive with Development

Time: Thursday, Oct 20 11:30 AM - 12:15 PM PDT                 ROOM: Murano 3202, The Venetian, Level 3

☆[LRN3549] Automate Exadata Database Service with APIs, SDKs, Ansible, and Terraform

Time: Thursday, Oct 20 3:45 PM - 4:30 PM PDT                 ROOM: San Polo 3403, The Venetian, Level 3




゚☆ See you there ☆゚