Thursday, April 10, 2014

Netbackup Oracle backup Error 6 On HP-UX Client

One of our Oracle DB backups was failing with error 6, which is a generic error for DB backups using NetBackup. From the error log I found:


Errors in file /OraBase/admin/Test50/bdump/Test50_j000_29761.trc:

ORA-12012: error on auto execute of job 661138


ORA-00257: archiver error. Connect internal only, until freed.




Error:- ORA-00257: archiver error. Connect internal only, until freed.

Solution:- 

This is more of an Oracle-side issue than a NetBackup one. After a little research, I found that the fix is to move some of the archive logs out of the archive destination.


--> Find the location of the archive destination:
 show parameter archive_dest
 
 Let's say it returns LOCATION=/Test50/oradata/mydb/arch
--> Move some files to another location using OS commands:
 cd /Test50/oradata/mydb/arch
 mv /Test50/oradata/mydb/arch/* /Test5/oradata/mydb/arch-bkp/
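Before moving anything, you can check how close the archive destination is to filling up. These are standard Oracle views and SQL*Plus commands; the `v$recovery_file_dest` query only applies if the database uses a flash recovery area (db_recovery_file_dest), so treat this as a sketch:

```sql
-- Show the archive destination and the current/oldest log sequence
SQL> archive log list;

-- When a flash recovery area is in use, show its size and usage
SQL> select name,
            space_limit/1024/1024 as limit_mb,
            space_used/1024/1024  as used_mb
     from v$recovery_file_dest;
```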
 
If you want to do it from RMAN, try the following.

rman target /

RMAN> backup archivelog all format '/Test50/oradata/mydb/arch-bkp/%U';

RMAN> delete archivelog until time 'trunc(sysdate)';
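Alternatively, RMAN can back up the archive logs and purge them in one step, which avoids moving files around at the OS level. This is standard RMAN syntax; the format path is carried over from the example above:

```sql
RMAN> backup archivelog all format '/Test50/oradata/mydb/arch-bkp/%U' delete input;
```

The `delete input` clause removes each archive log once it has been backed up, so the archive destination is freed and the RMAN catalog stays consistent.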


Error:- ORA-12012: error on auto execute of job 7863452



Solution:- 

The underlying table required by one of the scheduled maintenance tasks has invalid values. 
The related table is Test50.Testing_Base. 

After taking a look at the job log, we see the failed execution attempts:

SQL> select log_date,status from dba_scheduler_job_run_details 
     where job_name='Test50_DBTesting_JOB';

LOG_DATE                              STATUS
------------------------------------- ------------------------------
11-FEB-14 10.00.07.3668 AM -06:00    FAILED
04-FEB-14 08.00.05.59665 AM -06:00   FAILED


2 rows selected.

Solution

Check the table for invalid rows and remove them.

SQL> select * from Test50.Testing_Base;

      DBID INSTANCE_NAME    BASELINE_ID BSLN_GUID                        TI A STATUS           LAST_COMPUT
---------- ---------------- ----------- -------------------------------- -- - ---------------- -----------
4056791929  orcl                       0 24T6YH78JNSHDT89ONSNS09Q2NND8728 NW N ACTIVE           140917:2200
4052355478  ORCL3                      0 12MDNSIdGAJKW474IETJBESGJH539930 ND Y ACTIVE           141022:2000


As we can see above, the orcl row is corrupted, so we proceed to delete it.


The corrupt row can be removed as below.


SQL> DELETE FROM Test50.Testing_Base WHERE INSTANCE_NAME ='orcl';

1 row deleted.

SQL> commit;

Commit complete.

Manually re-execute the job and check the execution log; it should show that the job executed successfully. It takes a couple of minutes after execution for the results to show in the log table.

SQL> exec dbms_scheduler.run_job('Test50_DBTesting_JOB',false);
PL/SQL procedure successfully completed.


The issue was fixed; validate it by querying:

SQL> select log_date,status from dba_scheduler_job_run_details 
     where job_name='Test50_DBTesting_JOB';

LOG_DATE                              STATUS
------------------------------------- ------------------------------
14-FEB-14 11.00.07.315077 AM -06:30   FAILED
15-FEB-14 11.00.05.595559 AM -06:30   FAILED
17-FEB-14 03.41.20.714453 AM -06:30   SUCCEEDED



Now try re-running the job; it should be successful :)



Thursday, February 27, 2014

Netapp Volume DataMotion Highlights

Today I performed NetApp DataMotion for Volumes to migrate volumes from one aggregate to another. DataMotion uses SnapMirror in the back end to move the data from the old volume to the new volume. Please find a brief summary below of how it works.

Note :- My volumes have an Oracle DB running on them, and I saw no disruption to the application during DataMotion; it went smooth and hassle-free.

prodfiler4> vol move start production_vol_adm ataggr0 -k

Use " -k " option if you want to keep the source volume as once after the VOL MOVE , Netapp destroys the source volume as all the new reads and writes were directed to the New Volume ( Destination Volume)

prodfiler4> Wed Feb 26 16:14:50 SGT [prodfiler4:vol.move.Start:info]: Move of volume production_vol_adm to aggr ataggr0 started
Creation of volume 'ndm_dstvol_1393402490' with size 21474836480  on containing aggregate
'ataggr0' has completed.

Volume 'ndm_dstvol_1393402490' is now restricted.

Wed Feb 26 16:15:13 SGT [prodfiler4:vol.move.transferStart:info]: Baseline transfer from volume production_vol_adm to ndm_dstvol_1393402490 started.
Transfer started.

Monitor progress with 'snapmirror status' or the snapmirror log.

Wed Feb 26 16:16:54 SGT [prodfiler4:vol.move.transferStatus:info]: Baseline transfer from volume production_vol_adm to ndm_dstvol_1393402490 took 97 secs and transferred 1935528 KB data.

Wed Feb 26 16:16:56 SGT [prodfiler4:vol.move.transferStart:info]: Update from volume production_vol_adm to ndm_dstvol_1393402490 started.
Transfer started.

Monitor progress with 'snapmirror status' or the snapmirror log.

Wed Feb 26 16:17:18 SGT [prodfiler4:vol.move.transferStatus:info]: Update from volume production_vol_adm to ndm_dstvol_1393402490 took 13 secs and transferred 1160 KB data.

Wed Feb 26 16:17:23 SGT [prodfiler4:vol.move.transferStart:info]: Update from volume production_vol_adm to ndm_dstvol_1393402490 started.
Transfer started.

Monitor progress with 'snapmirror status' or the snapmirror log.

Wed Feb 26 16:17:44 SGT [prodfiler4:vol.move.transferStatus:info]: Update from volume production_vol_adm to ndm_dstvol_1393402490 took 12 secs and transferred 1104 KB data.

Wed Feb 26 16:17:44 SGT [prodfiler4:vol.move.updateTimePrediction:info]: Expected time for next update from volume production_vol_adm to ndm_dstvol_1393402490 is 12 secs to transfer 272 KB data.

Wed Feb 26 16:17:52 SGT [prodfiler4:vol.move.cutoverStart:info]: Cutover started for vol move of volume production_vol_adm to aggr ataggr0.
Transfer started.

Monitor progress with 'snapmirror status' or the snapmirror log.

prodfiler4> vol move status production_vol_adm
Source                Destination                     CO Attempts    CO Time     State
production_vol_adm    ataggr0                         3              60          cutover

prodfiler4> Wed Feb 26 16:18:07 SGT [prodfiler4:vol.move.cutoverEnd:info]: Cutover finished for vol move of volume production_vol_adm to aggregate ataggr0 - time taken 14 secs

prodfiler4> vol move status production_vol_adm
Source                Destination                     CO Attempts    CO Time     State
production_vol_adm    ataggr0                         3              60          cutover

prodfiler4> Wed Feb 26 16:18:16 SGT [prodfiler4:wafl.vvol.renamed:info]: Volume 'ndm_dstvol_1393402490' renamed to 'production_vol_adm_old_1393402490'.
'ndm_dstvol_1393402490' renamed to 'production_vol_adm_old_1393402490'
Wed Feb 26 16:18:17 SGT [prodfiler4:vol.move.End:info]: Successfully completed move of volume production_vol_adm to aggr ataggr0.



One important thing I observed during DataMotion: the source volume should have at least 10% free space left in it, or you may face issues at cutover time. After the baseline transfer, DataMotion creates a snapshot on the source volume and performs the updates based on that snapshot. If there is not enough space to create snapshots, it will just keep showing update transfers (more than 5 attempts), at which point you have to abort the DataMotion, increase the source volume size, and start over again.

One more important thing: the snap autodelete commitment setting on the volume should be set to "try".

Ex:- snap autodelete production_vol_adm commitment try  ( set this before initiating DataMotion )
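Putting the two observations above together, a quick pre-flight check on the filer might look like this (7-Mode commands with the example volume name from this post; `df` and `snap autodelete ... show` are standard, but treat this as a sketch and substitute your own volume):

```
prodfiler4> df -h production_vol_adm
prodfiler4> snap autodelete production_vol_adm show
prodfiler4> snap autodelete production_vol_adm commitment try
```

Confirm at least 10% free space in the `df` output, and that the commitment value reads "try", before starting the vol move.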

Wednesday, February 26, 2014

"Netapp" How to Convert a 32-Bit Aggregate in to a 64-Bit

Starting with ONTAP 8.1, if you want to convert a 32-bit aggregate to 64-bit, all you need to do is add disks to grow the aggregate beyond 16 TB; it will automatically be converted to 64-bit, and the conversion happens in the background (no downtime required). Growing beyond 16 TB is the safest method.

Example :- 

For example, if you have 14 disks in a 32-bit aggregate, you may need to add 2 more disks to grow it beyond 16 TB and make it 64-bit.

Syntax:- 

Nayab >  aggr add aggr_name -64bit-upgrade normal 2

This will add 2 disks to the aggregate and, once the aggregate grows beyond 16 TB, convert it to 64-bit at the same time.
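You can verify the block format afterwards with `aggr status`; on 7-Mode the status column reports 32-bit or 64-bit. Shown as a sketch with the example aggregate name:

```
Nayab > aggr status aggr_name
```

Look for "64-bit" in the Status column once the background conversion has completed.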


If you are not able to add disks, there is an alternative: it can be done through diag mode (but this is very risky to perform unless directed by NetApp personnel).

Try this method at your own risk:

Nayab > priv set diag 

Nayab *> aggr 64bit-upgrade start aggr1 -mode grow-all

Thursday, December 19, 2013

How to change DB instance job schedule for Netapp Snapmanager for SQL Server Busy message

Hi all. One of our SQL DB instances backed up through SnapManager had been failing with the status "SnapManager Server Busy". My idea was to move its schedule to a window in which no other instances were being backed up; it is always good to have at least 30 minutes to 1 hour between each instance. Please note that SnapManager for SQL does not use the Windows Task Scheduler (SnapManager for Exchange does). All I did to avoid the server-busy message was change the schedule as below, and it worked perfectly. Now my backups are successful.

Open SQL Server  Management Studio

Select the DB Instance for which you want to change the SCHEDULE

Now go to SQL Server Agent -> DB_Schedule_Name -> Properties

 Select the Job and click Edit



Now you will be able to check and change the scheduled time to run the snapmanager for SQL

Sunday, November 24, 2013

Netbackup Error 830,96,252........ Due to Tape Library IBM 3500 Gripper Failure

Our backups were failing with errors 830, 96, 252, and so on (we use Symantec NetBackup). At first I could not find the cause; I thought it was drive errors, since our library had not been serviced in years. Later my lead suspected something beyond drive errors and found a hardware issue.




The gripper probably had a tape stuck in it. I opened the door and turned the robot towards me so I could see whether there were tapes inside, and I discovered a tape stuck inside the gripper. There is a small mechanism on it, like a plastic rope, that you can move with your finger; pushing it back allows you to get the tape out of the gripper. Removing the tape from the gripper still did not solve the issue, so later an IBM engineer came to our site and replaced the two grippers, as both had gone faulty. Now my backups are running fine.
You can find the faulty gripper in the pictures below.



Thursday, October 31, 2013

Netbackup Oracle DB backup failing with error 6

Today we faced a backup failure of an Oracle DB and were later able to fix the issue by following the steps below.

Error
--------
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04004: error from recovery catalog database: ORA-12516: TNS:listener could not find available handler with matching protocol stack
------
ERROR : ORA-12516: TNS:listener could not find available handler with matching protocol stack

CAUSE : One of the most common reasons for the TNS-12516 and/or TNS-12519 errors is that the configured maximum number of PROCESSES and/or SESSIONS has been reached. When this occurs, the service handlers for the TNS listener become "Blocked" and no new connections can be made.

SOLUTION :
----------------
1. Check the maximum values for processes and sessions:
select name, value from v$parameter where name in ('processes','sessions');

2. Check current process and session usage against the maximum limits:
select * from v$resource_limit;

3. Increase processes and sessions, for example to 300:
alter system set processes=300 scope=spfile;
alter system set sessions=300 scope=spfile;

4. Restart the database for the changes to take effect.
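You can also confirm the blocked state from the listener side. `lsnrctl services` is a standard Oracle Net command; the handler names in your output will differ per environment, so treat this as a sketch:

```
$ lsnrctl services
```

A service handler that has hit the process/session ceiling is reported with state "blocked"; after raising the limits and restarting the database, it should return to "ready".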

Tuesday, October 29, 2013

Netapp Additional Information from the version 7.3.1.1 onwards

Starting with Data ONTAP 7.3.1.1, additional information about the system storage configuration was added to the output of the sysconfig command.


Possible System Storage Configurations:

Single-Path          All storage in the system is single-pathed
Mixed-Path           Some storage is single-pathed, some is multi-pathed
Multi-Path           All storage in the system is multi-pathed
Single-Path HA       All storage in this HA system is single-pathed
Mixed-Path HA        Some storage in this HA system is single-pathed, some is multi-pathed
Multi-Path HA        All storage in this HA system is multi-pathed

Ex:-  FAS01> sysconfig
NetApp Release 8.1.1 7-Mode:
System ID: 80093278468 (FAS01); partner ID: 80090494093(FAS02)
System Serial Number: 4000201034 (test1)
System Rev: F6
System Storage Configuration:  Multi-Path HA

Monday, October 28, 2013

Ease your daily work with Netapp's new Workflow Automation (WFA)

OnCommand Workflow Automation (WFA) is a software solution that helps to automate storage
management tasks such as provisioning, migration, decommissioning, and cloning storage. You can
use WFA to build workflows to complete tasks specified by your processes.
A workflow is a repetitive and procedural task that consists of sequential steps, including the
following types of tasks:

• Provisioning, migrating, or decommissioning storage for databases or file systems
• Setting up a new virtualization environment, including storage switches and datastores
• Setting up storage for an application as part of an end-to-end orchestration process
Storage architects can define workflows to follow best practices and meet organizational
requirements, such as the following:
• Using required naming conventions
• Setting unique options for storage objects
• Selecting resources
• Integrating internal configuration management database (CMDB) and ticketing applications

WFA features

WFA includes the following features:
• Designer portal to build workflows
The designer portal includes several building blocks such as commands, templates, finders,
filters, and functions that are used to create workflows. The designer enables workflows to
include advanced capabilities such as automated resource selection, row repetition (looping), and approval points.
• Execution portal to execute workflows, verify status of workflow execution, and access logs
• Administration portal for tasks such as setting up WFA, connecting to data sources, and
configuring user credentials
• Web services interfaces to invoke workflows from external portals and data center orchestration software.

Also Refer to :- 
https://communities.netapp.com/community/products_and_solutions/storage_management_software/workflow-automation

Netapp Performance Limits to achieve better performance

To achieve better performance, please make sure your filer is under the threshold limits for the following areas (the threshold tables were provided as images in the original post):

• Filer thresholds
• Volume thresholds
• Protocol latency limits
• Exchange Server thresholds
• SQL Server threshold values
• Oracle threshold values

Monday, September 2, 2013

Netbackup Error 6 for SAP DB backup

My backups were failing with error 6, and the logs showed the following:
SAP_SCHEDULED = 1
SAP_USER_INITIATED = 0
SAP_SERVER = XXXX
SAP_POLICY = SAP-XXXX
SAP_FULL = 1
SAP_CINC = 0
ERR
-24985,ERR_MENOTFOUND: medium not found
Execution of DBMCLI command failed - exiting
Note the error log locations:
 UNIX:    /usr/openv/netbackup/logs/bphdb
 Windows: C:\Program Files\Veritas\NetBackup\logs\bphdb
  

Solution:-

-24985: ERR_MENOTFOUND - medium not found

Explanation
You specified a backup template that is not defined.
User Response
  • Check whether you entered the name of the backup template correctly.
  • Use a different backup template.
  • Create a new backup template with this name.
  • If you cannot resolve the error, contact Support.
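For SAP MaxDB, the backup media (templates) live in the DBM layer, so you can list what is actually defined and compare it with the name NetBackup is using. `medium_getall` is a standard DBMCLI command; the database name and credentials below are placeholders for your environment:

```
dbmcli -d <SID> -u control,<password> medium_getall
```

If the template named in your NetBackup policy does not appear in the list, define it (or correct the policy) and re-run the backup.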