蝴蝶 发表于 2015-3-20 15:35:00

Oracle数据库UNDO损坏后的恢复

    UNDO表空间存储着DML操作数据块的前镜像数据,在数据回滚,一致性读,闪回操作,实例恢复的时候都可能用到UNDO表空间中的数据。如果在生产过程中丢失或破坏了UNDO表空间,可能导致某些事务无法回滚,数据库无法恢复到一致性的状态,Oracle实例可能宕机,之后实例无法正常启动;如果有多个UNDO表空间数据文件,丢失其中一个数据文件数据库实例可能不会导致实例宕机,数据库无法干净的关闭(只能SHUTDOWN ABORT),数据库实例能正常的重启,但所有未回滚的数据块依然无法处理,尝试新建UNDO表空间、exp、expdp等操作都会收到ORA-604, ORA-376, and ORA-1110的报错,下面通过一个实际的案例讨论如何处理UNDO损坏后的恢复。

    客户的某系统数据库运行在HP-UX服务器上,数据库版本10.2.0.5,单机数据库,数据库运行在非归档模式,数据库文件存放在裸设备上。由于操作系统某文件系统空间不够,维护工程师准备扩展该文件系统的大小,由于在生产时段的一个误操作(误以为不会影响系统运行的一个操作)导致文件系统不能正常工作,卸载之后无法挂载,不巧的是不知道是谁之前为数据库的UNDO表空间和TEMP表空间都添加了一个数据文件到该文件系统下。这个操作导致某些事务挂起无法回滚,数据库不能正常关闭,但能成功重启,重启后事务依然存在,回滚段显示状态依然ONLINE,尝试新建UNDO表空间收到ORA-604, ORA-376, and ORA-1110,也无法执行exp、expdp操作,告警日志一直在报ORA-604, ORA-376, and ORA-1110的错误。

    出现这种情况首先我们要确定是否有事务受到影响,确定影响范围,之后尝试恢复丢失的数据文件(如果在归档模式利用备份和归档、在线日志恢复丢失的文件),但该客户的数据库运行在非归档模式,无法对数据文件进行恢复操作。备份重于一切,一切工作都要围绕数据安全来开展!

下面是处理过程:
   
    我们的基本思维应该是新建一个UNDO表空间替换现有的UNDO表空间,UNDO表空间数据文件的丢失导致某些事务无法回滚,数据库的某些数据块不一致(可以认为被逻辑损坏),但我们可以接受部分块的损坏,恢复过程需要多次重启数据库实例。

如果你的数据库还能干净的关闭,但在正常情况下无法新建UNDO表空间,那么执行以下的步骤:

I.A. THE DATABASE WAS CLEANLY SHUT DOWN
---------------------------------------

If you are ABSOLUTELY POSITIVE that the database was cleanly shutdown,
i.e., it was closed with either shutdown NORMAL or IMMEDIATE, then
the simplest solution is to offline drop the missing datafile, open the
database in restricted mode, and then drop and recreate the undo
tablespace to which the file belonged.DO NOT follow this procedure
if the database was shut down ABORT or if it crashed.

The steps are:

1. Make sure the database was last cleanly shut down.

   Check the alert.log file for this instance.Go to the bottom of
   the file and make sure the last time you shut the database down
   you got the messages:

      "Shutting down instance (immediate)"

   OR

"alter database close normal
         Completed: alter database close normal"

   This also includes the case of a clean shutdown followed by a
   failed attempt to startup the database.In that case, Oracle will
   issue error messages and shut itself down abort.For the purposes
   of this solution, though, this counts as a clean shutdown.

   If that is not the case, i.e., if the last time YOU shut the database
   down it was in abort mode, or the database crashed itself, it is
   NOT safe to proceed.You should follow the instructions for
   case I.B below.

2. If using automatic UNDO_MANAGEMENT, comment out this entry from the parameter
   file, or set it to MANUAL.

   将UNDO_MANAGEMENT修改为MANUAL是因为UNDO表空间在自动管理模式下,如果不能成功新建回滚段(后面会DROP现有表空间)将导致数据库实例宕机。

   If using rollback segments, remove all the rollback segments in the
   tablespace to which the lost datafile belongs from the ROLLBACK_SEGMENTS
   parameter in the init.ora file for this instance.If you are not sure about which rollbacks are
   in that tablespace, simply comment out the whole ROLLBACK_SEGMENTS entry.

3. Mount the database in restricted mode.

   SQL> STARTUP RESTRICT MOUNT
   以RESTRICT模式启动实例是避免在处理过程中有其他客户端连接。

4. Offline drop the lost datafile.

   SQL> ALTER DATABASE DATAFILE '<full_path_file_name>' OFFLINE DROP;

5. Open the database.

   SQL> ALTER DATABASE OPEN

   You should receive the message "Statement processed,".

   If instead you get ORA-604, ORA-376, and ORA-1110, it is likely the shutdown
   was not normal/immediate.Review the rest of the options available and/or
   contact Oracle Support Services.   

6. Drop the undo tablespace or tablespace which contains rollback segments
   to which the datafile belonged.

   SQL> DROP TABLESPACE <tablespace_name> INCLUDING CONTENTS;

7. Recreate the undo tablespace.If using rollback segments, recreate the
   rollback segment tablespace and all it's rollback segments.Remember to
   bring the rollbacks online after you create them.

SQL> CREATE TABLESPACE UNDOTBS2 DATAFILE <datafile_name> SIZE 1G AUTOEXTEND ON NEXT 100M;

8. Edit the parameter file setting:
   UNDO_MANAGEMENT=AUTO
   UNDO_TABLESPACE=<new undo tablespace name>

   If using rollback segments, reinclude the rollbacks you just recreated in
   the ROLLBACK_SEGMENTS parameter in the init.ora file for this instance.
   As rollback segments were brought online in step #7, no need to proceed
   with shutdown/startup as needed for undo tablespace.All that is required
   is:

   SQL> ALTER SYSTEM DISABLE RESTRICTED SESSION;

如果你的数据库不能正常关闭,只需要在重启数据库实例之前将下面的参数加到参数文件:
_allow_resetlogs_corruption=TRUE
_offline_rollback_segments="_SYSSMU1$"
_offline_rollback_segments="_SYSSMU2$"
_offline_rollback_segments="_SYSSMU3$"
_offline_rollback_segments="_SYSSMU4$"
_offline_rollback_segments="_SYSSMU5$"
_offline_rollback_segments="_SYSSMU6$"
_offline_rollback_segments="_SYSSMU7$"
_offline_rollback_segments="_SYSSMU8$"
_offline_rollback_segments="_SYSSMU9$"
_offline_rollback_segments="_SYSSMU10$"
_corrupted_rollback_segments="_SYSSMU1$"
_corrupted_rollback_segments="_SYSSMU2$"
_corrupted_rollback_segments="_SYSSMU3$"
_corrupted_rollback_segments="_SYSSMU4$"
_corrupted_rollback_segments="_SYSSMU5$"
_corrupted_rollback_segments="_SYSSMU6$"
_corrupted_rollback_segments="_SYSSMU7$"
_corrupted_rollback_segments="_SYSSMU8$"
_corrupted_rollback_segments="_SYSSMU9$"
_corrupted_rollback_segments="_SYSSMU10$"

    rollback_segments的具体值可以从v$rollname中获得。

    处理完成后停止数据库实例,去掉以上参数,修改好UNDO相关参数即可正常启动数据库实例,之后再手动处理TEMP表空间丢失的TEMP数据文件。

    虽然数据库实例能够正常启动,也恢复了UNDO表空间的使用,但这并不代表不一致的块已经恢复,执行某些查询的时候可能会收到报错,数据库完全恢复正常后应该立即执行一次逻辑备份+物理备份,确保数据库的安全。

    另外操作前请仔细阅读《RECOVERING FROM A LOST DATAFILE IN A UNDO TABLESPACE (文档 ID 1013221.6)》文章。

--end--

页: [1]
查看完整版本: Oracle数据库UNDO损坏后的恢复