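RAC 10g Cluster Startup Scripts
OHAS (Oracle High Availability Service) is an important component newly added to Grid Infrastructure (GI) in 11gR2, together with other related components and resources. The figure below shows the startup relationships among the GI components in 11gR2.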
OHAS
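OHAS is an important component introduced in 11gR2. With it, many aspects of Oracle's cluster management software changed, mainly in how the cluster is started and how resources are managed.
Cluster startup in 10g
In 10g the cluster management software is CRS. From a startup point of view, a 10g cluster is started by the last three entries (h1, h2 and h3) of the /etc/inittab file shown below. The database version used here is Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production.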
# cat /etc/inittab
ap::sysinit:/sbin/autopush -f /etc/iu.ap
sp::sysinit:/sbin/soconfig -f /etc/sock2path
smf::sysinit:/lib/svc/bin/svc.startd >/dev/msglog 2<>/dev/msglog </dev/console
p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/msglog 2<>/dev/msglog
h1:3:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:3:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:3:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
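Although init invokes these three scripts at the same time, the daemons depend on one another. ocssd.bin must start and be working first, then crsd.bin can start and be confirmed working, and finally evmd.bin starts and is confirmed working.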
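The dependency order described above can be sketched as a small check. This is only an illustration and not part of the Oracle scripts: the function name first_missing_daemon is hypothetical, and it assumes the process listing (for example the output of ps -e) is supplied on standard input.

```shell
# Print the first daemon, in the dependency order given in the text
# (ocssd.bin, then crsd.bin, then evmd.bin), that does not appear in
# the process listing read from standard input.
# Prints nothing when all three are present.
first_missing_daemon() {
    procs=$(cat)    # whole process listing, e.g. from `ps -e`
    for daemon in ocssd.bin crsd.bin evmd.bin; do
        case $procs in
            *"$daemon"*) ;;                      # present, check the next one
            *) printf '%s\n' "$daemon"; return 0 ;;
        esac
    done
    return 0
}
```

On a cluster node it could be used as `ps -e | first_missing_daemon`. Because the daemons start in order, the first name printed is the most likely place to begin diagnosing a startup that did not complete.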
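init.cssd: starts the ocssd.bin daemon and the other CSS-level daemons, which forms the cluster.
init.crsd: starts the crsd.bin daemon and calls the racg module to start the corresponding resources, which brings up the cluster's application resources.
init.evmd: starts the evmd.bin daemon, which publishes events across the cluster nodes.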
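To see which argument init passes to each of the three wrappers (run or fatal), the inittab entries can be filtered with awk. The following is a minimal sketch: the function name list_crs_inittab_entries is hypothetical, and it assumes the colon-separated id:runlevels:action:command layout seen in the listing above.

```shell
# Print id, runlevels, action, script path and first argument for every
# inittab entry whose command is one of the clusterware init.* wrappers.
# $1: path to an inittab-format file (defaults to /etc/inittab).
list_crs_inittab_entries() {
    awk -F: '$4 ~ /init\.(evmd|cssd|crsd)/ {
        split($4, cmd, " ")    # cmd[1] = script path, cmd[2] = its argument
        printf "%s runlevels=%s action=%s script=%s arg=%s\n", $1, $2, $3, cmd[1], cmd[2]
    }' "${1:-/etc/inittab}"
}
```

For the h2 entry shown earlier this prints the id h2, runlevel 3, action respawn, the init.cssd path and the argument fatal.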
[oracle@webdb1 ~]$ ls -l /etc/inittab
-rw-r--r-- 1 root root 1869 Jan 23 2013 /etc/inittab
[oracle@webdb1 ~]$ ls -l /etc/init.d/init.cssd
-r-xr-xr-x 1 root root 55166 Jan 23 2013 /etc/init.d/init.cssd
Next, look at the contents of each script. Only excerpts are listed, chosen to show the main functions.
(1) The init.cssd script
...............................................................................................................
ORA_CRS_HOME=/opt/oracle/product/CRS
ORACLE_USER=oracle
ORACLE_HOME=$ORA_CRS_HOME
export ORACLE_HOME
export ORA_CRS_HOME
export ORACLE_USER
# Set DISABLE_OPROCD to false. Platforms that do not ship an oprocd
# binary should override this below.
DISABLE_OPROCD=false
# Default OPROCD timeout values defined here, so that it can be
# over-ridden as needed by a platform.
# default Timout of 1000 ms and a margin of 500ms
OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500
# default Timeout for other actions
OPROCD_CHECK_TIMEOUT=2000
OPROCD_STOP_TIMEOUT=2000
OPROCD_DEFAULT_HISTORGRAM=
# Incase /bin/hostname is not present in a particular platform, we
# may have to do something different.
HOSTN=/bin/hostname
EXPRN=/usr/bin/expr
CUT=/usr/bin/cut
AWK='/bin/awk'
ECHO='echo'
TR=/bin/tr
#solaris on amd and SPARC has issue with /bin/tr
[ 'SunOS' = `/bin/uname` ] && TR=/usr/xpg4/bin/tr
#on Linux tr is at /usr/bin/tr
[ 'Linux' = `/bin/uname` ] && TR=/usr/bin/tr
#If the hostname is an IP address, let hostname
#remain as IP address
HOST=`$HOSTN`
len1=`$EXPRN "$HOST" : '.*'`
len2=`$EXPRN match $HOST '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'`
# Strip off domain name in case /bin/hostname returns
# FQDN hostname
if [ $len1 != $len2 ]; then
HOST=`$ECHO $HOST | $CUT -d'.' -f1 `
fi
HOST=`$ECHO $HOST | $TR '[:upper:]' '[:lower:]'`
# Default Location for commands on most platforms
PS='/bin/ps'
# ps -e is expected to search for all processes on the box and provide
# terse binary name output so that column count does not truncate binary
# names and confuse grep.
PSE='/bin/ps -e'
PSEF='/bin/ps -ef'
HEAD='/bin/head'
GREP='/bin/grep'
KILL='/bin/kill'
KILLTERM='/bin/kill -TERM'
KILLDIE='/bin/kill -9'
KILLCHECK="/bin/kill -0 $$"
SLEEP='/bin/sleep'
NULL='/dev/null'
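...............................................................................................................
As shown, the script first defines the environment variables the cluster uses and the operating system commands it needs.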
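The hostname handling in the excerpt above (strip the domain unless the name is a dotted numeric address, then convert to lower case) can be sketched as a standalone function. This is a simplified variant, not Oracle's code: the name normalize_host is hypothetical, and instead of comparing expr match lengths it treats any name made up only of digits and dots as an IP address.

```shell
# Normalize a host name the way the excerpt describes:
# keep an IP-like name unchanged, otherwise drop the domain part,
# then convert to lower case.
normalize_host() {
    host=$1
    case $host in
        *[!0-9.]*) host=${host%%.*} ;;   # has a non-digit, non-dot character: strip domain
    esac
    printf '%s\n' "$host" | tr '[:upper:]' '[:lower:]'
}
```

For example, `normalize_host WebDB1.Example.COM` prints webdb1, while `normalize_host 192.168.1.10` prints the address unchanged.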
...............................................................................................................
PLATFORM=`$UNAME`
MAXFILE=65536
case $PLATFORM in
Linux)
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib
export LD_LIBRARY_PATH
FAST_REBOOT="/sbin/reboot -n -f & $SLEEP 1 ; $ECHO b > /proc/sysrq-trigger"
HEAD='/usr/bin/head'
...............................................................................................................
HP-UX) MACH_HARDWARE=`/bin/uname -m`
...............................................................................................................
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$NMAPIDIR_64:/usr/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
# Presence of this file indicates that vendor clusterware is installed
SKGXNLIB=${NMAPIDIR_64}/libnmapi2.${SO_EXT}
if [ -f $SKGXNLIB ]; then
USING_VC=1
fi
...............................................................................................................
SunOS) MACH_HARDWARE=`/bin/uname -i`
ARCH=`/usr/bin/isainfo -b`
CLUSTERDIR=/opt/ORCLcluster
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH
LD_LIBRARY_PATH_64=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH_64
if [ "${MACH_HARDWARE}${ARCH}" = "i86pc64" ]; then
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH_64=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH_64
...............................................................................................................
As shown, the script then sets the matching environment variables for each operating system.
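The per-platform pattern can be illustrated with the tr path from the first excerpt, which the script overrides for SunOS and Linux. The sketch below expresses the same choices as a case statement. The function name tr_for_platform is hypothetical, and the real script uses two separate tests rather than a case.

```shell
# Return the path of tr for a given platform name (as printed by uname),
# using the paths that appear in the init.cssd excerpt.
tr_for_platform() {
    case $1 in
        SunOS) echo /usr/xpg4/bin/tr ;;   # /bin/tr has issues on Solaris
        Linux) echo /usr/bin/tr ;;
        *)     echo /bin/tr ;;            # default used by the script
    esac
}
```

A caller would typically write `TR=$(tr_for_platform "$(uname)")` once near the top of a script and use $TR from then on.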
...............................................................................................................
'stop')
$LOGMSG "Oracle CSSD being stopped"
# disable CSS startup until the next boot
$ID/init.cssd norun
# shutdown the OPROCD process if it is running
if [ ! -f $NOOPROCD ]; then
$OPROCD stop -t $OPROCD_STOP_TIMEOUT 2>$NULL
fi
# No steps are necessary for shutting down clsomon. It will go down
# automatically when CSS is shutdown.
# Shut down oclsvmon if it is up.
if [ ! -f $NOCLSVMON ]; then
$EVAL $FINDCLSVMON | $AWK '{ print $2 ; }' | $XARGS $KILLTERM > $NULL 2>&1
fi
# Invalidate init.cssd fatal pidfiles.
$ECHO "stopped" > $CSSFBOOT
$TOUCH $NOOPROCD
$TOUCH $NOCLSVMON
$TOUCH $NOCLSOMON
# Now tell it to shut down.
if [ -x "$CRSCTL" ]; then
$CRSCTL stop crs
fi
$ECHO "Shutdown has begun. The daemons should exit soon."
;;
'run')
# Foreground run, for single instance or single-node installs only.
# If this is used in a cluster install, RDBMS datafile corruption is
# likely.
# Run the startcheck to see whether we should continue
$ID/init.cssd startcheck
while [ "$?" != "0" ]; do
$SLEEP $RUNRECHECKTIME
$ID/init.cssd startcheck
done
cd $ORA_CRS_HOME/log/$HOST/cssd
# If there is an old corefile by such a collision prone name, then
# rename it to something safe.
if [ -f ./core ]; then
$MVF ./core "$UNIQUECORE"
fi
# Arguments. By default none.
OCSSD_ARGS=
$ORA_CRS_HOME/bin/ocssd $OCSSD_ARGS
;;
'fatal')
# This action is invoked to start the CSS daemon in cluster mode,
# and one or more of its accompanying daemons oprocd or clsvmon or clsomon
# This respawn wrapper is done in lieu of adding new entries to inittab.
# Check to see if we are supposed to run this boot.
$ID/init.cssd startcheck
while [ "$?" != "0" ]; do
$SLEEP $RUNRECHECKTIME
$ID/init.cssd startcheck
done
# See discussion in LocalFence
$EVAL $CLEANREBOOTLOCK
..........................................................................................................
$ECHO "See documentation at the top of $0 about supported commands."
exit 1;
;;
..........................................................................................................
init.cssd decides what to do from its input argument. With the argument fatal, it starts the cssd daemon and the other related daemons normally.
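Both the run and fatal branches repeat the startcheck until it succeeds, sleeping between attempts. A minimal sketch of that retry idiom follows. The function name retry_until_ok is hypothetical, and it uses an until loop, which is equivalent in effect to the script's while test on "$?".

```shell
# Re-run a check command until it exits with status 0,
# sleeping for the given number of seconds between attempts.
# $1: sleep interval; remaining arguments: the check command.
retry_until_ok() {
    interval=$1
    shift
    until "$@"; do
        sleep "$interval"
    done
}
```

In the style of the excerpt, the call would look like `retry_until_ok "$RUNRECHECKTIME" "$ID/init.cssd" startcheck`. Note that the loop has no upper bound, so it waits as long as the check keeps failing.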
(2) The init.crsd script
ORA_CRS_HOME=/opt/oracle/product/CRS
ORACLE_HOME=$ORA_CRS_HOME
export ORA_CRS_HOME
export ORACLE_HOME
ORACLE_USER=oracle
UMASK=/bin/umask
SED=/bin/sed
CAT=/bin/cat
LOGMSG="/bin/logger -puser.err"
ECHO=/bin/echo
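...............................................................................................................
These lines define the environment variables and operating system commands that crsd needs.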
---------------------------------------------------------------------------------------------------------------------------
case $PLATFORM in
Linux)
SCRDIR=/etc/oracle/scls_scr/$HOST
ID=/etc/init.d
LOGGER="/usr/bin/logger"
if [ ! -f "$LOGGER" ]; then
LOGGER="/bin/logger"
fi
LOGMSG="$LOGGER -puser.err"
if [ ! -f "$UMASK" ]; then
UMASK=umask
......................................................................................................................................................
OSF1)
ID=/sbin/init.d
# No restriction in opening files on TRU64. Refer b7623099.
MAXFILE=unlimited
;;
*) /bin/echo "ERROR: Unknown Operating System"
exit -1
;;
esac
...............................................................................................................
Different environment variables are set for each platform.
......................................................................................................................................................
case $1 in
'home')
$ECHO $ORA_CRS_HOME
exit 0;
;;
'stop')
[ -r $PIDFILE ] && crspid=`$CAT $PIDFILE`
$LOGMSG "Oracle CRSD $crspid set to stop"
# Indicate that the next time we start up, it may be an initial startup.
$ECHO "stopped" > $CRSDBOOT
$LOGMSG "Oracle CRSD $crspid shutdown completed"
;;
'run') # foreground run out of init
.....................................................................................................................................................
$ECHO "Manual invocation of $0 is not supported."
;;
esac
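...............................................................................................................
The script chooses its action from the input argument. The argument run starts the crsd.bin daemon.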
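The argument dispatch can be reduced to a small skeleton. The sketch below is illustrative only: DEMO_HOME and the function name dispatch are hypothetical, and the branches just print a message instead of starting or stopping a daemon.

```shell
# Hypothetical value, for illustration only.
DEMO_HOME=/opt/demo/crs

# Choose an action from the first argument, as the init.* scripts do.
dispatch() {
    case $1 in
        home) echo "$DEMO_HOME" ;;
        stop) echo "stop requested" ;;
        run)  echo "would start the daemon in the foreground" ;;
        *)    echo "Manual invocation with '$1' is not supported." >&2
              return 1 ;;
    esac
}
```

Calling `dispatch home` prints the configured home directory, and an unknown argument produces an error message on standard error with a non-zero status.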
(3) The init.evmd script
ORA_CRS_HOME=/opt/oracle/product/CRS
ORACLE_USER=oracle
ORACLE_HOME=$ORA_CRS_HOME
export ORACLE_HOME
export ORA_CRS_HOME
CAT=/bin/cat
RMF="/bin/rm -f"
LOGMSG="/bin/logger -puser.err"
ECHO=/bin/echo
KILL=/bin/kill
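...............................................................................................................
These lines define the environment variables and operating system commands that evmd needs.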
case $PLATFORM in
Linux)
ID=/etc/init.d
LOGGER="/usr/bin/logger"
if [ ! -f "$LOGGER" ];then
LOGGER="/bin/logger"
fi
LOGMSG="$LOGGER -puser.err"
SU="/bin/su -l"
;;
HP-UX)
ID=/sbin/init.d
;;
.....................................................................................................................................................
;;
esac
...............................................................................................................
Different environment variables are set for each platform.
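On Linux, both init.crsd and init.evmd look for logger in /usr/bin first and fall back to /bin. That fallback generalizes to "use the first candidate path that exists", sketched below. The function name first_existing_file is hypothetical.

```shell
# Print the first argument that names an existing regular file.
# Returns non-zero when none of the candidates exists.
first_existing_file() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

With the paths from the excerpt this would be written as `LOGGER=$(first_existing_file /usr/bin/logger /bin/logger)`.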
....................................................................................................................................................
case $1 in
'home')
$ECHO $ORA_CRS_HOME
exit 0;
;;
'user')
$ECHO $ORACLE_USER
exit 0;
;;
'stop')
$LOGMSG "Oracle EVMD set to stop"
;;
'run') # foreground run out of init
The script chooses its action from the input argument. The argument run starts the evmd.bin daemon.
(4) Summary
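After reading init.cssd, init.crsd and init.evmd, you can see that the three scripts share the same basic structure: they first define variables and operating system commands, then set the environment variables for the operating system platform, and finally choose an action based on the input argument. This approach also causes problems for the cluster management software. If the content or permissions of a script are changed for any reason, the cluster may well fail to start, and the failure is hard to diagnose. Keeping all of the operations in scripts is also a security concern. For these reasons, the cluster startup method changed starting with version 11.2.0.2.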
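Because a changed permission can stop the cluster from starting, a quick comparison against a known-good mode string helps with diagnosis. The sketch below is an assumption-laden illustration: the function name check_mode is hypothetical, and the expected value must come from a healthy node (the ls -l output earlier shows -r-xr-xr-x for init.cssd and -rw-r--r-- for /etc/inittab).

```shell
# Compare a file's permission string (first 10 characters of `ls -ld`)
# with the expected value and report any mismatch.
# $1: file to check; $2: expected mode string such as -r-xr-xr-x.
check_mode() {
    file=$1
    expected=$2
    actual=$(ls -ld "$file" | cut -c1-10)
    if [ "$actual" = "$expected" ]; then
        echo "OK $file $actual"
    else
        echo "CHANGED $file expected=$expected actual=$actual"
        return 1
    fi
}
```

For example, `check_mode /etc/init.d/init.cssd -r-xr-xr-x` reports OK on a node whose script still has the mode shown earlier. The check covers permissions only, so a change to the script's content would need a separate comparison, such as a checksum taken from a known-good copy.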