
Ceph Troubleshooting (1): HEALTH_WARN: clock skew detected on mon

December 21
Author: admin | Category: Application Management

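The cluster status HEALTH_WARN: clock skew detected on mon has two common causes: the NTP service is not running on a mon node, or the clock drift threshold that Ceph allows between mons is set too low.

Troubleshoot in that order: rule out the first cause before moving on to the second.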

1. Confirm that the NTP service is working properly

$ systemctl status ntpd

If ntpd is not installed, you can install it by following this article: Setting Up an NTP Server and Client Time Synchronization on CentOS 7
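
A running ntpd does not by itself prove that the clock is actually synchronized, so it can help to look at the offset of the peer ntpd has selected. The following is a minimal sketch, not part of the original procedure: the helper name selected_peer_offset_ms is hypothetical, and it assumes the usual column layout of `ntpq -pn`, where the selected peer is marked with a leading `*` and the offset is the ninth column, in milliseconds. Note that this offset is measured against the NTP server, not between the mon nodes themselves.

```shell
# Print the offset (in milliseconds) of the peer ntpd is currently synced to.
# Reads `ntpq -pn` output on stdin. The selected peer is the line whose first
# character is "*"; the offset is the 9th whitespace-separated column.
# Exits non-zero when no selected peer is found (ntpd is not synchronized).
# NOTE: selected_peer_offset_ms is a hypothetical helper name.
selected_peer_offset_ms() {
    awk '$1 ~ /^\*/ { print $9; found = 1 } END { exit (found ? 0 : 1) }'
}

# Typical use on a mon node (requires ntpd and ntpq to be installed):
#   systemctl is-active ntpd && ntpq -pn | selected_peer_offset_ms
```

If the function prints nothing and returns a non-zero status, ntpd has not selected a time source yet, and the clock skew warning is likely to persist regardless of the Ceph threshold.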

2. Adjust the clock drift threshold in the Ceph configuration

On the deploy node, edit the configuration file to adjust the clock drift threshold:

$ vim /ceph-install/ceph.conf

Add the following under the [global] section:

mon clock drift allowed = 2
mon clock drift warn backoff = 30
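
Both values are in seconds; Ceph's default for mon clock drift allowed is 0.05. If you prefer not to edit the file by hand, the same change can be scripted. The sketch below is only an illustration: the function name add_clock_drift_opts is hypothetical, and it assumes the file already contains a [global] header at the start of a line. It does nothing if the file already sets mon clock drift allowed.

```shell
# Insert the two clock-drift options right after the [global] header of a
# ceph.conf-style file. Skips the file if "mon clock drift allowed" (written
# with spaces or underscores) is already present, so it is safe to re-run.
# NOTE: add_clock_drift_opts is a hypothetical helper name.
add_clock_drift_opts() {
    conf="$1"
    if grep -Eq '^[[:space:]]*mon[ _]clock[ _]drift[ _]allowed' "$conf"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    # Copy every line through, and emit the options after the [global] header.
    awk '
        { print }
        /^\[global\]/ {
            print "mon clock drift allowed = 2"
            print "mon clock drift warn backoff = 30"
        }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Typical use on the deploy node, with the path used in this article:
#   add_clock_drift_opts /ceph-install/ceph.conf
```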

Push the configuration file to the mon nodes that need it:

[root@cephnode01 my-cluster]# ceph-deploy --overwrite-conf config push cephnode{01,02,03}
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy --overwrite-conf config push cephnode01 cephnode02 cephnode03
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : push
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x14296c8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  client                        : ['cephnode01', 'cephnode02', 'cephnode03']
[ceph_deploy.cli][INFO  ]  func                          : <function config at 0x13bbaa0>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.config][DEBUG ] Pushing config to cephnode01
[cephnode01][DEBUG ] connected to host: cephnode01 
[cephnode01][DEBUG ] detect platform information from remote host
[cephnode01][DEBUG ] detect machine type
[cephnode01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to cephnode02
[cephnode02][DEBUG ] connected to host: cephnode02 
[cephnode02][DEBUG ] detect platform information from remote host
[cephnode02][DEBUG ] detect machine type
[cephnode02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to cephnode03
[cephnode03][DEBUG ] connected to host: cephnode03 
[cephnode03][DEBUG ] detect platform information from remote host
[cephnode03][DEBUG ] detect machine type
[cephnode03][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

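The command above pushes the configuration to cephnode01, cephnode02, and cephnode03. After the push, restart the mon service so the new settings take effect: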

$ systemctl restart ceph-mon.target

3. Verify

[root@cephnode01 my-cluster]# ceph -s
  cluster:
    id:     406e0c23-755f-4378-bbc9-13548c4d3d64
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum cephnode01,cephnode02,cephnode03 (age 8m)
    mgr: cephnode01(active, since 71m), standbys: cephnode03, cephnode02
    mds:  3 up:standby
    osd: 3 osds: 3 up (since 11m), 3 in (since 47m)
    rgw: 1 daemon active (cephnode01)
 
  task status:
 
  data:
    pools:   4 pools, 128 pgs
    objects: 187 objects, 1.2 KiB
    usage:   3.0 GiB used, 12 GiB / 15 GiB avail
    pgs:     128 active+clean

A health status of HEALTH_OK indicates that the problem is resolved.
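
The same check can be scripted instead of read by eye. This is a minimal sketch under stated assumptions: the helper name has_clock_skew is hypothetical, and it simply searches the health text for the phrase "clock skew", which is the wording of the warning discussed in this article.

```shell
# Succeed (exit status 0) if the health text on stdin mentions a clock skew.
# NOTE: has_clock_skew is a hypothetical helper name.
has_clock_skew() {
    grep -qi 'clock skew'
}

# Typical use on a node that can run ceph commands:
#   if ceph health detail | has_clock_skew; then
#       echo "clock skew still reported"
#   else
#       echo "no clock skew reported"
#   fi
```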
