⭐A Roundup of Common etcd Failures and How to Troubleshoot Them⭐
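Contents
- 1. A rebuilt etcd node cannot rejoin the cluster
- 2. etcd fails while setting up the initial cluster
- 3. etcd reports "URL address does not have the form"
- 4. Errors when a new etcd node joins the cluster
- 5. A new node cannot join because its IP is missing from the certificate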
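1. A rebuilt etcd node cannot rejoin the cluster
Symptom: an etcd node was removed from the cluster for some reason and now needs to rejoin it.
The error reported is: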
8月 27 16:40:17 binary-k8s-node1 etcd[30462]: {"level":"fatal","ts":"2021-08-27T16:40:17.603+0800","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"open /data/etcd/ssl/server.pem: no such file or directory","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
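This happens because the node has already been a member of an etcd cluster, so its attempt to join again is rejected. To fix it, remove the node's stale entry from the original cluster, or set the node's ETCD_INITIAL_CLUSTER_STATE parameter to "existing". Note that the log line shown above specifically reports that /data/etcd/ssl/server.pem could not be opened, so also confirm that the certificate files are present on the rebuilt node.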
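The removal and re-registration steps can be sketched with etcdctl as follows. This is a minimal sketch, not necessarily the author's exact procedure: the function name readd_member is hypothetical, and it assumes etcdctl (v3 API) can already reach a healthy member, for example through the ETCDCTL_ENDPOINTS, ETCDCTL_CACERT, ETCDCTL_CERT and ETCDCTL_KEY environment variables.

```shell
# Hypothetical helper: drop the stale member entry, then register the
# rebuilt node again. Run it against a healthy member of the cluster.
#   $1  ID of the stale member, as printed by `etcdctl member list`
#   $2  name of the rebuilt node
#   $3  peer URL of the rebuilt node, e.g. https://<node-ip>:2380
readd_member() {
    etcdctl member remove "$1" &&
        etcdctl member add "$2" --peer-urls="$3"
}
```

After the member has been added, the rebuilt node would typically be started with an empty data directory and ETCD_INITIAL_CLUSTER_STATE="existing", so that it joins the existing cluster instead of bootstrapping a new one.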
2. etcd fails while setting up the initial cluster
The error reported is:
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:117","msg":"configuring peer listeners","listen-peer-urls":["http://192.168.20.11:2380"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:465","msg":"starting with peer TLS","tls-info":"cert = /data/etcd/ssl/server.pem, key = /data/etcd/ssl/server-key.pem, trusted-ca = /data/etcd/ssl/ca.pem, client-cert-auth = false, crl-file = ","cipher-suites":[]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:502","msg":"scheme is HTTP while key and cert files are present; ignoring key and cert files","peer-url":"http://192.168.20.11:2380"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:127","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:2379","http://192.168.20.11:2379"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:614","msg":"scheme is HTTP while key and cert files are present; ignoring key and cert files","client-url":"http://127.0.0.1:2379"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:614","msg":"scheme is HTTP while key and cert files are present; ignoring key and cert files","client-url":"http://192.168.20.11:2379"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:360","msg":"closing etcd server","name":"etcd-4","data-dir":"/data/etcd/data","advertise-peer-urls":["http://192.168.20.11:2380"],"advertise-client-urls":["http://127.0.0.1:2379","http://192.168.20.11:2379"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:364","msg":"closed etcd server","name":"etcd-4","data-dir":"/data/etcd/data","advertise-peer-urls":["http://192.168.20.11:2380"],"advertise-client-urls":["http://127.0.0.1:2379","http://192.168.20.11:2379"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.961+0800","caller":"etcdmain/etcd.go:176","msg":"failed to start etcd","error":"error setting up initial cluster: URL address does not have the form \"host:port\": http://ip:192.168.20.11:2380"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"fatal","ts":"2021-09-10T11:01:06.961+0800","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"error setting up initial cluster: URL address does not have the form \"host:port\": http://ip:192.168.20.11:2380","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
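The key phrase in the error is "error setting up initial cluster", which points to a mistake in the configuration file. In this log the rejected value is http://ip:192.168.20.11:2380, so checking the configuration file's syntax carefully will reveal the problem.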
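A quick syntax check on the initial-cluster value can catch this kind of mistake before etcd is restarted. The sketch below assumes the value is the usual comma-separated list of name=URL pairs (as taken by ETCD_INITIAL_CLUSTER). The function name check_initial_cluster is hypothetical, and the pattern only covers http/https URLs with a hostname or IPv4 address and a numeric port.

```shell
# Hypothetical helper: report every peer URL in an initial-cluster string
# that is not of the form scheme://host:port.
#   $1  value such as "etcd-4=https://192.168.20.11:2380,<name>=<url>,..."
check_initial_cluster() {
    [ -n "$1" ] || { echo "empty initial-cluster value"; return 1; }
    status=0
    old_ifs=$IFS
    IFS=,                          # split the list on commas
    for entry in $1; do
        url=${entry#*=}            # drop the "name=" prefix
        if ! printf '%s\n' "$url" | grep -Eq '^https?://[^/:]+:[0-9]+$'; then
            echo "malformed peer URL: $url"
            status=1
        fi
    done
    IFS=$old_ifs
    return $status
}
```

Called with a value such as "etcd-4=http://ip:192.168.20.11:2380", it reports the URL as malformed and returns a non-zero status, because "ip" is read as the host and the rest is not a numeric port.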
3. etcd reports "URL address does not have the form"
The error reported is:
9月 10 11:06:04 binary-k8s-master2 etcd[10971]: {"level":"warn","ts":"2021-09-10T11:06:04.981+0800","caller":"etcdmain/etcd.go:176","msg":"failed to start etcd","error":"error setting up initial cluster: URL address does not have the form \"host:port\": https://ip:192.168.20.11:2380"}
9月 10 11:06:04 binary-k8s-master2 etcd[10971]: {"level":"fatal","ts":"2021-09-10T11:06:04.981+0800","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"error setting up initial cluster: URL address does not have the form \"host:port\": https://ip:192.168.20.11:2380","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
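Reading the log closely, etcd complains that the URL does not have the form "host:port". The rejected value in the message shows that the configuration file is at fault again: a stray "ip" follows https://.
Fix: remove the stray "ip:" after https:// in the configuration file.
Sure enough, the configuration file contained it. After it was removed, the service started successfully.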
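For illustration, the change would look roughly like this in an environment-style configuration file. The variable name ETCD_INITIAL_CLUSTER and the single-member value are assumptions based on the error message and on the member name etcd-4 that appears in the log of section 2; a real file would list every member.

```shell
# Before (rejected by etcd: "ip:" is not part of a valid URL)
# ETCD_INITIAL_CLUSTER="etcd-4=https://ip:192.168.20.11:2380"

# After
ETCD_INITIAL_CLUSTER="etcd-4=https://192.168.20.11:2380"
```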
4. Errors when a new etcd node joins the cluster
9月 10 13:12:42 binary-k8s-master1 etcd[8832]: {"level":"warn","ts":"2021-09-10T13:12:42.386+0800","caller":"rafthttp/stream.go:682","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"aae107adddd0d3d8","remote-peer-cluster-id":"2d72d2986bd93bc7","local-member-id":"51ae3f86f3783687","local-member-cluster-id":"20b119eb5f91aa4b","error":"cluster ID mismatch"}
9月 10 13:12:42 binary-k8s-master1 etcd[8832]: {"level":"warn","ts":"2021-09-10T13:12:42.386+0800","caller":"rafthttp/stream.go:682","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"aae107adddd0d3d8","remote-peer-cluster-id":"2d72d2986bd93bc7","local-member-id":"51ae3f86f3783687","local-member-cluster-id":"20b119eb5f91aa4b","error":"cluster ID mismatch"}
9月 10 13:12:42 binary-k8s-master1 etcd[8832]: request sent was ignored (cluster ID mismatch: remote[aae107adddd0d3d8]=2d72d2986bd93bc7, local=20b119eb5f91aa4b)
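This error occurs because the new node previously ran as a standalone single-node etcd and its data directory was not cleared before it joined the cluster. The leftover data still carries the old cluster ID, which is why the log shows two different cluster IDs (2d72d2986bd93bc7 and 20b119eb5f91aa4b). Deleting the contents of the data directory resolves it: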
rm -rf /data/etcd/data/*
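Because rm -rf is irreversible, a more cautious variant is to move the old data directory aside instead of deleting it. The sketch below is an alternative under stated assumptions, not the author's procedure: the function name reset_data_dir is hypothetical, and etcd is assumed to be stopped on the node before it runs.

```shell
# Hypothetical helper: keep the old data as a timestamped backup and
# leave an empty directory in its place. Stop etcd on the node first.
#   $1  etcd data directory, e.g. /data/etcd/data
reset_data_dir() {
    data_dir=${1%/}                # tolerate a trailing slash
    if [ ! -d "$data_dir" ]; then
        echo "no such directory: $data_dir" >&2
        return 1
    fi
    backup_dir="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
    # Move the stale data aside, recreate the directory with mode 0700,
    # and print the backup location for later cleanup.
    mv "$data_dir" "$backup_dir" &&
        mkdir -m 700 "$data_dir" &&
        echo "$backup_dir"
}
```

For example, reset_data_dir /data/etcd/data prints the backup path on success. If etcd runs as a non-root user, the ownership of the recreated directory may need to be restored before the service is started.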
5. A new node cannot join because its IP is missing from the certificate
The error reported is:
9月 14 18:45:40 binary-k8s-master1 etcd[14881]: {"level":"warn","ts":"2021-09-14T18:45:40.932+0800","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"c8a24e337417915f","rtt":"0s","error":"x509: certificate is valid for 192.168.20.10, 192.168.20.11, 192.168.20.12, 192.168.20.13, not 192.168.20.8"}
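The error occurs because the new node's IP is not listed in the etcd certificate: the certificate is valid for 192.168.20.10 through 192.168.20.13, but not for 192.168.20.8.
Fix: add the new node's IP to the certificate configuration file, regenerate the certificate, copy it to every node, and restart all etcd nodes.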
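Before and after regenerating the certificate, it can help to confirm which IPs it actually covers. The sketch below assumes openssl is installed and works on the text dump produced by openssl x509. The function name san_has_ip is hypothetical, and only IPv4 entries are matched.

```shell
# Hypothetical helper: succeed if the given IPv4 address appears among the
# "IP Address:" entries of a certificate text dump read from stdin.
#   $1  IP address to look for
san_has_ip() {
    grep -o 'IP Address:[0-9.]*' | grep -qxF "IP Address:$1"
}

# Example (certificate path taken from the logs in section 2):
#   openssl x509 -in /data/etcd/ssl/server.pem -noout -text | san_has_ip 192.168.20.8
```

A non-zero exit status for the new node's IP means the certificate still needs to be regenerated or has not yet been copied to that node.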