Connecting Kubernetes to an external Ceph cluster
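- Deploy and connect Ceph through Rook, so that Ceph runs as a service inside Kubernetes. The official Rook documentation is very detailed and also covers fixes for common problems: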
https://rook.io/docs/rook/v1.3/ceph-quickstart.html
https://rook.io/docs/rook/v1.3/ceph-toolbox.html
https://rook.io/docs/rook/v1.3/ceph-cluster-crd.html#storage-selection-settings
https://rook.io/docs/rook/v1.3/ceph-block.html
- Connect Kubernetes to an external Ceph service, which is the approach covered below.
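Static persistent volumes
Each time storage is needed, a storage administrator must first create the image on the storage side by hand; only then can Kubernetes use it.
Create the Ceph secret
Kubernetes needs a secret for accessing Ceph, which it mainly uses to map RBD images.
1. On the Ceph master node, run the following command to get the base64-encoded key of the admin user (in production, consider creating a dedicated user for Kubernetes):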
# ceph auth get-key client.admin | base64
QVFBL2dJZGhPMkorRWhBQUZvMFd4T2xIMWxscElQRHVDcGl2UkE9PQ==
2. Create the secret in Kubernetes from a manifest:
# vim ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFBL2dJZGhPMkorRWhBQUZvMFd4T2xIMWxscElQRHVDcGl2UkE9PQ==
# kubectl apply -f ceph-secret.yaml
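The value under data.key is the Ceph key with one more layer of base64 on top, because Kubernetes expects everything under data to be base64-encoded. Below is a minimal sketch of that round trip. The function names are made up for illustration, and the key is a placeholder, not a real Ceph key.

```shell
#!/bin/sh
# Encode a Ceph key for the data.key field of a Kubernetes Secret.
# printf avoids the trailing newline that echo would add to the key.
encode_secret_value() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value taken from data.key back to the original Ceph key.
decode_secret_value() {
    printf '%s' "$1" | base64 -d
}

# Placeholder key for illustration only (assumption, not a real key).
example_key='AQBexampleexampleexampleexampleexample=='
encoded=$(encode_secret_value "$example_key")
echo "data.key: $encoded"
echo "decoded:  $(decode_secret_value "$encoded")"
```

On a real cluster the input would be the output of ceph auth get-key client.admin, as in step 1. Decoding the value stored in the manifest is a quick way to confirm that it matches the key Ceph reports.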
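Create the image
When no pool is specified, rbd commands operate on the pool named rbd. Run the following commands on a host with the Ceph client installed, or directly on the Ceph master node, to create the image: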
# rbd create image1 -s 1024
# rbd info rbd/image1
rbd image 'image1':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.374d6b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
Create the persistent volume
Create it in Kubernetes from a manifest:
[root@k82 ceph]# cat pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  rbd:
    monitors:
      - 192.168.207.4:6789
      - 192.168.207.5:6789
    pool: rbd
    image: image1
    user: admin
    secretRef:
      name: ceph-secret
    fsType: ext4
  persistentVolumeReclaimPolicy: Retain
[root@k82 ceph]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
ceph-pv 1Gi RWO,ROX Retain Bound default/ceph-claim 4d1h
pvc-467a1316-4716-46bd-8e28-f73e635eb95d 2Gi RWO Delete Bound default/ceph-zll ceph-rbd 32m
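The main fields are explained below:
1. accessModes:
RWO: ReadWriteOnce, the volume can be mounted read-write by a single node;
ROX: ReadOnlyMany, the volume can be mounted read-only by many nodes;
RWX: ReadWriteMany, the volume can be mounted read-write by many nodes.
2. fsType:
If the PersistentVolume's volumeMode is Filesystem, this field specifies the filesystem to use when mounting the volume. If the volume is not yet formatted and formatting is supported, this value is used to format it.
3. persistentVolumeReclaimPolicy:
There are three reclaim policies:
Delete: the default for dynamically provisioned PersistentVolumes. When the user deletes the corresponding PersistentVolumeClaim, the dynamically provisioned volume is deleted automatically.
Retain: suitable when the volume holds important data. When the user deletes the PersistentVolumeClaim, the PersistentVolume is not deleted; it moves to the Released state, and its data can be recovered manually.
Recycle: when the user deletes the PersistentVolumeClaim, the data on the volume is erased but the volume itself is kept.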
Create the persistent volume claim
Create it in Kubernetes from a manifest:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
spec:
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
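Once the claim exists, Kubernetes binds it to the most suitable PV: the PV's capacity must cover the claim's request, and the PV's access modes must include every access mode the claim asks for. The PVC above therefore binds to the PV we just created.
Check the PVC binding: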
[root@k82 ceph]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-claim Bound ceph-pv 1Gi RWO,ROX 4d1h
ceph-zll Bound pvc-467a1316-4716-46bd-8e28-f73e635eb95d 2Gi RWO ceph-rbd 36m
[root@k82 ceph]#
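The two binding conditions described above (enough capacity, and access modes that cover the claim) can be sketched as a small check. This is a simplified model for illustration only: the function names are hypothetical, sizes are whole Gi values, and it ignores the storage class, label selectors and the other criteria the real controller also looks at.

```shell
#!/bin/sh
# Return 0 if every access mode the claim requests is offered by the PV.
# Both arguments are space-separated lists of access mode names.
modes_satisfied() {
    offered=$1
    requested=$2
    for want in $requested; do
        found=no
        for have in $offered; do
            if [ "$want" = "$have" ]; then
                found=yes
            fi
        done
        [ "$found" = yes ] || return 1
    done
    return 0
}

# Return 0 if a PV (capacity in Gi, modes) can satisfy a claim (request in Gi, modes).
pv_can_bind() {
    pv_gi=$1
    pv_modes=$2
    claim_gi=$3
    claim_modes=$4
    [ "$pv_gi" -ge "$claim_gi" ] && modes_satisfied "$pv_modes" "$claim_modes"
}

# Values taken from ceph-pv and ceph-claim above.
if pv_can_bind 1 "ReadWriteOnce ReadOnlyMany" 1 "ReadWriteOnce ReadOnlyMany"; then
    echo "ceph-pv can bind ceph-claim"
fi
```

A claim that asked for 2Gi, or for an access mode the PV does not list, would fail this check and stay Pending until a suitable PV appears.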
Use the persistent volume in a pod
Create it in Kubernetes from a manifest:
apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod
spec:
  containers:
  - name: ceph-ubuntu
    image: docker.io/nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: ceph-mnt
      mountPath: /mnt
      readOnly: false
  volumes:
  - name: ceph-mnt
    persistentVolumeClaim:
      claimName: ceph-claim
Open a shell in the container and check the mounts; the image has already been formatted and mounted:
[root@k82 ceph]# kubectl exec -it ceph-pod -- sh
# df -Th
Filesystem Type Size Used Avail Use% Mounted on
overlay overlay 17G 8.7G 8.4G 51% /
tmpfs tmpfs 64M 0 64M 0% /dev
tmpfs tmpfs 985M 0 985M 0% /sys/fs/cgroup
/dev/rbd1 ext4 976M 2.6M 958M 1% /mnt
/dev/mapper/centos-root xfs 17G 8.7G 8.4G 51% /etc/hosts
shm tmpfs 64M 0 64M 0% /dev/shm
tmpfs tmpfs 1.9G 12K 1.9G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs tmpfs 985M 0 985M 0% /proc/acpi
tmpfs tmpfs 985M 0 985M 0% /proc/scsi
tmpfs tmpfs 985M 0 985M 0% /sys/firmware
#
Appendix: analyzing the errors encountered
# kubectl get pods
NAME READY STATUS RESTARTS AGE
ceph-pod 0/1 ContainerCreating 0 75s
# kubectl describe pods ceph-pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 48m (x6 over 75m) kubelet, work3 Unable to attach or mount volumes: unmounted volumes=[ceph-mnt], unattached volumes=[default-token-tlsjd ceph-mnt]: timed out waiting for the condition
Warning FailedMount 8m59s (x45 over 84m) kubelet, work3 MountVolume.WaitForAttach failed for volume "ceph-pv" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()
Warning FailedMount 3m13s (x23 over 82m) kubelet, work3 Unable to attach or mount volumes: unmounted volumes=[ceph-mnt], unattached volumes=[ceph-mnt default-token-tlsjd]: timed out waiting for the condition
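This happens because Kubernetes relies on kubelet to attach (rbd map) and detach (rbd unmap) RBD images, and kubelet runs on every Kubernetes node. Each node therefore needs the ceph-common package installed so that kubelet has the rbd command. After installing it on every machine from the Aliyun Ceph repo, a new error appeared: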
# kubectl describe pods ceph-pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
MountVolume.WaitForAttach failed for volume "ceph-pv" : rbd: map failed exit status 6, rbd output: 2020-06-02 17:12:18.575338 7f0171c3ed80 -1 did not load config file, using default settings.
2020-06-02 17:12:18.603861 7f0171c3ed80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
rbd: sysfs write failed
2020-06-02 17:12:18.620447 7f0171c3ed80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address
Warning FailedMount 15s kubelet, work3 MountVolume.WaitForAttach failed for volume "ceph-pv" : rbd: map failed exit status 6, rbd output: 2020-06-02 17:12:19.257006 7fc330e14d80 -1 did not load config file, using default settings.
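Further research showed two problems to fix:
1) The Kubernetes cluster and the Ceph cluster run different kernel versions. The kernel on the Kubernetes nodes is older and does not support some RBD image features, so those features must be disabled with the following commands: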
# rbd info rbd/image1
rbd image 'image1':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.374d6b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
# rbd feature disable rbd/image1 exclusive-lock object-map fast-diff deep-flatten
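2) The missing keyring error occurs because each Kubernetes node talks to Ceph to map the image locally and needs /etc/ceph/ceph.client.admin.keyring to authenticate. I created the /etc/ceph directory on every node and used a script to copy the keyring file there: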
# scp /etc/ceph/ceph.client.admin.keyring root@k8s-node:/etc/ceph
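Both fixes above can be checked before a pod is scheduled. The sketch below works out which image features to disable, given the feature list printed by rbd info and the features the node kernel is assumed to support, and checks that the rbd command is on PATH. The function names are hypothetical, and the "kernel supports only layering" input is an assumption for this example, not something the script detects.

```shell
#!/bin/sh
# Print the features in $1 (comma-separated, as shown by `rbd info`)
# that are not in the supported list $2 (space-separated).
features_to_disable() {
    image_features=$(printf '%s' "$1" | tr ',' ' ')
    supported=$2
    result=""
    for feature in $image_features; do
        keep=no
        for ok in $supported; do
            if [ "$feature" = "$ok" ]; then
                keep=yes
            fi
        done
        if [ "$keep" = no ]; then
            result="$result $feature"
        fi
    done
    # Strip the single leading space left by the concatenation above.
    printf '%s\n' "${result# }"
}

# Return 0 if the given command exists on PATH (kubelet needs rbd).
has_command() {
    command -v "$1" >/dev/null 2>&1
}

features="layering, exclusive-lock, object-map, fast-diff, deep-flatten"
# Assumption: the node kernel supports only layering.
echo "rbd feature disable rbd/image1 $(features_to_disable "$features" "layering")"
has_command rbd || echo "rbd not found: install ceph-common on this node"
```

The script only prints the rbd feature disable command rather than running it, so the result can be reviewed before anything changes on the Ceph side.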
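Dynamic persistent volumes
With dynamic provisioning, no storage administrator has to step in: the images Kubernetes uses are created automatically, on demand, when storage is requested. First define one or more StorageClasses. Every StorageClass must name a provisioner, which determines the volume plugin used to provision PVs. When a PersistentVolumeClaim requests a StorageClass, that class's provisioner creates the persistent volume on the backing storage.
The volume plugins supported by Kubernetes are listed in the official documentation: https://kubernetes.io/zh/docs/concepts/storage/storage-classes/
1. Create a pool dedicated to dynamic PVs
2. Create ceph-secret.yaml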
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-admin
  namespace: kube-system
type: "kubernetes.io/rbd"
data:
  key: QVFBL2dJZGhPMkorRWhBQUZvMFd4T2xIMWxscElQRHVDcGl2UkE9PQ==
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: kube-system
type: "kubernetes.io/rbd"
data:
  key: QVFBZUZJMWh6THIrSVJBQTVwa3FqZUFKM0RYWGp6VWg2R3gwVEE9PQ==
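The keys of ceph-secret-admin and ceph-secret may be identical here. The manifest above is based on https://github.com/kubernetes-retired/external-storage/tree/master/ceph/rbd
Using the admin user grants every Ceph permission.
To limit what Kubernetes can do, create a new user with its own Ceph permissions, as follows: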
ceph osd pool create kube 8 8
ceph auth add client.kube mon 'allow r' osd 'allow rwx pool=kube'
ceph auth get-key client.kube > /tmp/key
kubectl create secret generic ceph-secret --from-file=/tmp/key --namespace=kube-system --type=kubernetes.io/rbd
// Check the result
# kubectl get secret -n kube-system | grep ceph
ceph-secret         kubernetes.io/rbd   1   3d11h
ceph-secret-admin   kubernetes.io/rbd   1   3d11h
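If a manifest is preferred over kubectl create secret, the same Secret can be rendered from the raw key. This is a minimal sketch under stated assumptions: render_rbd_secret is a hypothetical helper, and the key passed in is a placeholder rather than a real Ceph key.

```shell
#!/bin/sh
# Print a Secret manifest of type kubernetes.io/rbd for the given
# name, namespace and raw Ceph key (as printed by `ceph auth get-key`).
render_rbd_secret() {
    name=$1
    namespace=$2
    raw_key=$3
    encoded=$(printf '%s' "$raw_key" | base64 | tr -d '\n')
    cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $name
  namespace: $namespace
type: kubernetes.io/rbd
data:
  key: $encoded
EOF
}

# Placeholder key for illustration only (assumption, not a real key).
render_rbd_secret ceph-secret kube-system 'AQBexampleexampleexampleexampleexample=='
```

The output can be saved to a file, or piped into kubectl apply -f - once the key has been replaced with the one for client.kube.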
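3. Deploy rbd-provisioner
Note that kube-controller-manager runs in a container, and that container has no rbd binary, so it cannot create RBD images on its own. A separate rbd-provisioner has to be deployed in the cluster to do this; otherwise the following error appears: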
"rbd: create volume failed, err: failed to create rbd image: executable file not found in $PATH:"
[root@k82 333]# cat rbd-provisioner.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["kube-dns", "coredns"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rbd-provisioner
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rbd-provisioner
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
  labels:
    app: rbd-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbd-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      nodeSelector:
        app: rbd-provisioner
      containers:
        - name: rbd-provisioner
          image: "quay.io/external_storage/rbd-provisioner:latest"
          volumeMounts:
            - name: ceph-conf
              mountPath: /etc/ceph
          env:
            - name: PROVISIONER_NAME
              value: ceph.com/rbd
      serviceAccount: rbd-provisioner
      volumes:
        - name: ceph-conf
          hostPath:
            path: /etc/ceph
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
  namespace: kube-system
[root@k82 333]#
4. Create ceph-rbd-sc.yaml
[root@k82 333]# cat ceph-rbd-sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: ceph.com/rbd
#reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
  monitors: 192.168.207.4:6789,192.168.207.5:6789
  adminId: admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret
  userSecretNamespace: kube-system
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
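The main fields are explained below:
1. storageclass.beta.kubernetes.io/is-default-class:
When set to true, this class becomes the default StorageClass. A PVC that does not name a StorageClass is provisioned from the default one.
2. adminId: the Ceph client ID used to create images in the Ceph pool. The default is "admin".
3. userId: the Ceph client ID used to map RBD images. The default is the same as adminId.
4. imageFormat: the Ceph RBD image format, "1" or "2". The default is "1".
5. imageFeatures: optional, and only usable when imageFormat is "2". The only feature currently supported is layering. The default is "", meaning no features are enabled.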
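The monitors parameter is a plain comma-separated string, so it is easy to list the same address twice by accident. As far as I can tell Kubernetes accepts such a value, but the volume then has only one monitor to fall back on. Below is a minimal sketch of a duplicate check; check_monitors is a hypothetical helper, not part of kubectl or Ceph.

```shell
#!/bin/sh
# Check a StorageClass monitors value (comma-separated addresses).
# Prints "ok" and returns 0 when no address is repeated; otherwise
# prints the first duplicate and returns 1.
check_monitors() {
    seen=""
    for mon in $(printf '%s' "$1" | tr ',' ' '); do
        for prev in $seen; do
            if [ "$prev" = "$mon" ]; then
                echo "duplicate monitor: $mon"
                return 1
            fi
        done
        seen="$seen $mon"
    done
    echo "ok"
}

check_monitors "192.168.207.4:6789,192.168.207.5:6789"
check_monitors "192.168.207.5:6789,192.168.207.5:6789" || true
```

The check is purely textual: it does not verify that the monitors are reachable or that they belong to the same Ceph cluster.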
5. Create a PVC and a test application
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-zll
spec:
  storageClassName: ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
[root@k82 333]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ceph-zll
spec:
  containers:
  - name: ceph-busybox
    image: busybox
    command: ["sleep", "60000"]
    volumeMounts:
    - name: ceph-vol1
      mountPath: /usr/share/busybox
      readOnly: false
  volumes:
  - name: ceph-vol1
    persistentVolumeClaim:
      claimName: ceph-zll
// Error
1 controller.go:1004] provision "default/ceph-claim" class "ceph-rbd": unexpected error getting claim reference: selfLink was empty, can't make reference
// After some research: Kubernetes 1.20 disables selfLink by default.
The current workaround is to edit /etc/kubernetes/manifests/kube-apiserver.yaml.
In this section:
spec:
  containers:
  - command:
    - kube-apiserver
add this line:
    - --feature-gates=RemoveSelfLink=false
This must be done on every master node of the cluster.
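Since the edit has to be repeated on every master, a quick check per node helps confirm nothing was missed. The sketch below only greps the manifest for the gate; has_selflink_gate is a hypothetical helper, and the default path is the one named above. As far as I know, the gate can only be switched off up to Kubernetes 1.23 and is locked on from 1.24, so treat the workaround as a stopgap.

```shell
#!/bin/sh
# Return 0 if the kube-apiserver manifest at $1 sets RemoveSelfLink=false.
# Matching on the gate name also covers a combined --feature-gates list.
has_selflink_gate() {
    grep -q 'RemoveSelfLink=false' "$1"
}

manifest=${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}
if [ -f "$manifest" ] && has_selflink_gate "$manifest"; then
    echo "RemoveSelfLink=false is set in $manifest"
else
    echo "RemoveSelfLink=false is NOT set in $manifest (or the file is missing)"
fi
```

Run it on each master node. kubelet restarts the static kube-apiserver pod on its own after the manifest file changes.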
// After the change the PVC was still Pending, so I checked the logs again
// Error
I0728 14:26:55.704256 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ceph-claim", UID:"e252cc3d-4ff0-400f-9bc2-feee20ecbb40", APIVersion:"v1", ResourceVersion:"19495043", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ceph-rbd": failed to create rbd image: exit status 13, command output: did not load config file, using default settings.
2021-07-28 14:26:52.645 7f70da266900 -1 Errors while parsing config file!
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /root/.ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 Errors while parsing config file!
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /root/.ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.685 7f70da266900 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-07-28 14:26:55.689 7f70da266900 -1 monclient: get_monmap_and_config failed to get config
2021-07-28 14:26:55.689 7f70da266900 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
rbd: couldn't connect to the cluster!
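This error means rbd-provisioner needs ceph.conf and the other Ceph configuration files. A temporary workaround found online is to use docker cp to copy the files in the host's /etc/ceph into the container. I forgot to mention earlier that once rbd-provisioner.yaml has been applied successfully, Docker pulls the quay.io/external_storage/rbd-provisioner:latest image to the local node, as shown below.
// Temporary copy commands
# sudo docker ps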
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
52218eacb4a9 quay.io/external_storage/rbd-provisioner "/usr/local/bin/rbd-…" 2 days ago Up 2 days k8s_rbd-provisioner_rbd-provisioner-f4956975f-4ksqt_kube-system_c6e08e90-3775-45f2-90fe-9fbc0eb16efc_0
# sudo docker cp /etc/ceph/ceph.conf 52218eacb4a9:/etc/ceph
With this approach the copied files are lost as soon as the container restarts, so I added a hostPath volume to rbd-provisioner.yaml that mounts the host directory into the container:
containers:
- name: rbd-provisioner
  image: "quay.io/external_storage/rbd-provisioner:latest"
  volumeMounts:
  - name: ceph-conf
    mountPath: /etc/ceph
  env:
  - name: PROVISIONER_NAME
    value: ceph.com/rbd
serviceAccount: rbd-provisioner
volumes:
- name: ceph-conf
  hostPath:
    path: /etc/ceph
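A hostPath volume exposes whatever is in /etc/ceph on the node where the pod lands, so the files named in the error log have to exist on that node. Below is a minimal sketch that lists what is missing. The helper name is hypothetical, and it only checks ceph.conf and the first keyring path from the log, not the other fallback locations Ceph searches.

```shell
#!/bin/sh
# Print the expected Ceph files that are missing from a config directory.
# $1: directory to check (default /etc/ceph); $2: Ceph user (default admin).
missing_ceph_files() {
    dir=${1:-/etc/ceph}
    user=${2:-admin}
    for f in ceph.conf "ceph.client.$user.keyring"; do
        if [ ! -f "$dir/$f" ]; then
            echo "$dir/$f"
        fi
    done
}

missing=$(missing_ceph_files /etc/ceph admin)
if [ -z "$missing" ]; then
    echo "/etc/ceph looks complete"
else
    echo "missing files:"
    echo "$missing"
fi
```

Run it on the node that hosts the rbd-provisioner pod; files copied to other nodes do not help, because the pod only sees the directory of the node it runs on.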
// To make sure rbd-provisioner runs on the same node as kube-controller-manager, label the master node
# kubectl label nodes k8s70131 app=rbd-provisioner
# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s70131 Ready control-plane,master 137d v1.21.2 app=rbd-provisioner,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s70132,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
// The LABELS column now includes app=rbd-provisioner
// To remove the label:
# kubectl label nodes k8s70131 app-
Master nodes are tainted, so a pod needs a matching toleration to be scheduled there. Add the following to rbd-provisioner.yaml:
tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
[root@k82 333]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
ceph-pv 1Gi RWO,ROX Retain Bound default/ceph-claim 5d
pvc-467a1316-4716-46bd-8e28-f73e635eb95d 2Gi RWO Delete Bound default/ceph-zll ceph-rbd 23h
[root@k82 333]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-claim Bound ceph-pv 1Gi RWO,ROX 5d
ceph-zll Bound pvc-467a1316-4716-46bd-8e28-f73e635eb95d 2Gi RWO ceph-rbd 23h
[root@k82 333]#