Connecting k8s to an external Ceph cluster

November 16

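Ceph can be deployed and integrated through Rook, so that k8s itself provides the Ceph service. The official Rook documentation is very detailed and also includes fixes for common problems: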
https://rook.io/docs/rook/v1.3/ceph-quickstart.html
https://rook.io/docs/rook/v1.3/ceph-toolbox.html
https://rook.io/docs/rook/v1.3/ceph-cluster-crd.html#storage-selection-settings
https://rook.io/docs/rook/v1.3/ceph-block.html
Connecting k8s to an external Ceph service

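Static persistent volumes

With static provisioning, every time storage is needed the storage administrator must first create the image on the storage side by hand; only then can k8s use it.

Creating the Ceph secret

k8s needs a secret for accessing Ceph; it is used mainly when k8s maps RBD images.
1. On the Ceph master node, run the following command to get the base64-encoded key of the admin user (in production, create a dedicated user for k8s instead):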

# ceph auth get-key client.admin | base64
QVFBL2dJZGhPMkorRWhBQUZvMFd4T2xIMWxscElQRHVDcGl2UkE9PQ==

2. Create the secret in k8s from a manifest:

# vim ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFBL2dJZGhPMkorRWhBQUZvMFd4T2xIMWxscElQRHVDcGl2UkE9PQ==
  
# kubectl apply -f ceph-secret.yaml
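
The base64 step above is easy to get subtly wrong when the key is pasted by hand, so here is a small sketch of the round trip. The function names and the placeholder key are made up for illustration; this is not a real Ceph key. The one thing it tries to show is that the key should be encoded without a trailing newline.

```shell
# Sketch: encode a Ceph key for the data.key field of a Secret manifest.
# encode_key / decode_key are hypothetical helper names.

# Encode without a trailing newline. A plain `echo` would append one,
# and that newline would become part of the stored key.
encode_key() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value copied out of a manifest, to double-check it.
decode_key() {
  printf '%s' "$1" | base64 --decode
}

placeholder_key='not-a-real-ceph-key'   # assumption: stand-in value only
encoded=$(encode_key "$placeholder_key")
echo "data.key: $encoded"
echo "decoded:  $(decode_key "$encoded")"
```

Decoding the value from ceph-secret.yaml the same way is a quick check that the manifest holds the key that ceph auth get-key printed.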

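Creating the image

By default, the pool Ceph uses after installation is rbd. Create the image with the following commands, either on a machine with the Ceph client installed or directly on the Ceph master node: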

# rbd create image1 -s 1024
# rbd info rbd/image1
rbd image 'image1':
	size 1024 MB in 256 objects
	order 22 (4096 kB objects)
	block_name_prefix: rbd_data.374d6b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:

Creating the persistent volume

Create it in k8s from a manifest:

[root@k82 ceph]# cat pv.yaml 

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  rbd:
    monitors:
      - 192.168.207.4:6789
      - 192.168.207.5:6789
    pool: rbd
    image: image1
    user: admin
    secretRef:
      name: ceph-secret
    fsType: ext4
  persistentVolumeReclaimPolicy: Retain



[root@k82 ceph]# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
ceph-pv                                    1Gi        RWO,ROX        Retain           Bound    default/ceph-claim                           4d1h
pvc-467a1316-4716-46bd-8e28-f73e635eb95d   2Gi        RWO            Delete           Bound    default/ceph-zll     ceph-rbd                32m

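The main fields are explained below:
1. accessModes:

RWO: ReadWriteOnce, the volume can be mounted read-write by a single node;
ROX: ReadOnlyMany, the volume can be mounted read-only by many nodes;
RWX: ReadWriteMany, the volume can be mounted read-write by many nodes.

2. fsType:

If the PersistentVolume's volumeMode is Filesystem, this field specifies the filesystem to use when mounting the volume. If the volume has not been formatted yet and formatting is supported, this value is used to format it.

3. persistentVolumeReclaimPolicy:

There are three reclaim policies:
Delete: the default for dynamically provisioned PersistentVolumes. When the user deletes the bound PersistentVolumeClaim, the dynamically provisioned volume is deleted automatically.

Retain: suited to volumes that hold important data. When the user deletes the PersistentVolumeClaim, the PersistentVolume is not deleted; it moves to the Released phase instead, and its data can be recovered manually.

Recycle: when the user deletes the PersistentVolumeClaim, the data on the volume is wiped but the volume itself is kept. Kubernetes has deprecated this policy.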
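
The reclaim policy of an existing PV can also be changed after the fact with kubectl patch. The sketch below only builds the JSON patch; the helper name reclaim_patch is made up for this example, and the actual kubectl call is left as a comment because it needs cluster access.

```shell
# Sketch: build the JSON patch that switches a PV's reclaim policy.
# reclaim_patch is a hypothetical helper; it accepts only the three
# policies described above.
reclaim_patch() {
  case "$1" in
    Retain|Delete|Recycle)
      printf '{"spec":{"persistentVolumeReclaimPolicy":"%s"}}\n' "$1" ;;
    *)
      echo "unknown reclaim policy: $1" >&2
      return 1 ;;
  esac
}

reclaim_patch Retain
# To apply it to the volume from this post (needs cluster access):
#   kubectl patch pv ceph-pv -p "$(reclaim_patch Retain)"
```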

Creating the persistent volume claim

Create it in k8s from a manifest:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
spec:
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi

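Once the claim is created, k8s looks for the best-matching PV and binds it to the claim. The PV's capacity must be at least what the claim requests, and the PV's access modes must include every access mode the claim specifies. The PVC above therefore binds to the PV we just created.

Check the PVC binding: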

[root@k82 ceph]# kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ceph-claim   Bound    ceph-pv                                    1Gi        RWO,ROX                       4d1h
ceph-zll     Bound    pvc-467a1316-4716-46bd-8e28-f73e635eb95d   2Gi        RWO            ceph-rbd       36m
[root@k82 ceph]#
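
The two binding conditions described above (capacity and access modes) can be written down as a small check. This is only a sketch of those two rules, not the real Kubernetes binder, which also looks at things like storageClassName and volumeMode. The function name and its argument layout are assumptions made for this example.

```shell
# Sketch of the matching rule, NOT the real Kubernetes binder.
# Usage: pv_satisfies_claim PV_GI "PV_MODES" CLAIM_GI "CLAIM_MODES"
# Sizes are whole GiB; modes are space-separated short names (RWO ROX RWX).
pv_satisfies_claim() {
  pv_gi=$1; pv_modes=$2; claim_gi=$3; claim_modes=$4

  # Rule 1: the volume must be at least as large as the request.
  [ "$pv_gi" -ge "$claim_gi" ] || return 1

  # Rule 2: every mode the claim asks for must be offered by the volume.
  for mode in $claim_modes; do
    case " $pv_modes " in
      *" $mode "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# ceph-pv (1Gi, RWO+ROX) against ceph-claim (1Gi, RWO+ROX)
if pv_satisfies_claim 1 "RWO ROX" 1 "RWO ROX"; then
  echo "ceph-pv can bind ceph-claim"
fi
```

A claim asking for 2Gi, or for RWX, would fail this check against ceph-pv.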

Using the persistent volume in a pod

Create it in k8s from a manifest:

apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod
spec:
  containers:
  - name: ceph-ubuntu
    image: docker.io/nginx
    ports:
      - containerPort: 80
    volumeMounts:
    - name: ceph-mnt
      mountPath: /mnt
      readOnly: false
  volumes:
  - name: ceph-mnt
    persistentVolumeClaim:
      claimName: ceph-claim

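Open a shell in the container and check the mounts; the image has already been mapped, formatted, and mounted (as the warning in the output says, newer kubectl versions expect the form kubectl exec -it ceph-pod -- sh):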



[root@k82 ceph]# kubectl exec ceph-pod -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# df -Th
Filesystem              Type     Size  Used Avail Use% Mounted on
overlay                 overlay   17G  8.7G  8.4G  51% /
tmpfs                   tmpfs     64M     0   64M   0% /dev
tmpfs                   tmpfs    985M     0  985M   0% /sys/fs/cgroup
/dev/rbd1               ext4     976M  2.6M  958M   1% /mnt
/dev/mapper/centos-root xfs       17G  8.7G  8.4G  51% /etc/hosts
shm                     tmpfs     64M     0   64M   0% /dev/shm
tmpfs                   tmpfs    1.9G   12K  1.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   tmpfs    985M     0  985M   0% /proc/acpi
tmpfs                   tmpfs    985M     0  985M   0% /proc/scsi
tmpfs                   tmpfs    985M     0  985M   0% /sys/firmware
#

Analyzing the attach errors

# kubectl get pods
NAME                     READY   STATUS              RESTARTS   AGE
ceph-pod                 0/1     ContainerCreating   0          75s
 
# kubectl describe pods ceph-pod
Events:
  Type     Reason       Age                   From            Message
  ----     ------       ----                  ----            -------
  Warning  FailedMount  48m (x6 over 75m)     kubelet, work3  Unable to attach or mount volumes: unmounted volumes=[ceph-mnt], unattached volumes=[default-token-tlsjd ceph-mnt]: timed out waiting for the condition
  Warning  FailedMount  8m59s (x45 over 84m)  kubelet, work3  MountVolume.WaitForAttach failed for volume "ceph-pv" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()
  Warning  FailedMount  3m13s (x23 over 82m)  kubelet, work3  Unable to attach or mount volumes: unmounted volumes=[ceph-mnt], unattached volumes=[ceph-mnt default-token-tlsjd]: timed out waiting for the condition



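This happens because k8s relies on kubelet to attach (rbd map) and detach (rbd unmap) RBD images, and kubelet runs on every k8s node. Every k8s node therefore needs the ceph-common package, which provides the rbd command to kubelet. After installing it on every machine from the Alibaba Cloud Ceph repo, a new error appeared: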





# kubectl describe pods ceph-pod
Events:
  Type     Reason       Age                   From            Message
  ----     ------       ----                  ----            -------
MountVolume.WaitForAttach failed for volume "ceph-pv" : rbd: map failed exit status 6, rbd output: 2020-06-02 17:12:18.575338 7f0171c3ed80 -1 did not load config file, using default settings.
2020-06-02 17:12:18.603861 7f0171c3ed80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
rbd: sysfs write failed
2020-06-02 17:12:18.620447 7f0171c3ed80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address
  Warning  FailedMount  15s  kubelet, work3  MountVolume.WaitForAttach failed for volume "ceph-pv" : rbd: map failed exit status 6, rbd output: 2020-06-02 17:12:19.257006 7fc330e14d80 -1 did not load config file, using default settings.
 



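Further digging showed two problems to fix:
1) The k8s cluster and the Ceph cluster run different kernel versions. The kernel on the k8s nodes is older and does not support some RBD image features, so those features have to be disabled, as follows: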


# rbd info rbd/image1
rbd image 'image1':
	size 1024 MB in 256 objects
	order 22 (4096 kB objects)
	block_name_prefix: rbd_data.374d6b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	
# rbd  feature disable rbd/image1 exclusive-lock object-map fast-diff deep-flatten	



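2) The keyring error occurs because each k8s node talks to Ceph to map the image locally and needs the ceph.client.admin.keyring file in its /etc/ceph directory for authentication during the map. So I created /etc/ceph on every node and used a script to copy the keyring file over: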


# scp /etc/ceph/ceph.client.admin.keyring root@k8s-node:/etc/ceph
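
The three failures in this section (no rbd binary, unsupported image features, missing keyring) can all be checked on a node before a pod is scheduled there. Below is a rough preflight sketch; the function names are made up for this example, and it assumes, as in the fix above, that layering is the only feature to keep.

```shell
# Sketch of a per-node preflight check for the failures above.
# All function names here are hypothetical.

# 1. kubelet needs the rbd binary (from ceph-common) on the node.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# 2. rbd map authenticates with the keyring under /etc/ceph.
have_keyring() {
  [ -f "${1:-/etc/ceph/ceph.client.admin.keyring}" ]
}

# 3. Given the "features:" value printed by `rbd info`, list everything
#    except layering, i.e. the names to pass to `rbd feature disable`.
features_to_disable() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//' |
    grep -v -x -e layering -e '' | tr '\n' ' ' | sed 's/ $//'
}

have_command rbd || echo "rbd not found: install ceph-common on this node"
have_keyring     || echo "keyring missing: copy it into /etc/ceph"
echo "disable: $(features_to_disable 'layering, exclusive-lock, object-map, fast-diff, deep-flatten')"
```

Another option, which I have not tested in this setup, is to create the image with only the layering feature in the first place (rbd create accepts --image-feature layering), so that nothing needs to be disabled afterwards.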

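Dynamic persistent volumes

Dynamic provisioning needs no intervention from the storage administrator: the images k8s uses are created automatically, so storage can be requested on demand. First define one or more StorageClasses. Every StorageClass must name a provisioner, which decides which volume plugin provisions the PVs. When a PersistentVolumeClaim requests a StorageClass, that class's provisioner creates the persistent volume on the backing storage.

The volume plugins supported by k8s are listed in the official documentation: https://kubernetes.io/zh/docs/concepts/storage/storage-classes/

1. Create a pool dedicated to dynamic PVs (this post uses a pool named kube; the ceph osd pool create command appears in step 2)

2. Create ceph-secret.yaml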

apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-admin
  namespace: kube-system
type: "kubernetes.io/rbd"
data:
  key: QVFBL2dJZGhPMkorRWhBQUZvMFd4T2xIMWxscElQRHVDcGl2UkE9PQ==
 
---
 
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: kube-system
type: "kubernetes.io/rbd"
data:
  key: QVFBZUZJMWh6THIrSVJBQTVwa3FqZUFKM0RYWGp6VWg2R3gwVEE9PQ==

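The key of ceph-secret-admin and the key of ceph-secret may be the same value. The manifest above is based on https://github.com/kubernetes-retired/external-storage/tree/master/ceph/rbd

The admin user has full permissions for every Ceph operation.

To restrict what k8s can do in Ceph, create a new user with its own permissions, as follows: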

ceph osd pool create kube 8 8
ceph auth add client.kube mon 'allow r' osd 'allow rwx pool=kube'
ceph auth get-key client.kube > /tmp/key
kubectl create secret generic ceph-secret --from-file=/tmp/key --namespace=kube-system --type=kubernetes.io/rbd

Check the result:

# kubectl get secret -n kube-system | grep ceph
 
ceph-secret                   kubernetes.io/rbd                     1      3d11h
ceph-secret-admin             kubernetes.io/rbd                     1      3d11h

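3. Deploy rbd-provisioner

Note that kube-controller-manager runs in a container, and as the error below shows, the rbd command is not available to it there, so it cannot create images on Ceph by itself. A separate rbd-provisioner has to be deployed in the cluster to do this; otherwise provisioning fails with the following error: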

"rbd: create volume failed, err: failed to create rbd image: executable file not found in $PATH:"

[root@k82 333]# cat rbd-provisioner.yaml 
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["kube-dns", "coredns"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
 
---
 
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
 
---
 
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rbd-provisioner
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
 
---
 
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rbd-provisioner
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rbd-provisioner
subjects:
- kind: ServiceAccount
  name: rbd-provisioner
  namespace: kube-system
 
---
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
  labels:
    app: rbd-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbd-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      nodeSelector:
        app: rbd-provisioner
      containers:
      - name: rbd-provisioner
        image: "quay.io/external_storage/rbd-provisioner:latest"
        volumeMounts:
        - name: ceph-conf
          mountPath: /etc/ceph
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/rbd
      serviceAccount: rbd-provisioner
      volumes:
      - name: ceph-conf
        hostPath:
          path: /etc/ceph
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
 
---
 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
  namespace: kube-system
[root@k82 333]#

4. Create ceph-rbd-sc.yaml

[root@k82 333]# cat ceph-rbd-sc.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: ceph.com/rbd
#reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
  monitors: 192.168.207.4:6789,192.168.207.5:6789
  adminId: admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret
  userSecretNamespace: kube-system
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"

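The main fields are explained below:
1. storageclass.beta.kubernetes.io/is-default-class
When set to true, this StorageClass is the default one. A PVC that does not specify a StorageClass gets its storage from the default StorageClass.
2. adminId: the Ceph client ID used to create images in the Ceph pool. Defaults to "admin".
3. userId: the Ceph client ID used to map RBD images. Defaults to the same value as adminId.
4. imageFormat: the Ceph RBD image format, "1" or "2". Defaults to "1".
5. imageFeatures: optional, and only used when imageFormat is set to "2". The only feature currently supported is layering. Defaults to "" (no features enabled).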
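
One more thing worth checking in the StorageClass is the monitors value: listing the same monitor twice gives no redundancy. A quick sketch of such a check follows; duplicate_monitors is a made-up helper name, and the addresses are the two monitors used in the static PV earlier in this post.

```shell
# Sketch: report monitor addresses that appear more than once in a
# comma-separated monitors value. Prints nothing when all are unique.
# duplicate_monitors is a hypothetical helper name.
duplicate_monitors() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//' | sort | uniq -d
}

dups=$(duplicate_monitors "192.168.207.4:6789,192.168.207.5:6789")
if [ -n "$dups" ]; then
  echo "duplicate monitors: $dups"
else
  echo "monitors are unique"
fi
```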

5. Create the PVC and a test application

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-zll
spec:
  storageClassName: ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi






[root@k82 333]# cat pod.yaml 

apiVersion: v1
kind: Pod
metadata:
  name: ceph-zll
spec:
  containers:
  - name: ceph-busybox
    image: busybox
    command: ["sleep", "60000"]
    volumeMounts:
    - name: ceph-vol1
      mountPath: /usr/share/busybox
      readOnly: false
  volumes:
  - name: ceph-vol1
    persistentVolumeClaim:
      claimName: ceph-zll

Error:
1 controller.go:1004] provision "default/ceph-claim" class "ceph-rbd": unexpected error getting claim reference: selfLink was empty, can't make reference

After some searching I found that Kubernetes 1.20 disables selfLink by default.
The current workaround is to edit /etc/kubernetes/manifests/kube-apiserver.yaml.
In this section:

spec:
  containers:
  - command:
    - kube-apiserver
add this line:

- --feature-gates=RemoveSelfLink=false

This has to be done on every k8s master node.
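
Since the edit has to be repeated on each master, a quick way to confirm it landed is to grep the manifest. The sketch below assumes the kubeadm default path mentioned above; has_selflink_gate is a made-up helper name. As far as I know, the RemoveSelfLink gate can no longer be turned off from Kubernetes 1.24 on, so treat this workaround as applying to the 1.20 to 1.23 range only.

```shell
# Sketch: check whether a kube-apiserver manifest already carries the
# RemoveSelfLink=false feature gate. has_selflink_gate is a hypothetical
# helper; the default path below is the one used in this post.
has_selflink_gate() {
  grep -q -e '--feature-gates=.*RemoveSelfLink=false' "$1"
}

manifest=${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}
if [ -f "$manifest" ] && has_selflink_gate "$manifest"; then
  echo "RemoveSelfLink=false is set in $manifest"
else
  echo "RemoveSelfLink=false not found in $manifest"
fi
```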

After the change the PVC was still Pending, so I went back to the logs:





Error:
I0728 14:26:55.704256       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ceph-claim", UID:"e252cc3d-4ff0-400f-9bc2-feee20ecbb40", APIVersion:"v1", ResourceVersion:"19495043", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ceph-rbd": failed to create rbd image: exit status 13, command output: did not load config file, using default settings.
2021-07-28 14:26:52.645 7f70da266900 -1 Errors while parsing config file!
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /root/.ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 Errors while parsing config file!
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open /root/.ceph/ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.645 7f70da266900 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2021-07-28 14:26:52.685 7f70da266900 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-07-28 14:26:55.689 7f70da266900 -1 monclient: get_monmap_and_config failed to get config
2021-07-28 14:26:55.689 7f70da266900 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
rbd: couldn't connect to the cluster!
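
This error means rbd-provisioner needs ceph.conf and the other Ceph configuration files. A temporary workaround found online is to use docker cp to copy the files from the local /etc/ceph/ into the running container. I forgot to mention earlier that once rbd-provisioner.yaml is applied successfully, the node pulls the quay.io/external_storage/rbd-provisioner:latest image and starts a container from it, as shown below.

Temporary copy commands:

# sudo docker ps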
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
52218eacb4a9 quay.io/external_storage/rbd-provisioner "/usr/local/bin/rbd-…" 2 days ago Up 2 days k8s_rbd-provisioner_rbd-provisioner-f4956975f-4ksqt_kube-system_c6e08e90-3775-45f2-90fe-9fbc0eb16efc_0

# sudo docker cp /etc/ceph/ceph.conf 52218eacb4a9:/etc/ceph

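With this approach the copied files are lost as soon as the container restarts, so I added a hostPath volume to rbd-provisioner.yaml that mounts the local directory into the container, as follows: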

    containers:
      - name: rbd-provisioner
        image: "quay.io/external_storage/rbd-provisioner:latest"
        volumeMounts:
        - name: ceph-conf
          mountPath: /etc/ceph
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/rbd
      serviceAccount: rbd-provisioner
      volumes:
      - name: ceph-conf
        hostPath:
          path: /etc/ceph

To make sure rbd-provisioner runs on the same node as kube-controller-manager, label the master node:

# kubectl label nodes k8s70131 app=rbd-provisioner

# kubectl get nodes --show-labels
NAME       STATUS   ROLES                  AGE    VERSION   LABELS
k8s70131   Ready    control-plane,master   137d   v1.21.2   app=rbd-provisioner,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s70132,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
 
The LABELS column now contains app=rbd-provisioner.

To remove the label:
# kubectl label nodes k8s70131 app-

Master nodes are tainted, so a pod can only be scheduled there if it tolerates the taint. Add the following to rbd-provisioner.yaml:

 tolerations:
   - key: "node-role.kubernetes.io/master"
     operator: "Exists"
     effect: "NoSchedule"

[root@k82 333]# kubectl get pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
ceph-pv                                    1Gi        RWO,ROX        Retain           Bound    default/ceph-claim                           5d
pvc-467a1316-4716-46bd-8e28-f73e635eb95d   2Gi        RWO            Delete           Bound    default/ceph-zll     ceph-rbd                23h
[root@k82 333]# kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ceph-claim   Bound    ceph-pv                                    1Gi        RWO,ROX                       5d
ceph-zll     Bound    pvc-467a1316-4716-46bd-8e28-f73e635eb95d   2Gi        RWO            ceph-rbd       23h
[root@k82 333]#
