【Error: ImagePullBackOff】Troubleshooting an Nginx Service That Fails to Start in Kubernetes
迪丽瓦拉
2024-06-01 10:05:54

❌ The pod failed to start, the nginx service could not be reached, and the pod status showed ImagePullBackOff.

[root@m1 ~]# kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
nginx-f89759699-cgjgp   0/1     ImagePullBackOff   0          103m
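When many pods are listed, a small filter can pick out the ones stuck on image pulls. ErrImagePull is the status Kubernetes shows right after a failed pull, and ImagePullBackOff is the status while it waits to retry. This is only a sketch. find_backoff_pods is a hypothetical helper, and it assumes the default kubectl get pods column layout (no -A and no -o wide).

```shell
# Hypothetical helper: print the names of pods whose STATUS column shows an
# image-pull failure. Reads plain "kubectl get pods" output on stdin and
# assumes the default column order NAME READY STATUS RESTARTS AGE.
find_backoff_pods() {
  # Skip the header row and match on the third column (STATUS).
  awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1 }'
}

# Example usage (requires a working kubectl context):
# kubectl get pods | find_backoff_pods
```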

💥 Inspect the details of the nginx pod.

[root@m1 ~]# kubectl describe pod nginx-f89759699-cgjgp
Name:             nginx-f89759699-cgjgp
Namespace:        default
Priority:         0
Service Account:  default
Node:             n1/192.168.200.84
Start Time:       Fri, 10 Mar 2023 08:40:33 +0800
Labels:           app=nginx
                  pod-template-hash=f89759699
Annotations:
Status:           Pending
IP:               10.244.3.20
IPs:
  IP:           10.244.3.20
Controlled By:  ReplicaSet/nginx-f89759699
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:
    Host Port:
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zk8sj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-zk8sj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zk8sj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Normal   BackOff  57m (x179 over 100m)    kubelet  Back-off pulling image "nginx"
  Normal   Pulling  7m33s (x22 over 100m)   kubelet  Pulling image "nginx"
  Warning  Failed   2m30s (x417 over 100m)  kubelet  Error: ImagePullBackOff
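The image pull is performed by the kubelet and container runtime on the node the pod was scheduled to. That node appears in the Node: field (name/IP, n1 here), and its runtime is the one whose health matters for the pull. The sketch below pulls the node name out of the describe output. pod_node is a hypothetical helper. The jsonpath query in the comment is a standard kubectl alternative that avoids text parsing.

```shell
# Hypothetical helper: extract the node name from "kubectl describe pod" output.
# The Node: line looks like "Node:   n1/192.168.200.84" (name/IP).
# The pattern /^Node:/ does not match the separate "Node-Selectors:" line.
pod_node() {
  awk '/^Node:/ { split($2, parts, "/"); print parts[1]; exit }'
}

# Example usage:
# kubectl describe pod nginx-f89759699-cgjgp | pod_node
# Alternative without text parsing:
# kubectl get pod nginx-f89759699-cgjgp -o jsonpath='{.spec.nodeName}'
```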

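The events show that pulling the nginx image failed. One possible cause is a problem with the Docker service, which performs the pull.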

So, check whether Docker is running:

systemctl status docker

It turned out that the Docker service had failed to start 💢, so I tried restarting it manually:

systemctl restart docker

However, the restart failed with the following error:

[root@m1 ~]# systemctl restart docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.

Running systemctl restart docker did not fix the problem.

Next, running docker version showed that the client could not connect to the Docker daemon:

[root@m1 ~]# docker version
Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:03:11 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

So I ran systemctl status docker again to see why the Docker service would not start. The error output is shown below:

[root@m1 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2023-03-10 10:28:16 CST; 4min 35s ago
     Docs: https://docs.docker.com
 Main PID: 2221 (code=exited, status=1/FAILURE)

Mar 10 10:28:13 m1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 10 10:28:13 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:28:13 m1 systemd[1]: Failed to start Docker Application Container Engine.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Mar 10 10:28:16 m1 systemd[1]: Stopped Docker Application Container Engine.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Start request repeated too quickly.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:28:16 m1 systemd[1]: Failed to start Docker Application Container Engine.
[root@m1 ~]#

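The output shows that the Docker daemon process exited with status 1/FAILURE. After several automatic restarts, systemd stopped retrying ("Start request repeated too quickly"). The status output does not say why the daemon failed, so the next place to look is its log.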

✅ Next, I worked through the following steps to find and fix the problem:

1️⃣ Check the Docker service log. Use the following command to get a more detailed picture of why the service failed.

sudo journalctl -u docker.service

2️⃣ The relevant lines of the Docker log are shown below. They show that the failure comes from an error in the daemon configuration file /etc/docker/daemon.json.

Mar 10 10:20:17 m1 systemd[1]: Starting Docker Application Container Engine...
Mar 10 10:20:17 m1 dockerd[1572]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair
Mar 10 10:20:17 m1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 10 10:20:17 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:20:17 m1 systemd[1]: Failed to start Docker Application Container Engine.
Mar 10 10:20:19 m1 systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
Mar 10 10:20:19 m1 systemd[1]: docker.service: Scheduled restart job, restart counter is at 2.
Mar 10 10:20:19 m1 systemd[1]: Stopped Docker Application Container Engine.
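In a long journal, the line that explains the failure comes from dockerd itself, not from systemd. A sed filter can keep only the dockerd lines and strip the timestamp prefix. This is a sketch. dockerd_lines is a hypothetical helper, and it assumes the default journalctl output format shown above.

```shell
# Hypothetical helper: keep only lines logged by dockerd and drop the
# "Mar 10 10:20:17 m1 dockerd[1572]: " prefix. systemd's own lines are skipped.
dockerd_lines() {
  sed -n 's/.*dockerd\[[0-9]*]: //p'
}

# Example usage:
# journalctl -u docker.service --no-pager | dockerd_lines | tail -n 20
```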

3️⃣ Check whether the daemon configuration file /etc/docker/daemon.json is correct.

[root@m1 ~]# cat /etc/docker/daemon.json
{
	# Set Docker's registry mirror to the Alibaba Cloud mirror.
	"registry-mirrors": ["https://w2kavmmf.mirror.aliyuncs.com"]
	# Make the Docker daemon use systemd as the cgroup driver.
	"exec-opts": ["native.cgroupdriver=systemd"]
}

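At first glance the configuration looks correct. On closer inspection, however, a comma is missing after the "registry-mirrors" entry. This missing comma (,) is a syntax error, and it is the root cause of the problem. The log message fits this: after the "registry-mirrors" value, the JSON parser expected a comma or a closing brace but found the opening quote of "exec-opts". The # comments above are annotations for this article only. JSON does not allow comments, so they must not appear in the real file.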
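A syntax check before restarting Docker would have caught this missing comma. The sketch below uses Python's built-in json.tool module as a JSON syntax checker. It assumes python3 is installed on the node, and check_daemon_json is a hypothetical helper. It only checks JSON syntax, not whether Docker recognizes the option names.

```shell
# Hypothetical helper: check that a Docker daemon config file is valid JSON.
# Assumes python3 is available. "python3 -m json.tool" exits non-zero and
# prints the parse error (with line and column) to stderr on bad syntax.
check_daemon_json() {
  local file="${1:-/etc/docker/daemon.json}"
  if python3 -m json.tool "$file" > /dev/null; then
    echo "OK: $file is valid JSON"
  else
    echo "ERROR: $file is not valid JSON; fix it before restarting docker" >&2
    return 1
  fi
}

# Example usage:
# check_daemon_json /etc/docker/daemon.json && systemctl restart docker
```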

🟢 After the fix:

[root@m1 ~]# cat /etc/docker/daemon.json
{
	"registry-mirrors": ["https://w2kavmmf.mirror.aliyuncs.com"],
	"exec-opts": ["native.cgroupdriver=systemd"]
}

After editing the file in vi, type :wq to save and exit.

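4️⃣ Reload systemd and restart the Docker service. (systemctl daemon-reload re-reads systemd unit files. It is not strictly needed when only daemon.json changed, but it does no harm.)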

systemctl daemon-reload
systemctl restart docker
systemctl status docker
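Right after a restart, the daemon may need a moment before it accepts requests. A small retry loop lets a script wait until docker info succeeds instead of checking once and giving up. wait_until is a hypothetical helper, and the attempt count and interval in the usage line are arbitrary examples.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  local attempts="$1" interval="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Example usage after the restart:
# wait_until 15 2 docker info && echo "Docker daemon is up"
```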

5️⃣ Check that docker version now prints normally, including the Server section.

[root@m1 ~]# docker version
Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:03:11 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:29 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
[root@m1 ~]# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 20
  Running: 8
  Paused: 0
  Stopped: 12
 Images: 20
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.18.0-372.9.1.el8.x86_64
 Operating System: Rocky Linux 8.6 (Green Obsidian)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 9.711GiB
 Name: m1
 ID: 4YIS:FHSB:YXRI:CED5:PJSJ:EAS2:BCR3:GJJF:FDPK:EDJH:DVKU:AIYJ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://w2kavmmf.mirror.aliyuncs.com/
 Live Restore Enabled: false
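To confirm that both settings from daemon.json took effect, check Cgroup Driver (should be systemd) and Registry Mirrors in the output above. docker info --format '{{.CgroupDriver}}' prints the driver directly. The sketch below instead reads a single-line "Key: value" field from plain docker info text, which also works on saved output. docker_info_field is a hypothetical helper. It does not handle multi-line fields such as Registry Mirrors, whose values appear on the following lines.

```shell
# Hypothetical helper: print the value of a single-line "Key: value" field
# from plain "docker info" output read on stdin. Leading indentation is ignored.
docker_info_field() {
  local key="$1"
  awk -v k="$key" '{
    line = $0
    sub(/^[ \t]+/, "", line)                 # drop leading indentation
    if (index(line, k ": ") == 1) {          # line starts with "Key: "
      print substr(line, length(k) + 3)      # value after "Key: "
      exit
    }
  }'
}

# Example usage:
# docker info | docker_info_field "Cgroup Driver"
```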

At this point the Docker service restarted successfully, the pod recovered, and the Nginx service became reachable again.

[root@m1 ~]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-f89759699-cgjgp   1/1     Running   0          174m
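The pod recovered without being recreated because the kubelet keeps retrying the pull with a growing back-off (the BackOff events below show these retries). In a script, kubectl wait --for=condition=Ready pod/nginx-f89759699-cgjgp --timeout=120s is one way to block until the pod is ready. The sketch below checks plain kubectl get pods output instead. all_pods_ready is a hypothetical helper. It assumes the default column layout and treats any pod that is not Running with all containers ready as a problem, so completed Job pods would also be flagged.

```shell
# Hypothetical helper: succeed only if every pod in "kubectl get pods" output
# (read on stdin) is Running with READY showing n/n. Prints offending pods.
all_pods_ready() {
  awk 'NR > 1 {
    split($2, r, "/")                        # READY column, e.g. "1/1"
    if (r[1] != r[2] || $3 != "Running") { bad = 1; print "not ready: " $1 }
  }
  END { exit bad }'
}

# Example usage:
# kubectl get pods | all_pods_ready && echo "all pods ready"
```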

The pod details now look normal as well:

[root@m1 ~]# kubectl describe pod nginx-f89759699-cgjgp
Name:             nginx-f89759699-cgjgp
Namespace:        default
Priority:         0
Service Account:  default
Node:             n1/192.168.200.84
Start Time:       Fri, 10 Mar 2023 08:40:33 +0800
Labels:           app=nginx
                  pod-template-hash=f89759699
Annotations:
Status:           Running
IP:               10.244.3.20
IPs:
  IP:           10.244.3.20
Controlled By:  ReplicaSet/nginx-f89759699
Containers:
  nginx:
    Container ID:   docker://88bdc2bfa592f60bf99bac2125b0adae005118ae8f2f271225245f20b7cfb3c8
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
    Port:
    Host Port:
    State:          Running
      Started:      Fri, 10 Mar 2023 10:37:42 +0800
    Ready:          True
    Restart Count:  0
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zk8sj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-zk8sj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zk8sj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age                   From     Message
  ----    ------   ----                  ----     -------
  Normal  BackOff  58m (x480 over 171m)  kubelet  Back-off pulling image "nginx"
[root@m1 ~]# 

