2024 CKS Exam Preparation

Start : 2024.1.15

DDL1:2024.2.3 15:00 (Rescheduled)

DDL2:2024.2.8 20:00 (Failed)

DDL3: 2024.2.23 23:30 (Success)

Lab Environment Setup

Install Calico CNI

# Other prerequisites
# swap,br_netfilter....

# Configure containerd
$ containerd config default | sed 's|SystemdCgroup = false|SystemdCgroup = true|g' | sudo tee /etc/containerd/config.toml > /dev/null
$ sudo systemctl restart containerd && systemctl status containerd

# Hosts
$ echo "127.0.0.1 kube.multipass.local" | sudo tee -a /etc/hosts > /dev/null

# Initialize Kubernetes cluster
$ kubeadm init --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint kube.multipass.local

# untaint master node
$ kubectl get node --no-headers | grep control-plane | awk '{cmd="kubectl taint node "$1" node-role.kubernetes.io/control-plane-";system(cmd)}'

# Install Calico CNI which supports Network Policy
$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml
$ curl https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/custom-resources.yaml | sed 's|192.168|10.244|g' | kubectl apply -f -

Setting up this Kubernetes cluster can hit many obstacles due to network conditions or misconfiguration, but the steps above can be worked through one by one. Eventually the node(s) become Ready:

$ kubectl get node
NAME          STATUS   ROLES           AGE   VERSION
kube-master   Ready    control-plane   21m   v1.28.3

Tools for the Exam

alias

alias k=kubectl                         # will already be pre-configured
export do="--dry-run=client -o yaml" # k create deploy nginx --image=nginx $do
export now="--force --grace-period 0" # k delete pod x $now

vim

set tabstop=2
set expandtab
set shiftwidth=2

jsonpath

https://kubernetes.io/docs/reference/kubectl/jsonpath/

Function | Description | Example | Result
---------|-------------|---------|-------
text | the plain text | kind is {.kind} | kind is List
@ | the current object | {@} | the same as input
. or [] | child operator | {.kind}, {['kind']} or {['name\.type']} | List
.. | recursive descent | {..name} | 127.0.0.1 127.0.0.2 myself e2e
* | wildcard. Get all objects | {.items[*].metadata.name} | [127.0.0.1 127.0.0.2]
[start:end:step] | subscript operator | {.users[0].name} | myself
[,] | union operator | {.items[*]['metadata.name', 'status.capacity']} | 127.0.0.1 127.0.0.2 map[cpu:4] map[cpu:8]
?() | filter | {.users[?(@.name=="e2e")].user.password} | secret
range, end | iterate list | {range .items[*]}[{.metadata.name}, {.status.capacity}] {end} | [127.0.0.1, map[cpu:4]] [127.0.0.2, map[cpu:8]]
'' | quote interpreted string | {range .items[*]}{.metadata.name}{'\t'}{end} | 127.0.0.1 127.0.0.2
\ | escape termination character | {.items[0].metadata.labels.kubernetes\.io/hostname} | 127.0.0.1

Examples using kubectl and JSONPath expressions:

kubectl get pods -o json
kubectl get pods -o=jsonpath='{@}'
kubectl get pods -o=jsonpath='{.items[0]}'
kubectl get pods -o=jsonpath='{.items[0].metadata.name}'
kubectl get pods -o=jsonpath="{.items[*]['metadata.name', 'status.capacity']}"
kubectl get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.startTime}{"\n"}{end}'
kubectl get pods -o=jsonpath='{.items[0].metadata.labels.kubernetes\.io/hostname}'

yq

examples

# Read a value
yq '.a.b[0].c' file.yaml

# Pipe from STDIN
yq '.a.b[0].c' < file.yaml

# Update a yaml file, in place
yq -i '.a.b[0].c = "cool"' file.yaml

# Find and update an item in an array
yq '(.[] | select(.name == "foo") | .address) = "12 cat st"'
  • jq
  • tr
  • truncate
  • crictl
  • cut

awk

Common usage

Assemble a command and execute it

kubectl get svc | awk '{cmd="kubectl get svc "$1" -oyaml";system(cmd)}'
  • sed
  • sha512sum
  • podman (to build images)

Viewing Logs

https://kubernetes.io/docs/concepts/cluster-administration/logging/#system-component-logs

  • For the kubelet component: journalctl -xefu kubelet

  • For Kubernetes components that run as containers: look under /var/log/pods (after breaking the kube-apiserver manifest so that it no longer starts, the reason for the startup failure should be visible in this directory)

API group abbreviations

An empty group denotes the core group; in that case the group/version abbreviation is just the version, i.e.:

$ kubectl api-resources --api-group=''
NAME                     SHORTNAMES   APIVERSION   NAMESPACED   KIND
bindings                              v1           true         Binding
componentstatuses        cs           v1           false        ComponentStatus
configmaps               cm           v1           true         ConfigMap
endpoints                ep           v1           true         Endpoints
events                   ev           v1           true         Event
limitranges              limits       v1           true         LimitRange
namespaces               ns           v1           false        Namespace
nodes                    no           v1           false        Node
persistentvolumeclaims   pvc          v1           true         PersistentVolumeClaim
persistentvolumes        pv           v1           false        PersistentVolume
pods                     po           v1           true         Pod
podtemplates                          v1           true         PodTemplate
replicationcontrollers   rc           v1           true         ReplicationController
resourcequotas           quota        v1           true         ResourceQuota
secrets                               v1           true         Secret
serviceaccounts          sa           v1           true         ServiceAccount
services                 svc          v1           true         Service

Most of the common controller resources belong to the apps group

NAME                  SHORTNAMES   APIVERSION   NAMESPACED   KIND
controllerrevisions                apps/v1      true         ControllerRevision
daemonsets            ds           apps/v1      true         DaemonSet
deployments           deploy       apps/v1      true         Deployment
replicasets           rs           apps/v1      true         ReplicaSet
statefulsets          sts          apps/v1      true         StatefulSet

Common places where a group needs to be filled in

  • rbac

    role.rules.apiGroups only takes the group name

  • audit policy

    rules.resources.group only takes the group name (see the sketch below)
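A hedged sketch of both places; the resource and verbs are placeholders, and in both cases only the group name goes in, never a version:

# RBAC: a Role rule targeting Deployments in the apps group
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]

# Audit policy: a rule targeting Deployments in the apps group
rules:
- level: Metadata
  resources:
  - group: "apps"
    resources: ["deployments"]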

Exam Strategy

External constraints

  1. Network lag makes working through the questions extremely sluggish;
  2. Heavy workload: 16 questions in total to be finished within 120 minutes, i.e. an average of 120/16 = 7.5 minutes per question;

Personal limitations

  1. Not fluent in the security-related operations;
  2. No strategy for ordering the questions: simply worked through them from first to last;
  3. No self-assessment before starting a question, so it was unclear whether it could be finished quickly; getting halfway through and then realizing it could not be done wasted a lot of time;

Improvements

  1. Use Hong Kong/Macao mobile roaming for the exam network (if I fail again partly because of network lag, next time I will have to physically travel to Hong Kong);
  2. The passing score is 67, which roughly means completing 67/(100/16) ≈ 11 questions; up to 5 questions can be skipped, leaving a little over 10 minutes per question.
  3. Workflow
    1. Spend one minute reading the whole question, understand it, and judge whether it can be finished;
    2. Flag anything uncertain, skip it, and move on to the next question;
  4. Question types to do first
    1. audit policy
    2. apparmor
    3. pod security standard
    4. runtime class
    5. image policy webhook
    6. trivy & kube-bench
    7. rbac & opa
    8. secret
    9. security context

Topics

RBAC

Reference: https://kubernetes.io/docs/reference/access-authn-authz/rbac/

Create a ServiceAccount, Role, and RoleBinding

kubectl create sa shs-sa
kubectl create role shs-role --resource=pods,secrets --verb='*'
kubectl create rolebinding shs-rolebinding --role=shs-role --serviceaccount=default:shs-sa

Use the ServiceAccount

kubectl patch -p '{"spec":{"template":{"spec":{"serviceAccountName":"shs-saax","serviceAccount":"shs-saax"}}}}' deployment shs-dep

Tips:

  1. If the ServiceAccount is broken (e.g. it does not exist), the Deployment's pods will not be created, because the ReplicaSet controller already detects the problem and does not create them.
  2. Both deploy.spec.template.spec.serviceAccount and deploy.spec.template.spec.serviceAccountName need to be modified.

NetworkPolicy

Pod Isolation

  • Egress: outbound connections from the pod, non-isolated by default. If a NetworkPolicy selects this pod and has type Egress, only the outbound connections mentioned in it are allowed. If multiple NetworkPolicies select the same pod, all connections mentioned in any of them are allowed. Additive.
  • Ingress: inbound connections to the pod, non-isolated by default. The behavior is the same as for Egress: only connections mentioned in a NetworkPolicy can reach this Pod.
    Examples
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  # Indicates which pods this NetworkPolicy applies to, selected by pod labels.
  # podSelector: {} means this NetworkPolicy applies to all pods in the default ns.
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  # Defines which peers can connect to these pods.
  ingress:
  # A connection is allowed only if both the `from` and `ports` rules are satisfied.
  - from:
    # 1. IP CIDR: connections from pods whose IP is in this CIDR are allowed
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    # 2. Namespace: connections from pods whose namespace has the following labels are allowed
    - namespaceSelector:
        matchLabels:
          project: myproject
    # 3. Pod: connections from pods that have the following labels are allowed
    - podSelector:
        matchLabels:
          role: frontend
    # On top of `from`, the connection is allowed only if its target port is 6379 and the protocol is TCP.
    ports:
    - protocol: TCP
      port: 6379
  # Defines which peers these pods can connect to.
  # A connection is allowed only if both the `to` and `ports` rules are satisfied.
  egress:
  - to:
    # 1. Connections from these pods may reach this CIDR
    - ipBlock:
        cidr: 10.0.0.0/24
    # On top of `to`, the connection is allowed only if its target port is 5978 and the protocol is TCP.
    ports:
    - protocol: TCP
      port: 5978

The parameters of to and from are the same, as follows (irrelevant information omitted):

$ kubectl explain networkpolicy.spec.ingress
from <[]NetworkPolicyPeer>
ports <[]NetworkPolicyPort>
$ kubectl explain networkpolicy.spec.egress
to <[]NetworkPolicyPeer>
ports <[]NetworkPolicyPort>

details of NetworkPolicyPeer are as follows:

$ kubectl explain networkpolicy.spec.egress.to
ipBlock <IPBlock>
namespaceSelector <LabelSelector>
podSelector <LabelSelector>

For the details of IPBlock and LabelSelector, just run kubectl explain before writing the YAML.

Tips

  • NetworkPolicy is namespaced and only applies within the namespace it belongs to.
  • NetworkPolicy can only define allow rules.
  • NetworkPolicy selects pods only by labels.

Default network policy

Deny all inbound and outbound traffic for a pod

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

The OPA(Open Policy Agent) Gatekeeper

Ref: https://kubernetes.io/blog/2019/08/06/opa-gatekeeper-policy-and-governance-for-kubernetes

The Gatekeeper admission controller intercepts create, update, and delete requests for all resources and validates the relevant resources against the configured constraints.

Define a constraint template

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
        listKind: K8sRequiredLabelsList
        plural: k8srequiredlabels
        singular: k8srequiredlabels
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        deny[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }

Create a concrete constraint

Every namespace must have an hr label

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-hr
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["hr"]

Audit

Gatekeeper stores audit results as violations listed in the status field of the relevant Constraint.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-hr
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["hr"]
status:
  #...
  violations:
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"hr"}'
    name: default
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"hr"}'
    name: gatekeeper-system
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"hr"}'
    name: kube-public
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"hr"}'
    name: kube-system

Apparmor

https://kubernetes.io/docs/tutorials/security/apparmor/

Confines programs or containers to a limited set of resources, such as Linux capabilities, network access, and file permissions.

Works in 2 Modes

  • enforcing: blocks disallowed access
  • complain: only reports violations

Prerequisites

  • works on kubernetes v1.4 +
  • AppArmor kernel module enabled
  • Container Runtime supports AppArmor
  • Profile is loaded by kernel

Usage

Add an annotation with the following key to the pod that needs to be secured; the key refers to the name of the container in the Pod:

container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>

The profile_ref can be one of:

  • runtime/default to apply the runtime’s default profile
  • localhost/<profile_name> to apply the profile loaded on the host with the name <profile_name>
  • unconfined to indicate that no profiles will be loaded

Verify it works

  • View Pod Events
  • kubectl exec <pod_name> -- cat /proc/1/attr/current

Helpful commands

  • Show AppArmor Status
$ apparmor_status
  • Load Profile to kernel
$ apparmor_parser /etc/apparmor.d/nginx_apparmor
$ sudo apparmor_parser -q <<EOF
#include <tunables/global>

profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>

  file,

  # Deny all file writes.
  deny /** w,
}
EOF

Audit Policy

Reference: https://kubernetes.io/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Policy

Getting Started

Stage

  • RequestReceived - Before handled by handler chain
  • ResponseStarted - After response header sent, but before response body sent
  • ResponseComplete - After response body sent
  • Panic - After panic occurred.

Audit Level

  • None - don’t log events that match this rule
  • Metadata - log request metadata only (requesting user, timestamp, resource, verb, etc.) but not the request or response body.
  • Request - log event metadata plus request body
  • RequestResponse - log event metadata plus request, response bodies.

Example

apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
- ResponseStarted
- ResponseComplete
- Panic
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["pods"]

Configure it on the kube-apiserver and check the audit log.
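A sketch of the corresponding kube-apiserver changes, assuming hypothetical host paths; in the static pod manifest the host paths also have to be mounted into the container:

# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    - --audit-policy-file=/etc/kubernetes/audit/policy.yaml
    - --audit-log-path=/var/log/kubernetes/audit/audit.log
    - --audit-log-maxage=30
    - --audit-log-maxbackup=10
    volumeMounts:
    - mountPath: /etc/kubernetes/audit
      name: audit-policy
      readOnly: true
    - mountPath: /var/log/kubernetes/audit
      name: audit-log
  volumes:
  - hostPath:
      path: /etc/kubernetes/audit
      type: DirectoryOrCreate
    name: audit-policy
  - hostPath:
      path: /var/log/kubernetes/audit
      type: DirectoryOrCreate
    name: audit-log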

Tips

If the policy doesn't work as expected, check the kube-apiserver logs as below and make sure the policy was loaded successfully, because the apiserver seems to fall back to a lenient default when it fails to load the policy passed in its parameters. The log looks like:

W0122 16:00:29.139016       1 reader.go:81] Audit policy contains errors, falling back to lenient decoding: strict decoding error: unknown field "rules[0].resources[0].resource"

Pod Security Standard

Reference

Policies

The Pod Security Standards define three different policies to broadly cover the security spectrum. These policies are cumulative and range from highly-permissive to highly-restrictive. This guide outlines the requirements of each policy.

Three policies; each policy only defines which fields are checked and validated, i.e. the validation scope. From top to bottom the scope grows larger. See the documentation for the exact checks.

Profile | Description
--------|------------
Privileged | Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
Baseline | Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
Restricted | Heavily restricted policy, following current Pod hardening best practices.

Modes

There are three ways of handling pods that violate the three policies above: enforce (reject the pod), record in the audit log, and show a user-facing warning.

Mode | Description
-----|------------
enforce | Policy violations will cause the pod to be rejected.
audit | Policy violations will trigger the addition of an audit annotation to the event recorded in the audit log, but are otherwise allowed.
warn | Policy violations will trigger a user-facing warning, but are otherwise allowed.

Usage

Label the namespace

# The per-mode level label indicates which policy level to apply for the mode.
#
# MODE must be one of `enforce`, `audit`, or `warn`.
# LEVEL must be one of `privileged`, `baseline`, or `restricted`.
pod-security.kubernetes.io/<MODE>: <LEVEL>

# Optional: per-mode version label that can be used to pin the policy to the
# version that shipped with a given Kubernetes minor version (for example v1.29).
#
# MODE must be one of `enforce`, `audit`, or `warn`.
# VERSION must be a valid Kubernetes minor version, or `latest`.
pod-security.kubernetes.io/<MODE>-version: <VERSION>
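For example, to enforce the baseline level and warn at the restricted level on a hypothetical namespace my-ns:

kubectl label --overwrite ns my-ns \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted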

SecurityContext

Reference: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

There are two places to configure security settings:

  • pod.spec.securityContext is of type PodSecurityContext and applies to every container in the pod;
  • pod.spec["initContainers","containers"].securityContext is of type SecurityContext, applies only to that container, and takes precedence over the former.

Some fields appear in both places, and SecurityContext takes precedence. The fields of the two types are compared below (fields present in both: runAsGroup, runAsNonRoot, runAsUser, seLinuxOptions, seccompProfile, windowsOptions):

PodSecurityContext | SecurityContext
-------------------|----------------
fsGroup | allowPrivilegeEscalation
fsGroupChangePolicy | capabilities
runAsGroup | privileged
runAsNonRoot | procMount
runAsUser | readOnlyRootFilesystem
seLinuxOptions | runAsGroup
seccompProfile | runAsNonRoot
supplementalGroups | runAsUser
sysctls | seLinuxOptions
windowsOptions | seccompProfile
 | windowsOptions

Source: https://www.mrdadong.com/archives/cks-securitycontext

Modify the Deployment secdep in the sec-ns namespace as follows:

1. Start the containers as the user with ID 30000 (set runAsUser to 30000)

2. Do not allow processes to gain privileges beyond their parent process (disable allowPrivilegeEscalation)

3. Mount the container root filesystem read-only (readOnlyRootFilesystem)

Notes:

  1. readOnlyRootFilesystem and allowPrivilegeEscalation only exist in SecurityContext, so they must be set on every container; pay attention to the number of containers to avoid missing one;
  2. runAsUser exists in both PodSecurityContext and SecurityContext, so setting it only in PodSecurityContext is enough (a sketch of the result follows).
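A minimal sketch of the resulting pod template, assuming a single container (name and image are placeholders):

spec:
  template:
    spec:
      securityContext:
        runAsUser: 30000
      containers:
      - name: app                      # repeat for every container in the Deployment
        image: nginx
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true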

RuntimeClass

Reference: https://kubernetes.io/docs/concepts/containers/runtime-class/

  • Create RuntimeClass
  • Specify created RuntimeClass in pod.spec.runtimeClassName
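A minimal sketch, assuming a gVisor handler named runsc is already configured in the container runtime (all names are placeholders):

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx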

Secret

Reference: https://kubernetes.io/docs/concepts/configuration/secret

  • Secret Type
  • Mount to a pod

Speed practice [source]

  1. In the namespace istio-system, get the contents of the existing secret named db1-test. Store the username field in a file named /cks/sec/user.txt and the password field in a file named /cks/sec/pass.txt.

    Note: you must create both files; they do not exist yet.

    Note: do not use or modify the previously created files in the following steps; create new temporary files if needed.

  2. Create a new secret named db2-test in the istio-system namespace, with the following content:

  • username : production-instance

  • password : KvLftKgs4aVH

  3. Finally, create a new Pod that has access to the secret db2-test through a volume (a possible solution sketch follows this list):
  • Pod name: secret-pod

  • Namespace: istio-system

  • Container name: dev-container

  • Image: nginx

  • Volume name: secret-volume

  • Mount path: /etc/secret
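A possible solution sketch (the Pod part goes into a manifest file and is applied with kubectl apply -f):

# 1. Extract the fields of the existing secret (values are base64-encoded)
kubectl -n istio-system get secret db1-test -o jsonpath='{.data.username}' | base64 -d > /cks/sec/user.txt
kubectl -n istio-system get secret db1-test -o jsonpath='{.data.password}' | base64 -d > /cks/sec/pass.txt

# 2. Create the new secret
kubectl -n istio-system create secret generic db2-test \
  --from-literal=username=production-instance \
  --from-literal=password=KvLftKgs4aVH

# 3. Pod consuming the secret through a volume
apiVersion: v1
kind: Pod
metadata:
  name: secret-pod
  namespace: istio-system
spec:
  containers:
  - name: dev-container
    image: nginx
    volumeMounts:
    - name: secret-volume
      mountPath: /etc/secret
  volumes:
  - name: secret-volume
    secret:
      secretName: db2-test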

ServiceAccount

Reference: https://kubernetes.io/docs/concepts/security/service-accounts/

  • Prevent kubernetes from injecting credentials for a pod
$ kubectl explain sa.automountServiceAccountToken
$ kubectl explain pod.spec.automountServiceAccountToken

Set either of the fields above to false to prevent automatic injection of the ServiceAccount token into a pod.

  • Restrict access to Secrets
    Set the annotation kubernetes.io/enforce-mountable-secrets to true on a ServiceAccount; then only secrets listed in that ServiceAccount's secrets field may be used by its pods, e.g. as a secret volume, envFrom, or imagePullSecrets.

  • How to use ServiceAccount to connect to apiserver? reference

    curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --header "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" -X GET https://kubernetes.default.svc/api/v1/namespaces/default/secrets
    # or
    curl -k -XGET --header "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default.svc/api/v1/namespaces/default/secrets

    The kube-apiserver exposes three kinds of API resource endpoints:

    • core group: mainly under /api/v1;
    • named groups: their path is /apis/$GROUP/$VERSION;
    • some system-status APIs: such as /metrics and /version.

    API URLs are roughly composed as /apis/{group}/{version}/namespaces/{namespace}/{resources}/{name}, as shown in the figure below:

    https://img2020.cnblogs.com/other/2041406/202101/2041406-20210120094608734-1433747602.png

    Tips:

    In apiserver URLs, resources must be given in the plural form, e.g.:

    curl -k -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token)" \
    https://kubernetes.default.svc/api/v1/namespaces/default/pods/shs-dep-b56c568d6-n8h6d

etcd

How to use etcdctl to get raw data from etcd?

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
get /registry/secrets/default -w=json | jq .
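For a single object the key is /registry/secrets/<namespace>/<name>; a hedged example with a hypothetical secret name:

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  get /registry/secrets/default/db-secret
# If encryption at rest is enabled, the stored value starts with a provider prefix such as k8s:enc:aescbc:v1: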

Upgrade kubernetes version

Follow these steps:

for master

  1. k drain controller-plane
  2. apt-mark unhold kubeadm
  3. apt-mark hold kubelet kubectl
  4. apt update && apt upgrade -y
  5. kubeadm upgrade plan
  6. kubeadm upgrade apply v1.2x.x
  7. kubeadm upgrade plan(for check purpose)
  8. apt-mark hold kubeadm
  9. apt-mark unhold kubelet kubectl
  10. apt install kubectl=1.2x.x kubelet=1.2x.x
  11. apt-mark hold kubelet kubectl
  12. systemctl restart kubelet
  13. systemctl status kubelet
  14. k uncordon controller-plane

for node

  1. k drain node
  2. apt update
  3. apt-mark unhold kubeadm
  4. apt-mark hold kubectl kubelet
  5. apt install kubeadm=1.2x.x
  6. kubeadm upgrade plan
  7. kubeadm upgrade node
  8. apt-mark hold kubeadm
  9. apt-mark unhold kubectl kubelet
  10. apt install kubectl=1.2x.x kubelet=1.2x.x
  11. systemctl restart kubelet
  12. systemctl status kubelet
  13. k uncordon node

check upgrade result

  1. k get node

ImagePolicyWebhook

https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#imagepolicywebhook
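Only the reference link is kept here; as a reminder of what the task usually involves, below is a hedged sketch of the AdmissionConfiguration (file paths are hypothetical). It is enabled on the kube-apiserver with --enable-admission-plugins=...,ImagePolicyWebhook and --admission-control-config-file pointing at this file.

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: ImagePolicyWebhook
  configuration:
    imagePolicy:
      kubeConfigFile: /etc/kubernetes/admission/imagepolicy-kubeconfig.yaml  # points at the backend webhook
      allowTTL: 50
      denyTTL: 50
      retryBackoff: 500
      defaultAllow: false   # deny images when the backend is unreachable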

Security Tools

kube-bench

A tool that detects potential security issues and gives specific remediation steps for each of them.

Reference:

# Simple way in a kubernetes cluster created by kubeadm
$ kubectl apply \
-f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml

Contents
Consists of the following topics:

  • master
  • etcd
  • controlplane
  • node
  • policies

Each topic starts with the list of checked items and their status, followed by remediations for the FAIL and WARN items. You can fix those issues by following the given instructions, and finally review the topic's summary.

Here is an output example for the master topic:

[WARN] 1.1.9 Ensure that the Container Network Interface file permissions are set to 600 or more restrictive (Manual)
[WARN] 1.1.10 Ensure that the Container Network Interface file ownership is set to root:root (Manual)

== Remediations master ==
1.1.9 Run the below command (based on the file location on your system) on the control plane node.
For example, chmod 600 <path/to/cni/files>
1.1.10 Run the below command (based on the file location on your system) on the control plane node.
For example,
chown root:root <path/to/cni/files>

== Summary master ==
38 checks PASS
9 checks FAIL
13 checks WARN
0 checks INFO

The full output can be viewed via this link

trivy

Reference: https://github.com/aquasecurity/trivy

Scan a docker image

trivy image --severity LOW,MEDIUM ghcr.io/feiyudev/shs:latest

Scan the images used by all pods in a namespace for HIGH and CRITICAL vulnerabilities, then delete the pods whose images contain them

k get pod -A -ojsonpath="{range .items[*]}{.spec['initContainers','containers'][*].image} {.metadata.name} {'#'} {end}" | sed 's|#|\n|g' | sed 's|^ ||g' | sed 's| $||g' | awk '{cmd="echo "$2"; trivy -q image "$1" --severity HIGH,CRITICAL | grep Total";system(cmd)}'

Points to note in this command:

  • jsonpath range
  • awk system(cmd)
  • sed replace

sysdig

Reference: https://github.com/draios/sysdig

Installation(Based on Ubuntu 22.04)

  • Download deb from sysdig-release
  • sudo dpkg -i sysdig-0.34.1-x86_64.deb
  • sudo apt -f install

Output format

%evt.num %evt.outputtime %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.info
173884 15:06:10.075401786 7 sudo (1453517.1453517) > read fd=9(<f>/dev/ptmx) size=65536

Notes:

  1. evt.dir, the event direction: > marks a syscall enter event (into the kernel), < marks a syscall exit event (out of the kernel).
  2. evt.type, the event type: think of it as the name of the system call.

Chisels

Predefined scripts built on top of sysdig events that implement more complex analyses. They are located in /usr/share/sysdig/chisels on a Linux machine.

What are those chisels?

  1. To see chisels.
sysdig -cl
# or
sysdig --list-chisels
  2. To use a chisel
# See HTTP log
sysdig -c httplog
2024-01-25 23:06:16.423272777 < method=GET url=:8080/health response_code=200 latency=0ms size=2B
2024-01-25 23:06:16.423299653 > method=GET url=:8080/health response_code=200 latency=0ms size=2B
# See CPU usage ranking
sysdig -c topprocs_cpu
CPU% Process PID
--------------------------------------------------------------------------------
8.01% kube-apiserver 39124
3.00% kubelet 25007
3.00% etcd 1613
2.00% sysdig 102489
2.00% kube-controller 38957
2.00% calico-node 4705
1.00% containerd 874
1.00% vmtoolsd 790
1.00% kube-scheduler 39017
0.00% svlogd 2505
  3. Advanced usage of a chisel
$ sysdig -i spy_file

Category: I/O
-------------
spy_file        Echo any read/write made by any process to all files. Optionally, you can provide the name of one file to only intercept reads/writes to that file.

This chisel intercepts all reads and writes to all files. Instead of all files, you can limit interception to one file.
Args:
[string] read_or_write - Specify 'R' to capture only read events; 'W' to capture only write events; 'RW' to capture read and write events. By default both read and write events are captured.
[string] spy_on_file_name - The name of the file which the chisel should spy on for all read and write activity.

$ sysdig -c spy_file "RW /root/spy_file_test.txt"
23:53:25.592303985 date(112109) W 32B /root/spy_file_test.txt
Thu Jan 25 11:53:25 PM HKT 2024

23:53:43.333152845 cat(112206) R 32B /root/spy_file_test.txt
Thu Jan 25 11:53:25 PM HKT 2024

23:53:43.333166670 cat(112206) R 0B /root/spy_file_test.txt NULL
23:53:51.856062624 date(112270) W 32B /root/spy_file_test.txt
Thu Jan 25 11:53:51 PM HKT 2024

23:53:56.965894638 cat(112307) R 64B /root/spy_file_test.txt
Thu Jan 25 11:53:25 PM HKT 2024
Thu Jan 25 11:53:51 PM HKT 2024

23:53:56.965902094 cat(112307) R 0B /root/spy_file_test.txt NULL

Usage

  1. Save events to a file
sysdig -w test.scap
  2. Read events from a file while analyzing (by chisels)
sysdig -r test.scap -c httptop
  3. Specify the format to be used when printing the events
    -p , --print=
    Specify the format to be used when printing the events.
    With -pc or -pcontainer will use a container-friendly format.
    With -pk or -pkubernetes will use a kubernetes-friendly format.
    With -pm or -pmesos will use a mesos-friendly format.
    See the examples section below for more info.
sysdig -r test.scap -c httptop -pc
  4. Specify the number of events Sysdig should capture by passing it the -n flag. Once Sysdig captures the specified number of events, it'll automatically exit:
sysdig -n 5000 -w test.scap
  5. Use the -C flag to configure Sysdig so that it breaks the capture into smaller files of a specified size.
    The following example continuously saves events to files < 10MB:
sysdig -C 10 -w test.scap
  6. Specify the maximum number of files Sysdig should keep with the -W flag. For example, you can combine the -C and -W flags like so:
sysdig -C 10 -W 4 -w test.scap
  7. You can analyze the processes running in the WordPress container with:
sysdig -pc -c topprocs_cpu container.name=wordpress-sysdig_wordpress_1
  8. -M <num_seconds> stops the capture after the given number of seconds has been reached.

Help

For the fields available in filters, run sysdig -l to list all supported fields. For example, the container-related filter fields are:

ubuntu@primary:~$ sysdig -l | grep "^container."
container.id The truncated container ID (first 12 characters), e.g. 3ad7b26ded6d is extracted from the
container.full_id The full container ID, e.g.
container.name The container name. In instances of userspace container engine lookup delays, this field
container.image The container image name (e.g. falcosecurity/falco:latest for docker). In instances of
container.image.id The container image id (e.g. 6f7e2741b66b). In instances of userspace container engine
container.type The container type, e.g. docker, cri-o, containerd etc.
container.privileged 'true' for containers running as privileged, 'false' otherwise. In instances of userspace
container.mounts A space-separated list of mount information. Each item in the list has the format
container.mount (ARG_REQUIRED) Information about a single mount, specified by number (e.g.
container.mount.source (ARG_REQUIRED) The mount source, specified by number (e.g. container.mount.source[0]) or
container.mount.dest (ARG_REQUIRED) The mount destination, specified by number (e.g. container.mount.dest[0])
container.mount.mode (ARG_REQUIRED) The mount mode, specified by number (e.g. container.mount.mode[0]) or
container.mount.rdwr (ARG_REQUIRED) The mount rdwr value, specified by number (e.g. container.mount.rdwr[0])
container.mount.propagation (ARG_REQUIRED) The mount propagation value, specified by number (e.g.
container.image.repository The container image repository (e.g. falcosecurity/falco). In instances of userspace
container.image.tag The container image tag (e.g. stable, latest). In instances of userspace container engine
container.image.digest The container image registry digest (e.g.
container.healthcheck The container's health check. Will be the null value ("N/A") if no healthcheck
container.liveness_probe The container's liveness probe. Will be the null value ("N/A") if no liveness probe
container.readiness_probe The container's readiness probe. Will be the null value ("N/A") if no readiness probe
container.start_ts Container start as epoch timestamp in nanoseconds based on proc.pidns_init_start_ts and
container.duration Number of nanoseconds since container.start_ts.
container.ip The container's / pod's primary ip address as retrieved from the container engine. Only
container.cni.json The container's / pod's CNI result field from the respective pod status info. It contains

As shown above, container.id only takes the first 12 characters; alternatively the full container id can be used via container.full_id. The Kubernetes-related fields are:

ubuntu@primary:~$ sysdig -l | grep "^k8s."
k8s.ns.name The Kubernetes namespace name. This field is extracted from the container runtime socket
k8s.pod.name The Kubernetes pod name. This field is extracted from the container runtime socket
k8s.pod.id [LEGACY] The Kubernetes pod UID, e.g. 3e41dc6b-08a8-44db-bc2a-3724b18ab19a. This legacy
k8s.pod.uid The Kubernetes pod UID, e.g. 3e41dc6b-08a8-44db-bc2a-3724b18ab19a. Note that the pod UID
k8s.pod.sandbox_id The truncated Kubernetes pod sandbox ID (first 12 characters), e.g 63060edc2d3a. The
k8s.pod.full_sandbox_id The full Kubernetes pod / sandbox ID, e.g
k8s.pod.label (ARG_REQUIRED) The Kubernetes pod label. The label can be accessed either with the
k8s.pod.labels The Kubernetes pod comma-separated key/value labels. E.g. 'foo1:bar1,foo2:bar2'. This
k8s.pod.ip The Kubernetes pod ip, same as container.ip field as each container in a pod shares the
k8s.pod.cni.json The Kubernetes pod CNI result field from the respective pod status info, same as

Traps

A pitfall here

When filtering with container.id, the id must be 12 characters long or no data will show up. The container id shown by crictl ps is 13 characters, so watch the length when using sysdig (see the example after the listing below).

ubuntu@primary:~$ crictl ps | grep -v ^C | awk '{print $1,$2,$6,$7}'
0fd88042f1ddf 848c5b919e8d3 Running calico-apiserver
ad81cac0dbf9e 848c5b919e8d3 Running calico-apiserver
f6d6b81c75f69 4e87edec0297d Running calico-kube-controllers
87c4fbddeb123 d36ef67f7b24c Running csi-node-driver-registrar
46095b3ea4bf6 91c1c91da7602 Running calico-csi
51e65353815dc cbb01a7bd410d Running coredns
7fc6f4ad4aafa cbb01a7bd410d Running coredns
de42d610f5530 1843802b91be8 Running calico-node
21ae9adf53e47 b33768e0da1f8 Running calico-typha
a2f7701ceae6c 7bc79e0d3be4f Running tigera-operator
d91edc95d2edf 9344fce2372f8 Running kube-proxy
5f7d85179ade0 6fc5e6b7218c7 Running kube-scheduler
d40dd28cc171c 138fb5a3a2e34 Running kube-controller-manager
c71d33c5aea6e 8a9000f98a528 Running kube-apiserver
0cdeff9542f15 a0eed15eed449 Running etcd
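For example, taking the first container in the listing above, drop the 13th character of the crictl id before handing it to sysdig (the -p format string is only for illustration):

sysdig -p"%evt.type %proc.name" container.id=0fd88042f1dd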

falco

Reference: https://falco.org/docs
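Only the reference link is kept here; as a reminder of Falco's rule format (for example appended to /etc/falco/falco_rules.local.yaml), a minimal sketch with an illustrative rule name and condition:

- rule: shell-in-container
  desc: Detect a shell spawned inside a container
  condition: container.id != host and proc.name in (bash, sh)
  output: "Shell started in a container (user=%user.name container=%container.id image=%container.image.repository)"
  priority: WARNING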

strace

Monitors a process's system calls and signals. Basic usage (a short example follows this list):

  1. Attach to an existing process: strace -p <pid>
  2. Start a binary directly: strace <binary-name>
  3. Filter the output: strace -e trace=file
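A small illustrative run, writing the trace to a file and then grepping it (paths are arbitrary):

strace -e trace=file -o /tmp/strace.out cat /etc/hostname
grep openat /tmp/strace.out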

Exam Instructions

Handbook of CKS exam

Requirements for your computer, microphone, camera, speaker, etc.

Don't use headphones or earbuds.

Exam Details

An online test with 15-20 performance-based tasks, to be completed within 2 hours.

Don't cheat; audio, camera, and screen recordings of the test will be reviewed.

Using OpenWrt as a Traffic Proxy for VMs

Install the OpenWrt VM

Download the image: Index of /releases/22.03.5/targets/x86/64/ (openwrt.org)

Convert the image:

qemu-img convert -f raw -O vdi openwrt-22.03.5-x86-64-generic-ext4-combined-efi.img openwrt-22.03.5-x86-64-generic-ext4-combined-efi.img.vdi

Configure the VM's network adapters:

  1. In VirtualBox, create a HostNetwork (Host-Only network) with the subnet 192.168.56.0/24;

  2. In the OpenWrt VM's network settings:

    1) Enable adapter 1, attached to Host-only Network, and select the HostNetwork created in the previous step;

    2) Enable adapter 2, attached to Bridged Adapter, and select the host NIC that has Internet access;

  3. Inside the OpenWrt VM, give the lan interface a static IP from the HostNetwork range; 192.168.56.2 is used here.

root@OpenWrt:~# cat /etc/config/network

config interface 'loopback'
    option device 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config globals 'globals'
    option ula_prefix 'fdd4:bccc:9ebb::/48'

config device
    option name 'br-lan'
    option type 'bridge'
    list ports 'eth0'

config interface 'lan'
    option device 'br-lan'
    option proto 'static'
    option ipaddr '192.168.56.2'
    option netmask '255.255.255.0'
    option ip6assign '60'

config interface 'wan'
    option device 'eth1'
    option proto 'dhcp'

config interface 'wan6'
    option device 'eth1'
    option proto 'dhcpv6'

Enable SFTP on OpenWrt for file upload and download

This also allows copying files with the scp command.

opkg update
opkg install vsftpd openssh-sftp-server
/etc/init.d/vsftpd enable
/etc/init.d/vsftpd start

Install and Configure OpenClash

#iptables
opkg update
opkg install coreutils-nohup bash iptables dnsmasq-full curl ca-certificates ipset ip-full iptables-mod-tproxy iptables-mod-extra libcap libcap-bin ruby ruby-yaml kmod-tun kmod-inet-diag unzip luci-compat luci luci-base

If you need to force a reinstall, run opkg remove first and then opkg install.

From the Releases · vernesong/OpenClash (github.com) page, download the OpenClash package onto the OpenWrt VM and install it with opkg install.

Configuration manual: Home · vernesong/OpenClash Wiki (github.com)

Configure the other VMs to use the side router

Ubuntu Server is used here; the network can be configured during installation or edited manually afterwards. The manual configuration looks like this:

$ cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp0s3:
      addresses:
      - 192.168.56.3/24
      nameservers:
        addresses:
        - 114.114.114.114
        search: []
      routes:
      - to: default
        via: 192.168.56.2
  version: 2

After the change, the default route now points to 192.168.56.2:

$ ip route
default via 192.168.56.2 dev enp0s3 proto static
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.56.0/24 dev enp0s3 proto kernel scope link src 192.168.56.3

Reference

Git Operations and Internals

How Git Diff Works

The Myers diff algorithm

Create a branch from a specific commit id

git checkout -b dev c99d6500

View the commits of a specific branch

git log master
git config --global alias.nicelog "log --graph --abbrev-commit --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset'"

As shown below:

How git stores objects

How git merges

How git rebases

Can every commit be cherry-picked?

Line-ending issues

Problem: some scripts failed when executed on Linux. The cause turned out to be that the files had been edited and uploaded from Windows and converted to MS-DOS format (different line endings); such files fail on Linux because the shell interpreter treats the line ending as part of the command. Background: long ago, old teletypes used two characters to start a new line. The carriage return (CR, carriag
Read more

Certified Calico Operator: Level 1 Notes

Course link: Course | CCO-L1 | Tigera
There is also a possibly useful e-book:
Tigera_eBook_Intro_to_Kubernetes_Networking.pdf

Kubernetes Network Model

  1. Every Pod has its own IP address;
  2. Containers in the same Pod share that IP address and can communicate with each other through it;
  3. Pods can communicate with each other by IP (without address translation);
  4. Network isolation restricts which Pods a Pod may and may not access.

Install the test cluster

curl https://raw.githubusercontent.com/tigera/ccol1/main/control-init.yaml | multipass launch -n control -m 2048M 20.04 --cloud-init -
curl https://raw.githubusercontent.com/tigera/ccol1/main/node1-init.yaml | multipass launch -n node1 20.04 --cloud-init -
curl https://raw.githubusercontent.com/tigera/ccol1/main/node2-init.yaml | multipass launch -n node2 20.04 --cloud-init -
curl https://raw.githubusercontent.com/tigera/ccol1/main/host1-init.yaml | multipass launch -n host1 20.04 --cloud-init -

After rebooting the host, you may need to start all the VMs:

multipass start --all

Install Calico

There are 4 installation methods; pay attention to:

  1. The Pod CIDR
  2. The compatibility between the Calico version and the Kubernetes version (it is safest to use the install commands from the course)
    kubectl create -f https://docs.projectcalico.org/archive/v3.21/manifests/tigera-operator.yaml
    cat <<EOF | kubectl apply -f -
    apiVersion: operator.tigera.io/v1
    kind: Installation
    metadata:
      name: default
    spec:
      calicoNetwork:
        containerIPForwarding: Enabled
        ipPools:
        - cidr: 198.19.16.0/20
          natOutgoing: Enabled
          encapsulation: None
    EOF
    Delete the tigera-operator namespace (by clearing its finalizers through the finalize API):
    curl -H "Content-Type: application/json" \
    -XPUT \
    -d '{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"tigera-operator"},"spec":{"finalizers":[]}}' \
    http://localhost:8001/api/v1/namespaces/tigera-operator/finalize
    What the related Pods do:
  • tigera-operator/tigera-operator-xxxx-xxx

Watches the Installation CR and installs the Calico CNI according to its configuration.

  • calico-system/calico-node

A DaemonSet; implements network policy, programs the routes on each Node, and manages the virtual interfaces for IPIP, VXLAN, and WireGuard.

  • calico-system/calico-typha

A StatefulSet that serves as a caching layer for calico-node's queries and watches against the api-server, avoiding direct access to the api-server. The tigera-operator scales it in and out as nodes come and go.

  • calico-system/calico-controller

The collection of Calico controllers that automatically keep resource state in sync.

The differences between the ports in a Service

Service: This directs the traffic to a pod.
TargetPort: This is the actual port on which your application is running inside the container.
Port: Some times your application inside container serves different services on a different port.

Example: The actual application can run 8080 and health checks for this application can run on 8089 port of the container. So if you hit the service without port it doesn’t know to which port of the container it should redirect the request. Service needs to have a mapping so that it can hit the specific port of the container.

kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
  - name: http
    nodePort: 30475
    port: 8089
    protocol: TCP
    targetPort: 8080
  - name: metrics
    nodePort: 31261
    port: 5555
    protocol: TCP
    targetPort: 5555
  - name: health
    nodePort: 30013
    port: 8443
    protocol: TCP
    targetPort: 8085

if you hit the my-service:8089 the traffic is routed to 8080 of the container(targetPort). Similarly, if you hit my-service:8443 then it is redirected to 8085 of the container(targetPort). But this myservice:8089 is internal to the kubernetes cluster and can be used when one application wants to communicate with another application. So to hit the service from outside the cluster someone needs to expose the port on the host machine on which kubernetes is running so that the traffic is redirected to a port of the container. This is node port(port exposed on the host machine). From the above example, you can hit the service from outside the cluster(Postman or any rest-client) by host_ip:nodePort

Say your host machine ip is 10.10.20.20 you can hit the http, metrics, health services by 10.10.20.20:30475, 10.10.20.20:31261, 10.10.20.20:30013.


A Problem Related to a Container's PID 1

A browser request to the service returned 502: the service was down, but the Docker container status still looked normal. The service runs inside a Docker container.
Startup chain: shell (pid 1) -> java (pid != 1)

  1. docker ps shows the container xxx with STATUS Up 4 days
  2. docker top xxx shows no processes
  3. docker exec -it xxx bash reports cannot exec in a stopped state: unknown

docker logs still shows the logs, and the last line is killed
docker stop takes a while before it finishes

Docker version: Docker Engine - Community 20.10.1

# ubuntu @ ucloud-hk in ~ [10:55:49]
$ docker version
Client: Docker Engine - Community
Version:          20.10.1
API version:      1.41
Go version:        go1.13.15
Git commit:        831ebea
Built:            Tue Dec 15 04:34:58 2020
OS/Arch:          linux/amd64
Context:          default
Experimental:      true

Server: Docker Engine - Community
Engine:
Version:          20.10.1
API version:      1.41 (minimum version 1.12)
Go version:      go1.13.15
Git commit:      f001486
Built:            Tue Dec 15 04:32:52 2020
OS/Arch:          linux/amd64
Experimental:    false
containerd:
Version:          1.4.3
GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version:          1.0.0-rc92
GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version:          0.19.0
GitCommit:        de40ad0

https://blog.csdn.net/zhangjikuan/article/details/114702299

Notes on Installing Kubernetes Manually

Installing the necessary tools

  • cfssl

  • cfssljson

  • kubectl

    Certificates and the CA

    Configuring and generating kubeconfigs

  • kubelet config file

worker-n.kubeconfig, distributed to all worker nodes.

  • kube-proxy config file

kube-proxy.kubeconfig, distributed to all worker nodes.

  • kube-controller-manager config file

kube-controller-manager.kubeconfig, distributed to all control-plane nodes.

  • kube-scheduler config file

kube-scheduler.kubeconfig, distributed to all control-plane nodes.

  • Admin config file

admin.kubeconfig, distributed to all control-plane nodes.

Configuring the key for encrypting Kubernetes Secrets

encryption-config.yaml, distributed to all control-plane nodes.

Configuring & deploying etcd

Prerequisites: the etcd & etcdctl binaries
Copy ca.pem, kubernetes-key.pem, and kubernetes.pem to the etcd nodes; kubernetes-key.pem and kubernetes.pem serve as the TLS certificate for etcd's HTTPS endpoint.

Configuring & deploying the control-plane services

Prerequisites: the kube-apiserver, kube-controller-manager, kube-scheduler & kubectl binaries

api-server

In its startup parameters you need to:

  1. Specify ca.pem as the CA

  2. Specify ca.pem as etcd's CA certificate, kubernetes-key.pem and kubernetes.pem as the key pair, and the etcd endpoints

  3. Specify encryption-config.yaml as the encryption config for Kubernetes Secrets

  4. Specify ca.pem as the kubelet CA certificate and kubernetes-key.pem and kubernetes.pem as the key pair

  5. Specify kubernetes-key.pem and kubernetes.pem as the TLS certificate for the api-server's HTTPS endpoint

  6. Specify service-account.pem as the service account certificate

    controller-manager

    In its startup parameters you need to:

  1. Specify ca.pem and ca-key.pem as the key pair of the CA used by the cluster

  2. Specify kube-controller-manager.kubeconfig as the kubeconfig

  3. Specify ca.pem as the root CA

  4. Specify service-account-key.pem as the service account private key

    scheduler

    In its startup parameters, point it at the config file kube-scheduler.yaml, and in kube-scheduler.yaml set the kubeconfig to kube-scheduler.kubeconfig.


Start the control-plane services & verify

Grant the apiserver access to the kubelet API

Create the system:kube-apiserver-to-kubelet ClusterRole and the system:kube-apiserver ClusterRoleBinding to allow requests to the kubelet API and most of the tasks needed to manage Pods.

Configuring & deploying worker nodes

  1. Install the OS dependencies: socat, conntrack, ipset

  2. Install the CRI

  3. Install the binaries needed on worker nodes: kubectl, kube-proxy, kubelet, and the default CNI plugins

    Configuring the kubelet

    Create the kubelet-config.yaml file and in it specify

  1. the CA certificate

  2. the TLS key pair, using worker-1.pem and worker-1-key.pem. So the kubelet is yet another HTTPS service.

Then in kubelet.service specify

  1. the config file created above
  2. worker-n.kubeconfig as the kubeconfig used by the kubelet

    Configuring kube-proxy

    Create the kube-proxy-config.yaml file and set its kubeconfig to kube-proxy.kubeconfig; then point the kube-proxy service unit at this config file.

Start the worker services

Configure the kubeconfig used by kubectl

Generate the kubeconfig using admin.pem and admin-key.pem.

Verify

kubectl get componentstatuses
kubectl get nodes

Installing other important components

  1. Install the CNI

  2. DNS

  3. Verify

    kubectl run busybox --image=busybox:1.28.3 --command -- sleep 3600
    kubectl get pods -l run=busybox
    kubectl exec -ti $POD_NAME -- nslookup kubernetes

    Smoke tests

  4. Data encryption

  5. Deployment

    1. Create
    2. Port forwarding
    3. Container logs
    4. Exec commands inside the container
  6. Service

    1. Create
    2. Access

PV, PVC, and StorageClass Overview

Why this design

  1. Decouple the use of storage from its maintenance

  2. Accommodate different storage requirements

    Definitions

    A PV describes a persistent storage volume. This API object mainly defines a directory persisted on a host, for example an NFS mount directory. PV objects are usually created in advance by operators and kept ready for use in the Kubernetes cluster. Think of it as the concrete implementation of an interface: the worker that actually does the job.
    A PVC describes the attributes of the persistent storage a Pod wants to use. PVC objects are usually created by developers, or become part of a StatefulSet as a PVC template, in which case the StatefulSet controller creates the numbered PVCs.
    Think of it as the interface itself, without a concrete implementation.

    Matching and binding PV and PVC

    Rules

  1. The spec fields of the two must match

  2. The storageClassName fields of the two must be identical

    Process

    Through an operator-style control loop, every PVC not yet in the Bound state is checked against all available PVs to find a suitable match.

    Result

    The PV's name is written into the PVC's spec.volumeName field (see the sketch below).
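A minimal sketch of a statically bound pair, assuming a hostPath-backed local PV (names, paths, and the manual class are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /data/pv-demo
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual      # must match the PV's storageClassName
  resources:
    requests:
      storage: 1Gi              # must fit within the PV's capacity
# After binding, pvc-demo's spec.volumeName is filled in with "pv-demo".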

    Two-phase processing

  1. Attach: call the storage system's API to attach the storage to the Node the Pod is going to be scheduled onto;

Managed by the AttachDetachController. It continuously checks the attachment state between every Pod's PV and the host the Pod runs on, and from that decides whether the PV needs an Attach (or Detach) operation.
As a built-in Kubernetes controller, the Volume Controller is part of kube-controller-manager, so the AttachDetachController necessarily runs on the master node. The Attach operation only calls the public cloud's or the storage project's API and does not need to run on the host itself.

  2. Mount
    1. Format the storage device
    2. Bind-mount it into the Pod

Managed by the VolumeManagerReconciler. It must happen on the Pod's host, so it is part of the kubelet. It runs as a goroutine independent of the kubelet main loop (so it does not block the main control loop).

How PVs are provisioned

  • Static Provisioning

After the PVC is created, an operator manually creates the PV.

  • Dynamic Provisioning

The PVC specifies a StorageClass, and the provisioner named in that StorageClass creates the corresponding PV.

StorageClass

When the StorageClass named in the PVC exists, its provisioner is called to create the PV; otherwise the PVC is matched against PVs that carry the same StorageClass.
If the PVC does not specify a StorageClass and the cluster has the DefaultStorageClass admission plugin enabled, a default StorageClass is automatically added to the PVC and PV; otherwise the PVC's storageClassName is "", which means it can only bind to a PV whose storageClassName is also "".

Implementing local persistent volumes

Delayed binding: with local storage, the binding of the PV and PVC must be delayed until Pod scheduling, to avoid the PV and the Pod ending up on different nodes.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer