# MP System Deployment

Node roles:

- Kubespray Deployment Node
- Kubernetes master node
- Kubernetes worker node (5G CP)
- Kubernetes worker node (5G UP)

## Single-Node Deployment Topology

### Resource Parameters

**5GC AIO VM**:

- Core: 8 vCPU, 16 vCPU (recommended)
- Memory: 16 GB, 32 GB (recommended)
- Disk: 200 GB

**GuestOS**: Ubuntu 18.04

**Kernel**: 4.15.0-64-generic

**Python**: 3.6

**User**: root

## Customized Kubespray

### Infrastructure Software

**Deployment Tools**: customized Kubespray

- Ansible
- Kubeadm
- Helm/Charts

**CaaS**: Kubernetes 1.17

- Runtime: Docker CE, Docker storage option: Overlay2
- Image Registry: Harbor
- CNI:
  - Flannel: default CNI
  - Multus: supports a second CNI for 5G NFs
  - SR-IOV: a Kubernetes device plug-in for discovering, advertising, and allocating SR-IOV network virtual function resources
  - host-device: uses the physical interface directly
  - macvlan: configures multiple virtual network interfaces on top of a host network interface
- HTTP/HTTPS Access: Nginx Ingress
- Storage Option: OpenEBS (https://openebs.io/)

**Log Auditing**:

- Elasticsearch
- Fluentd
- Kibana

**Monitoring and Alerting**:

- Prometheus
- Node Exporter
- Alert Manager

### MP System Software

**MP System Components**:

- Orchestrator Database
- Orchestrator
- MP

**Docker Images**:

- 5g-orch-db.tar: Orchestrator DB image (PostgreSQL)
- 5g-orch.tar: Orchestrator image
- mp-0412.tar: MP image
- licenseutil: obtains the deployment node's serial number, used to generate the license

**Playbooks**:

- cluster.yml: install and set up the Kubernetes cluster, including the MP system
- scale.yml: add more nodes to the Kubernetes cluster
- remove-node.yml: remove nodes from the Kubernetes cluster
- reset.yml: delete the Kubernetes cluster from all nodes

**Inventory**:

- 5g_support: host information and configuration for the MP system installation

**Role**:

- 5g_support: the MP system installation is packaged as a standalone role added to Kubespray

**CLIs**:

- Run the Kubernetes cluster playbook:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
```

- To scale out: add one more node to inventory/5g_support/hosts.ini, then run:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini scale.yml -b -v --private-key=~/.ssh/id_rsa
```

- To scale in: remove a node from inventory/5g_support/hosts.ini and from the cluster:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini remove-node.yml -b -v --extra-vars "node=node4"
```

- To reset the cluster (DELETE EVERYTHING):

```bash
# If you need to reset the whole Kubernetes cluster, cd into the previously cloned kubespray directory and reset it.
ansible-playbook -i inventory/5g_support/hosts.ini reset.yml -b -v --private-key=~/.ssh/id_rsa
```
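Before running any of the playbooks above, it can help to confirm that Ansible can actually reach every host in the inventory. A minimal connectivity check, assuming the same inventory path and SSH key used in the commands above:

```bash
# Ad-hoc ping of all inventory hosts; every host should report "pong"
ansible -i inventory/5g_support/hosts.ini all -m ping -b --private-key=~/.ssh/id_rsa
```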
## Deployment Execution

### Pre-Deployment

#### Kubespray Deployment Node

- Ensure unrestricted Internet access from the deployment node (a proxy may be required in mainland China).
- Generate and copy the key pair to the 5G nodes:

```bash
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gcp-node_ip_address>
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@<5gup-node_ip_address>
```

- Use domestic APT and PyPI mirrors:

```bash
$ cp /etc/apt/sources.list /etc/apt/sources.list.backup
$ cat > /etc/apt/sources.list << EOF
# Example: Aliyun mirror for Ubuntu 18.04 (bionic); substitute your preferred domestic mirror
deb https://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
EOF

$ cat > ~/.pip/pip.conf << EOF
[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/
EOF
```

- Use a domestic Docker image mirror:

```bash
# Install Docker CE
apt-get remove docker docker-engine docker-ce docker.io
apt-get install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -y
apt-get install docker-ce -y
systemctl enable docker && systemctl start docker && systemctl status docker

$ vi /etc/docker/daemon.json
{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ]
}

systemctl daemon-reload && systemctl restart docker
```

- Install Kubespray:

```bash
# Obtain the customized Kubespray
$ cd kubespray

# Single-node
$ git checkout 5g_support_singlenode/2.13
# Multi-node
$ git checkout 5g_support/2.13

# Install dependencies
$ pip3 install -r requirements.txt
```

- Load the MP images:

```bash
$ docker load --input 5g-orch-db.tar
$ docker load --input 5g-orch.tar
$ docker load --input mp-0412.tar
```

#### 5G CP Node

**NOTE**: Do not enable HugePage, otherwise the Harbor Pods will fail to start.

#### 5G UP Node

**NOTE**: Kernel version must be 4.15.0.

- Add the vfio module:

```bash
# Add "vfio-pci" in /etc/modules.
$ cat /etc/modules
vfio-pci

# Add "iommu=pt intel_iommu=on" in /etc/default/grub
$ cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt intel_iommu=on"
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1 intel_iommu=on iommu=pt"

# Update grub and reboot the server.
$ sudo update-grub
$ reboot

$ lsmod | grep vfio
vfio_pci               45056  0
vfio_virqfd            16384  1 vfio_pci
vfio_iommu_type1       24576  0
vfio                   28672  3 vfio_iommu_type1,vfio_pci
irqbypass              16384  1 vfio_pci
```
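After the reboot it is worth confirming that the IOMMU really came up before relying on vfio-pci. A quick sanity check using standard kernel interfaces (the exact output varies by platform):

```bash
# The kernel command line should now contain iommu=pt intel_iommu=on
cat /proc/cmdline

# DMAR/IOMMU messages indicate the IOMMU initialized
dmesg | grep -i -e DMAR -e IOMMU

# Non-empty output means IOMMU groups exist and devices can be handed to vfio-pci
ls /sys/kernel/iommu_groups/
```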
### Install Kubernetes Clusters and MP System

- Edit download.yml to use the Aliyun image mirrors:

```bash
$ vi kubespray/inventory/5g_support/group_vars/k8s-cluster/download.yml
...
# If there is internet connection in mainland china, set aliyun_enable to true
aliyun_enable: true
```

- (For all-in-one installation) Edit the Kubespray inventory configuration:

```bash
$ cat kubespray/inventory/5g_support/hosts.ini
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for the etcd cluster. A node that is not an etcd
# ## member does not need to set the value, or can set an empty string.
[all]
kube-cluster-1 ansible_host=172.18.22.220 etcd_member_name=etcd1

[kube-master]
kube-cluster-1

[kube-node]
kube-cluster-1

[k8s-cluster:children]
kube-master
kube-node

[etcd]
kube-cluster-1
```

- (For multi-node deployment) Edit the Kubespray inventory configuration:

```bash
# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for the etcd cluster. A node that is not an etcd
# ## member does not need to set the value, or can set an empty string.
[all]
node1 ansible_host=172.18.17.60 etcd_member_name=etcd1
node2 ansible_host=172.18.17.61 etcd_member_name=etcd2
node3 ansible_host=172.18.17.62 etcd_member_name=etcd3

[kube-master]
node1
node2

[etcd]
node1
node2
node3

[kube-node]
node1
node2
node3

[k8s-cluster:children]
kube-master
kube-node
```

- (For all-in-one installation) Edit `kubespray/roles/5g_support/tasks/main.yml`: change every `storageClass=openebs-sc` to `storageClass=openebs-hostpath`. Example:

```bash
- name: SMARTCITY | Install Harbor Package
  command: "{{ bin_dir }}/helm install sco-harbor {{ helm_home_dir }}/packages/harbor-1.1.1.tgz --set persistence.persistentVolumeClaim.registry.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.jobservice.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.database.storageClass=openebs-hostpath --set persistence.persistentVolumeClaim.redis.storageClass=openebs-hostpath"
```

- Install the Kubernetes cluster and MP system:

```bash
$ cd kubespray
$ sudo ansible-playbook -i inventory/5g_support/hosts.ini cluster.yml -b -v --private-key=~/.ssh/id_rsa
```

- If you use an insecure private registry for Docker images, set the Docker registry option and change the corresponding address in the following command:

```bash
ansible-playbook -i inventory/5g_support/hosts.ini --become --become-user=root cluster.yml -e '{"docker_insecure_registries":["172.18.22.220"]}' -vvv
```

Wait 10-20 minutes for Ansible to finish the installation. If errors occur during installation, reset the cluster, fix the issue, and re-install.

- Pod list:

```bash
$ kubectl get pods --all-namespaces
NAMESPACE            NAME                                                      READY   STATUS      RESTARTS   AGE
default              elasticsearch-master-0                                    0/1     Running     2          4d14h
default              es-curator-elasticsearch-curator-1623632400-hxljs         0/1     Completed   0          2d5h
default              es-curator-elasticsearch-curator-1623718800-wmf2w         0/1     Completed   0          29h
default              es-curator-elasticsearch-curator-1623805200-94647         0/1     Completed   0          3h37m
default              fluentd-46vts                                             1/1     Running     2          27h
default              harbor-harbor-core-69fd799fd4-5b7z8                       1/1     Running     7          23h
default              harbor-harbor-database-0                                  1/1     Running     2          23h
default              harbor-harbor-jobservice-788df69665-cjdp6                 1/1     Running     6          23h
default              harbor-harbor-portal-78b68cfdb-sjb2b                      1/1     Running     2          4d13h
default              harbor-harbor-redis-0                                     1/1     Running     2          4d13h
default              harbor-harbor-registry-5554964584-q5kw4                   2/2     Running     4          4d13h
default              ingress-nginx-ingress-controller-6c748ccfb9-xdp2c         1/1     Running     2          4d14h
default              ingress-nginx-ingress-default-backend-76fb4dfd79-mtgd2    1/1     Running     2          28h
default              mp-9fcb8c7c4-7klsz                                        1/1     Running     2          22h
default              orch-7bcb9b8498-84whh                                     1/1     Running     5          22h
default              orch-db-7954664f45-j86cx                                  1/1     Running     2          3d1h
default              prom-adapter-prometheus-adapter-854457d445-9m8g7          1/1     Running     2          4d14h
default              prom-prometheus-kube-state-metrics-586fdb6d-dhpkl         1/1     Running     2          4d14h
default              prom-prometheus-node-exporter-lcsnp                       1/1     Running     2          4d14h
default              prom-prometheus-server-6c88db7cd7-6r7br                   2/2     Running     4          4d14h
kube-system          coredns-76798d84dd-xt4ll                                  1/1     Running     2          20h
kube-system          coredns-76798d84dd-zfwrj                                  0/1     Pending     0          20h
kube-system          dns-autoscaler-56549847b5-s8mmk                           1/1     Running     2          4d15h
kube-system          kube-apiserver-kube-cluster-1                             1/1     Running     2          4d15h
kube-system          kube-controller-manager-kube-cluster-1                    1/1     Running     5          4d15h
kube-system          kube-flannel-kxktp                                        1/1     Running     4          4d15h
kube-system          kube-multus-ds-amd64-rfdm8                                1/1     Running     2          4d15h
kube-system          kube-proxy-f6ncr                                          1/1     Running     2          20h
kube-system          kube-scheduler-kube-cluster-1                             1/1     Running     6          4d15h
kube-system          kube-sriov-device-plugin-amd64-9qmpn                      1/1     Running     2          4d15h
kube-system          kubernetes-dashboard-77475cf576-mhwv5                     1/1     Running     2          4d15h
kube-system          kubernetes-metrics-scraper-747b4fd5cd-4jtb5               1/1     Running     2          4d15h
kube-system          nodelocaldns-rqck7                                        1/1     Running     2          4d15h
local-path-storage   local-path-provisioner-7dfbb94d64-xdwbl                   1/1     Running     2          4d15h
openebs              cstor-disk-pool-0sls-68759bd4b4-dq5fq                     3/3     Running     6          4d14h
openebs              openebs-admission-server-db47b787f-kt75s                  1/1     Running     2          4d15h
openebs              openebs-apiserver-556ffff45c-6zk4s                        1/1     Running     7          4d15h
openebs              openebs-localpv-provisioner-c6bc845bb-c9svf               1/1     Running     3          4d15h
openebs              openebs-ndm-lgckq                                         1/1     Running     4          4d15h
openebs              openebs-ndm-operator-5f6c5497d7-zm85z                     1/1     Running     3          4d15h
openebs              openebs-provisioner-598c45dd4-2kgmv                       1/1     Running     2          4d15h
openebs              openebs-snapshot-operator-5f74599c8c-z4ffc                2/2     Running     4          4d15h
```
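Before moving on to Post-Deployment, it can be useful to check the MP system workloads specifically. A quick filter over the default namespace, using the pod name prefixes from the listing above:

```bash
# orch, orch-db and mp remain not Running until the license ConfigMap is created below
kubectl get pods | grep -E '^(mp-|orch)'
```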
### Post-Deployment

#### Create MP License

After the playbook finishes, the orch, orch-db, and mp Pods will still not be Running, because license.txt has to be injected.

NOTE: The mp and orchestrator should run on the same node.

Symptom:

```bash
$ kubectl describe pod mp-9fcb8c7c4-mhl2q
...
MountVolume.SetUp failed for volume "sco-mp-license" : configmap "sco-mp-license" not found
```

Solution: generate and import the license, then restart the orch, orch-db, and mp Pods.

1. Send the string to Astri, who generates the license.

```bash
$ sudo ./licenseutil
SGFyZHdhcmUgU2VyaWFsIG51bWJlcjpOb3QgU3BlY2lmaWVkLEN1cnJlbnRUaW1lOjIwMjEtMDYtMTUgMTQ6NTE6MzU=
```

2. Save the license into license.txt and create the ConfigMap on the master node.

```bash
$ cat license.txt
Company Name: 99cloud
Expiry Date: 2021-08-15 12:59:59.999
Licensed Servers: All
License Key: d5fM8RG/LzwtbKpg94i/MAFs5we36IDq+GZ3rfIUdCx3CNUlEpy9QJ5Ov+NwcNJ89dQJAhzfSAKI0PfnVTbrF68g1MJO+e8sTriROt+3n1v9QdpSiRis0PtSr7UR9lf6ao+xr5LbttlKjXs0fSFq5NBw3WsywiBNtNvG+9DiRgs=

$ kubectl create configmap sco-mp-license --from-file=license.txt
```

#### Expose Service to Outside Network

To expose the orchestrator service, run the following command on the 5gcp-node. Replace `<5gcp-node_ip_address>` with the 5gcp-node IP address.

```bash
# ClusterIP
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["<5gcp-node_ip_address>"]}}'

# NodePort
$ kubectl patch svc orch -p '{"spec": {"externalIPs": ["172.18.22.220"], "externalTrafficPolicy": "Cluster", "type": "NodePort", "ports": [{"name": "websocket", "port": 80, "protocol": "TCP", "targetPort": 80, "nodePort": 32614}]}}'
```

### Login Web UI

- Web UI: http://172.18.22.220:32614/sco/web/login
- Credentials: root/root

## Appendix

### Kubectl Configuration

On the target node:

```bash
sudo mkdir -p $HOME/.kube
sudo cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

### Setup SRIOV for UPF Node

- Setup VFs

```bash
# Find which PCI devices support SR-IOV
lspci -nn | grep Ether

# Max allowed VFs
cat /sys/class/net/<interface>/device/sriov_totalvfs

# Set the VF number
echo 8 > /sys/class/net/<interface>/device/sriov_numvfs

# Check the VFs
ip link show <interface>

# Find the PCI address of the interface
ethtool -i <interface>
```

- Edit Config Map

```bash
$ kubectl -n kube-system edit cm sriovdp-config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "sriovnetwork",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["1521"],
            "drivers": ["ixgbevf"]
          }
        }
      ]
    }
```

- Restart SRIOV Pods

```bash
kubectl -n kube-system get pod
# Restart the SR-IOV device plugin Pod so it re-reads the ConfigMap
kubectl -n kube-system delete pod kube-sriov-device-plugin-amd64-<hash>

# Check inside a Pod that requested the resource
printenv
PCIDEVICE_INTEL_COM_SRIOVNETWORK=0000:01:10.0
```
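The ConfigMap above only makes the device plugin advertise the VFs as `intel.com/sriovnetwork`. To attach a VF to a 5G NF Pod, a Multus NetworkAttachmentDefinition is normally needed as well. A minimal sketch, assuming the SR-IOV CNI plugin is present on the node; the attachment name and subnet are illustrative:

```bash
# Hypothetical NetworkAttachmentDefinition consuming the intel.com/sriovnetwork resource
cat <<'EOF' | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriovnetwork
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.100.0/24"
    }
  }'
EOF
```

A Pod would then reference the attachment via the annotation `k8s.v1.cni.cncf.io/networks: sriov-net1` and request `intel.com/sriovnetwork` in its resource limits.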
### Setup Hugepage for UPF Node

```bash
$ cat /etc/sysctl.conf
...
vm.nr_hugepages = 512

$ sysctl -p
$ sysctl -w vm.nr_hugepages=512

$ cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     512
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         1048576 kB
```

### Harbor with HugePage Enabled

```bash
$ kubectl get pv | grep harbor-database
pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c   1Gi   RWO   Delete   Bound   default/database-data-harbor-harbor-database-0   openebs-hostpath   3d16h

$ kubectl get pv pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c -o yaml | grep path
    openebs.io/cas-type: local-hostpath
    path: /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
  storageClassName: openebs-hostpath

$ cd /var/openebs/local/pvc-6d985fe4-15d7-4c85-8056-aa364c2aee5c
$ vi postgresql.conf
...
huge_pages = off    # on, off, or try

# Restart Harbor's 3 Pods
```

### Build CP Docker Images

- AMF

```bash
$ cat amf/Dockerfile
FROM ubuntu:18.04
LABEL version=1.0.2
RUN apt-get update --fix-missing && \
    apt-get install -y iproute2 apt-utils net-tools iputils-ping tcpdump iptables gdb vim dmidecode dnsutils curl && \
    apt-get clean
COPY lib /usr/local/lib/amf/lib
COPY log /var/log/amf
COPY amf /usr/local/bin/amf
COPY etc/amf /etc/amf
WORKDIR /etc/amf/

$ sudo docker build -t 192.168.205.17/5g/amf:1.0.2 amf/
```

### Build UPF Docker Image

- UPF

```bash
# 1. Clone the UPF (vpp) source code and create the build container.
$ git clone git@gitlab.sh.99cloud.net:5GS/upf-astri.git
$ cd upf-astri
$ git checkout XXX
$ git tag -a v1.0 -m "test"    # if no tags
$ ./buildpack container
a215761ad1f101536885d5554640f7428f2a8487b41222398e7d5010eb04cc07
root@a215761ad1f1:/opt#

$ docker ps
CONTAINER ID   IMAGE                     COMMAND       CREATED          STATUS          PORTS   NAMES
a478be4fec3a   fastgate/ubuntu1804:upf   "/bin/bash"   29 minutes ago   Up 29 minutes           root_upf

# 2. Compile the deb package in the container.
$ mkdir -p build/external/downloads/
$ cp /downloads/*tar* build/external/downloads/
$ make pkg-deb
$ mkdir dep-deb
$ cp /dep-deb/*.deb dep-deb/
$ mv build-root/*.deb .
$ mv build/external/*.deb .
$ tar -czf upf_deb.tgz Makefile *.deb dep-deb/

# 3. Copy the UPF deb package (e.g. upf_deb.tgz) into the vpp repo and unpack it.
$ cd vpp
$ docker cp root_upf:/opt/upf_deb.tgz .
$ tar -zxvf upf_deb.tgz

# 4. Build the VPP image via the Dockerfile in the vpp/ folder.
$ cd vpp
$ sudo docker build -t 192.168.205.17/5g/upf:1.0.0 -f Dockerfile .
$ docker images
REPOSITORY              TAG     IMAGE ID       CREATED          SIZE
192.168.205.17/5g/upf   1.0.0   f29a82c4ed86   11 seconds ago   936MB
```

- VPP Agent

```bash
# 1. Enter the vpp-agent repo and run docker/astri/dev/start.sh to enter the upf-agent-dev container.
git clone git@gitlab.sh.99cloud.net:5GS/vpp-agent.git
cd vpp-agent
git checkout main
./docker/astri/dev/start.sh

# 2. Compile the deb package in the container.
make cmd
make deb

# 3. Exit the container, copy the deb package into docker/astri/prod/ and build the upf-agent image.
sudo cp docker/astri/*.deb docker/astri/prod/
cd docker/astri/prod
sudo docker build -t 192.168.205.17/5g/upf-agent22:1.0.0 -f Dockerfile .
```
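The build steps above only tag the images with the Harbor registry address; for the cluster to pull them they still have to be pushed. A sketch, assuming 192.168.205.17 is the Harbor instance used in the tags, a `5g` project exists there, and the registry is reachable (add it to Docker's insecure-registries if it is served over plain HTTP):

```bash
# Push the freshly built NF images to the Harbor registry
docker login 192.168.205.17
docker push 192.168.205.17/5g/amf:1.0.2
docker push 192.168.205.17/5g/upf:1.0.0
docker push 192.168.205.17/5g/upf-agent22:1.0.0
```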
## Troubleshooting

### Useful CLIs for Troubleshooting

- For an individual pod that is not up and running, try:

```bash
kubectl describe pod <pod_name>
```

- If the pod is up and running but something is failing inside it, read the pod logs:

```bash
kubectl logs -f <pod_name>
```

- To restart a pod, simply run:

```bash
kubectl delete pod <pod_name>
```

- To open a port on the host for troubleshooting a service: for example, if you want to access a service that is only reachable inside the Kubernetes cluster, try:

```bash
kubectl edit svc <svc_name>
```

Then change the service type from ClusterIP to NodePort; kube-proxy will create a host port on every node that forwards to the service port. Use `kubectl get svc` to see which host port was opened for that service.

### Issue 1: Ansible Execution Fails

```bash
Could not detect which package manager to use. Try gathering facts or setting the "ansible_pkg_mgr" variable.
```

Solution: make Ansible use Python 3 consistently.

```bash
$ python --version
Python 3.6.9
```

### Issue 2: Harbor Pods Fail to Start

- Symptom:

```bash
# kubectl get pods --all-namespaces | grep har
default   harbor-harbor-core-69fd799fd4-rlrd5         0/1   CrashLoopBackOff   886   2d4h
default   harbor-harbor-database-0                    0/1   CrashLoopBackOff   54    4h17m
default   harbor-harbor-jobservice-788df69665-4msxt   0/1   CrashLoopBackOff   835   2d4h
default   harbor-harbor-portal-78b68cfdb-sjb2b        1/1   Running            0     3d13h
default   harbor-harbor-redis-0                       1/1   Running            0     3d13h
default   harbor-harbor-registry-5554964584-q5kw4     2/2   Running            0     3d13h
```

- Cause: the Harbor database uses PostgreSQL, which does not support HugePage by default.
- Solution: disable HugePage (or set `huge_pages = off` in the Harbor database's postgresql.conf, as described in the appendix), then restart the Pods.
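Once HugePage has been disabled (or `huge_pages = off` has been applied as in the appendix), the crashed Harbor Pods have to be restarted. For example, with `<hash>` standing in for the actual pod name suffixes shown by `kubectl get pods`:

```bash
# Delete the crashing Harbor Pods so their Deployments/StatefulSets recreate them
kubectl delete pod harbor-harbor-core-<hash> harbor-harbor-database-0 harbor-harbor-jobservice-<hash>

# Watch them come back up
kubectl get pods | grep harbor
```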