# Separate multiple types with commas
$ ./k8sgpt filters remove Pod,Service
Filter(s) Pod, Service removed
# Pod and Service now appear under Unused
$ ./k8sgpt filters list
Active:
> ValidatingWebhookConfiguration
> ReplicaSet
> PersistentVolumeClaim
> Node
> Ingress
> CronJob
> MutatingWebhookConfiguration
> Deployment
> StatefulSet
Unused:
> Pod
> Service
> HorizontalPodAutoScaler
> PodDisruptionBudget
> NetworkPolicy
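Conceptually, `filters remove` just moves analyzer names between the two lists shown above, and `filters add` moves them back. A minimal Python sketch of that bookkeeping (illustrative only, not k8sgpt's actual implementation):

```python
# Model k8sgpt's filter lists as two sets: "remove" moves an analyzer
# from the active set to the unused set, "add" moves it back.

def remove_filters(active: set, unused: set, names: list) -> None:
    for name in names:
        if name in active:
            active.discard(name)
            unused.add(name)

def add_filters(active: set, unused: set, names: list) -> None:
    for name in names:
        unused.discard(name)
        active.add(name)

active = {"Pod", "Service", "Deployment", "StatefulSet"}
unused = {"NetworkPolicy"}

remove_filters(active, unused, ["Pod", "Service"])
print(sorted(active))  # → ['Deployment', 'StatefulSet']
print(sorted(unused))  # → ['NetworkPolicy', 'Pod', 'Service']
```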
3. Now look at the analysis results; the Pod and Service types have been filtered out:
$ ./k8sgpt analyze
AI Provider: openai
0 kube-system/snapshot-controller(snapshot-controller)
- Error: StatefulSet uses the service kube-system/snapshot-controller which does not exist.
1 kubesphere-logging-system/elasticsearch-logging-discovery(elasticsearch-logging-discovery)
- Error: StatefulSet uses the service kubesphere-logging-system/elasticsearch-logging-master which does not exist.
2 kubesphere-monitoring-system/thanos-ruler-kubesphere(thanos-ruler-kubesphere)
- Error: StatefulSet uses the service kubesphere-monitoring-system/ which does not exist.
$ ./k8sgpt analyze --filter=Service
Service openebs/openebs.io-local does not exist
AI Provider: openai
0 istio-system/jaeger-operator-metrics(jaeger-operator-metrics)
- Error: Service has no endpoints, expected label name=jaeger-operator
2. Filter results by namespace:
$ ./k8sgpt analyze --filter=Service --namespace istio-system
AI Provider: openai
0 istio-system/jaeger-operator-metrics(jaeger-operator-metrics)
- Error: Service has no endpoints, expected label name=jaeger-operator
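The "Service has no endpoints, expected label …" errors above come from a simple rule: a Service only gets endpoints when some Pod carries every key/value pair in the Service's selector. A Python sketch of that matching rule (the label data here is made up for illustration; k8sgpt's real analyzer also considers Pod readiness):

```python
# A Service selects Pods whose labels contain every key/value pair in
# the Service's selector; if no Pod matches, the Service has no endpoints.

def selector_matches(selector: dict, pod_labels: dict) -> bool:
    return all(pod_labels.get(k) == v for k, v in selector.items())

service_selector = {"name": "jaeger-operator"}   # from the Service spec
pod_labels = {"name": "jaeger-op"}               # labels on the running Pod

if not selector_matches(service_selector, pod_labels):
    for k, v in service_selector.items():
        print(f"Service has no endpoints, expected label {k}={v}")
# → Service has no endpoints, expected label name=jaeger-operator
```

Fixing the error therefore means making the Pod labels (usually set in the Deployment's pod template) match the Service's selector exactly.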
replicaCount: 1

deployment:
  # Pull this image ahead of time; it is very large (about 12 GB)
  image: quay.io/go-skynet/local-ai:latest
  env:
    # Number of CPU threads; 8 cores are recommended
    threads: 8
    context_size: 512
  # Directory inside the container where the models are stored
  modelsPath: "/models"

resources:
  {}
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

# Prompt templates to include
# Note: the keys of this map will be the names of the prompt template files
promptTemplates:
  {}
  # ggml-gpt4all-j.tmpl: |
  #   The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
  #   ### Prompt:
  #   {{.Input}}
  #   ### Response:

# Models to download at runtime
models:
  # Whether to force download models even if they already exist
  forceDownload: false
  # The list of URLs to download models from
  # Note: the name of the file will be the name of the loaded model
  list:
    # Use the ggml-gpt4all-j model. Once a model is specified, an initContainer
    # downloads it after deployment (about 3.5 GB), so it is best to download
    # it ahead of time and place it in the PV directory.
    - url: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
      # basicAuth: base64EncodedCredentials

# Enable a PVC to persist the model
persistence:
  pvc:
    enabled: true
    size: 6Gi
    accessModes:
      - ReadWriteOnce
    annotations: {}
    # Optional
    storageClass: ~
  hostPath:
    enabled: false
    path: "/models"

service:
  type: ClusterIP
  port: 80
  annotations: {}
  # If using an AWS load balancer, you'll need to override the default 60s load balancer idle timeout
  # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1200"

ingress:
  enabled: false
  className: ""
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  # - secretName: chart-example-tls
  #   hosts:
  #     - chart-example.local

# Pin the pod to a specific node: because the image is so large, pre-pulling it
# onto one node and scheduling the pod there is much faster.
nodeSelector:
  kubernetes.io/hostname: master-172-31-97-104
# Use localai to analyze only Service resources in the K8s cluster
$ ./k8sgpt analyze --explain -b localai --filter Service
# Output:
AI Provider: localai
0 swimming-demo/consumer(consumer)
- Error: Service has no endpoints, expected label app=consumer
- Error: Service has no endpoints, expected label app.kubernetes.io/name=consumer
- Error: Service has no endpoints, expected label app.kubernetes.io/version=v1
Example: {A sample error message with a solution}
Output: {The expected output with the solution}
Please note: {Some important information about the error}
Thank you.
1 test1/test1(test1)
- Error: Service has no endpoints, expected label app=test1
- Error: Service has no endpoints, expected label app.kubernetes.io/name=test1
- Error: Service has no endpoints, expected label app.kubernetes.io/version=v1
This error message indicates that the Service has no endpoint, expected label app=test1. To resolve this, you need to provide the most possible solution in a step by step style.
Here are the steps you can follow:
1. Make sure you have Kubernetes installed and running.
2. Check the Kubernetes configuration file (kubeconfig) and ensure that the service name is set correctly.
3. Check the labels on your Deployment and ReplicaSet. Ensure that the labels are set correctly and match your Service name.
4. Check that your Deployment and ReplicaSet have a replica set defined.
5. Ensure that your ReplicaSet has a Deployment defined.
6. Ensure that your Deployment has a replicas set defined.
7. Ensure that your Deployment has a selector defined, which matches the labels on your ReplicaSet.
8. Ensure that your Deployment has a replicas set defined.
9. Ensure that your Deployment has a selector defined, which matches the labels on your ReplicaSet.
10. Ensure that your Deployment has a replicas set defined.
11. Ensure that your ReplicaSet has a selector defined, which matches the labels on your Deployment.
12. Ensure that your ReplicaSet has a replicas set defined.
13. Ensure that your ReplicaSet has a selector defined, which matches the labels on your Deployment.
14. Ensure that your ReplicaSet has a replicas set defined.
15. Ensure that your Deployment has a replicas set defined.
16. Ensure that your Deployment has a selector defined, which matches the labels on your ReplicaSet.
17. Ensure that your Deployment has a replicas set defined.
18. Ensure that your ReplicaSet has a selector defined, which matches the labels on your Deployment.
19. Ensure that your ReplicaSet has a replicas set defined.
20. Ensure that your Deployment has a replicas set defined.
21. Ensure that your Deployment has a selector defined, which matches the labels on your ReplicaSet.
22. Ensure that your Deployment has a replicas set defined.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
k8sgpt-operator-controller-manager-948bdbfd9-f7p6k 2/2 Running 0 170m
local-ai-696bf4f754-ptdjg 1/1 Running 0 4h51m
Testing
1. Create a K8sGPT custom resource describing the GPT backend:
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: default
spec:
  ai:
    enabled: true
    model: ggml-gpt4all-j
    backend: localai
    baseUrl: http://local-ai.svc.cluster.local:8080/v1
  noCache: false
  # k8sgpt image tag
  version: v0.3.8
  filters:
    - Pod
2. Once the K8sGPT resource is created, the operator creates a k8sgpt Deployment in the cluster to run the scan; after a short while you can inspect the scan results through Result resources.
$ kubectl get result -A
NAMESPACE NAME KIND BACKEND
default apisixapisixetcd2 Pod localai
$ kubectl get result apisixapisixetcd2 -o yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  creationTimestamp: "2023-08-22T06:01:03Z"
  generation: 1
  name: apisixapisixetcd2
  namespace: default
  resourceVersion: "57982314"
  uid: b9c4e004-0655-4a9c-80af-fcfb00c976a7
spec:
  backend: localai
  details: "Example: Error: Could not start container etcd pod=api-sx-etcdd(6e3a8b48-4f11-49c6-9a15-2f4e3c6c9d1b)
    with exit code: 1\nThe error message suggests that the container for etcd pod
    is failing with exit code 1. The error message also indicates that the pod is
    being restarted due to a backoff failure. \nThe solution suggests that the error
    could be caused by a few possible reasons, such as incorrect configuration of
    etcd or networking issues. \nThe steps to resolve the error are not specified
    in this message."
  error:
  - text: back-off 5m0s restarting failed container=etcd pod=apisix-etcd-2_apisix(d84a9fe9-f02a-41d2-a5a8-fd8dd859603b)
  kind: Pod
  name: apisix/apisix-etcd-2
  parentObject: ""
status: {}
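Since Result objects are ordinary custom resources, their failures can also be extracted programmatically. A sketch, assuming the spec layout shown above and a manifest fetched with `kubectl get result apisixapisixetcd2 -o json` (trimmed here to the fields actually used):

```python
import json

# A trimmed Result manifest, as returned by `kubectl get result ... -o json`.
result_json = """
{
  "apiVersion": "core.k8sgpt.ai/v1alpha1",
  "kind": "Result",
  "spec": {
    "backend": "localai",
    "kind": "Pod",
    "name": "apisix/apisix-etcd-2",
    "error": [
      {"text": "back-off 5m0s restarting failed container=etcd pod=apisix-etcd-2_apisix(d84a9fe9-f02a-41d2-a5a8-fd8dd859603b)"}
    ]
  }
}
"""

def summarize(result: dict) -> str:
    """Render one Result as '[backend] <kind> <name>: <error texts>'."""
    spec = result["spec"]
    errors = "; ".join(e["text"] for e in spec.get("error", []))
    return f'[{spec["backend"]}] {spec["kind"]} {spec["name"]}: {errors}'

print(summarize(json.loads(result_json)))
```

The same loop works over every item of `kubectl get result -A -o json`, which is handy for feeding scan results into alerting.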