This guide helps you diagnose and resolve common issues when deploying or operating Midaz on Kubernetes with Helm. Each section covers a specific symptom, the diagnostic commands to investigate it, and the steps to resolve it.
General diagnostic commands
Start with these commands to get a broad picture of your deployment state before diving into specific issues.
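A minimal sketch of such a first pass, built only from standard `helm` and `kubectl` subcommands. The namespace and release name `midaz` are assumptions, and `midaz_overview` is a hypothetical helper name; substitute the values used in your installation.

```shell
# Assumed defaults: adjust to match your installation.
NAMESPACE="${NAMESPACE:-midaz}"
RELEASE="${RELEASE:-midaz}"

# Print the release status, pod placement, and recent events (oldest first).
midaz_overview() {
  helm status "$RELEASE" -n "$NAMESPACE"
  kubectl get pods -n "$NAMESPACE" -o wide
  kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
}
```

Paste the function into your shell and run `midaz_overview`; the last events listed are the most recent ones.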
Pods stuck in Pending
Symptom: One or more pods remain in `Pending` state and never start.
Diagnostic commands:
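As a sketch, assuming the release runs in a namespace named `midaz` (an assumption), the helpers below list unscheduled pods and show why one of them is not being placed. The function names are hypothetical; the `kubectl` subcommands are standard.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace

# List pods the scheduler has not placed yet.
pending_pods() {
  kubectl get pods -n "$NAMESPACE" --field-selector=status.phase=Pending
}

# Show scheduling events for one pod ($1), then the labels on every node.
why_pending() {
  kubectl describe pod "$1" -n "$NAMESPACE"
  kubectl get nodes --show-labels
}
```

Run `pending_pods` first, then pass one of the listed pod names to `why_pending`.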
- Insufficient CPU or memory on nodes: The scheduler cannot find a node that satisfies the pod’s resource requests. Check the `Events` section of `kubectl describe pod` for messages like `Insufficient cpu` or `Insufficient memory`. Either reduce `resources.requests` in your `values.yaml`, or add more nodes to the cluster.
- PersistentVolumeClaim not bound: A PVC required by a dependency (PostgreSQL, MongoDB, Valkey) is stuck in `Pending`. Verify that a StorageClass is available and set as the default. See PVC stuck in Pending below.
- Node selector or affinity mismatch: The pod requires a specific node label that no node in the cluster has. Check your `values.yaml` for `nodeSelector` or `affinity` settings, and verify that your nodes carry the expected labels with `kubectl get nodes --show-labels`.
ImagePullBackOff
Symptom: Pods show `ImagePullBackOff` or `ErrImagePull` status.
Diagnostic commands:
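A sketch of two checks that usually narrow this down: which image the pod is trying to pull, and what the pull error says. The namespace `midaz` and the function names are assumptions for illustration.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace

# Print the image reference of every container in a pod ($1).
pod_images() {
  kubectl get pod "$1" -n "$NAMESPACE" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'
}

# Show only the events recorded for that pod; pull errors appear here.
pod_events() {
  kubectl get events -n "$NAMESPACE" \
    --field-selector "involvedObject.name=$1"
}
```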
- Wrong image tag: The specified tag does not exist in the registry. Check the `image.tag` value in your `values.yaml` against the version compatibility table.
- Private registry requires authentication: The cluster cannot pull images without credentials. Create an image pull secret and reference it in your `values.yaml`.
- Missing `imagePullSecrets`: The secret exists but is not referenced in the component’s config. Ensure `imagePullSecrets` is set for all affected components.
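Creating the pull secret can be sketched with `kubectl create secret docker-registry`. The secret name `regcred`, the namespace `midaz`, and the helper name are assumptions; the exact `imagePullSecrets` key path for each component depends on the chart's values layout, so check the chart before editing `values.yaml`.

```shell
NAMESPACE="${NAMESPACE:-midaz}"          # assumed namespace
SECRET_NAME="${SECRET_NAME:-regcred}"    # hypothetical secret name

# Create a docker-registry secret.
# $1 = registry server, $2 = username, $3 = password or access token
create_pull_secret() {
  kubectl create secret docker-registry "$SECRET_NAME" \
    --docker-server="$1" \
    --docker-username="$2" \
    --docker-password="$3" \
    -n "$NAMESPACE"
}
```

The secret must live in the same namespace as the pods that use it.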
CrashLoopBackOff
Symptom: Pods start and immediately crash, restarting repeatedly.
Diagnostic commands:
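A sketch of the usual checks for a crashing pod, assuming the namespace `midaz` (an assumption) and hypothetical helper names: read the logs of the crashed instance, read the recorded termination reason, and list the secrets that actually exist.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace

# Logs from the previous (crashed) container instance of a pod ($1).
crash_logs() {
  kubectl logs "$1" -n "$NAMESPACE" --previous --tail=100
}

# Reason recorded for the last termination, e.g. Error or OOMKilled.
last_exit_reason() {
  kubectl get pod "$1" -n "$NAMESPACE" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Secrets present in the namespace, to compare with what the pod references.
list_secrets() {
  kubectl get secrets -n "$NAMESPACE"
}
```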
- Bad or missing environment variables: A required config key is absent or has an incorrect value. Check the logs for messages like `missing env var`, `invalid config`, or similar. Review the `configmap` section of your `values.yaml`.
- Missing Kubernetes Secret: The pod references a secret that does not exist. If the secret is missing, create it manually or re-run the Helm install.
- Wrong database credentials: The service cannot authenticate with PostgreSQL, MongoDB, or Redis. Check logs for `authentication failed`, `connection refused`, or `ECONNREFUSED`. Verify the `secrets` section in your `values.yaml` and confirm the credentials match those used when the databases were provisioned.
- OOMKilled: The container exceeded its memory limit and was killed by the kernel. Look for `OOMKilled` in the `Last State` section of `kubectl describe pod`. Increase `resources.limits.memory` in your `values.yaml`. See Pod eviction / OOMKilled below.
Helm install timeout
Symptom: `helm install` or `helm upgrade` fails with a timeout error before the release reaches `deployed` state.
Diagnostic commands:
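As a sketch, the helper below shows what Helm recorded for the release and which pods it is still waiting on. The namespace and release name `midaz` are assumptions, and `release_state` is a hypothetical name.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace
RELEASE="${RELEASE:-midaz}"       # assumed release name

# Show the release state, its revision history, and the current pods.
release_state() {
  helm status "$RELEASE" -n "$NAMESPACE"
  helm history "$RELEASE" -n "$NAMESPACE"
  kubectl get pods -n "$NAMESPACE"
}
```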
- Slow image pulls: Large images on a slow connection can exceed the default timeout. Increase the timeout with the `--timeout` flag.
- Init containers failing: An init container (e.g., the database bootstrap job) is hanging or retrying. Check the init container logs.
- Readiness probes failing: The pod is running but not passing its readiness check, so Helm keeps waiting until the timeout expires. Describe the pod and look at the `Conditions` and `Events` sections. You may need to increase `initialDelaySeconds` in your readiness probe settings, or investigate why the service is not healthy on startup.
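The timeout and init-container checks above can be sketched as follows. The chart reference is passed as an argument because it depends on how you installed Midaz; the `15m` fallback, the `values.yaml` path, the namespace, the release name, and the helper names are all assumptions.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace
RELEASE="${RELEASE:-midaz}"       # assumed release name

# Re-run the upgrade with a longer timeout (Helm's default is 5m0s).
# $1 = chart reference, $2 = timeout (falls back to an assumed 15m)
upgrade_with_timeout() {
  helm upgrade --install "$RELEASE" "$1" -n "$NAMESPACE" \
    -f values.yaml --wait --timeout "${2:-15m}"
}

# Logs from a named init container. $1 = pod, $2 = init container name
init_logs() {
  kubectl logs "$1" -c "$2" -n "$NAMESPACE"
}
```

The init container names for a pod are listed under `Init Containers` in the output of `kubectl describe pod`.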
Services not reachable
Symptom: Midaz APIs are unreachable from outside the cluster, or services cannot communicate internally.
Diagnostic commands:
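A sketch of a first look at the networking objects, assuming the namespace `midaz` (an assumption) and a hypothetical helper name. A Service whose endpoints list is empty has no ready pods matching its selector.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace

# Services, their backing endpoints, and Ingress resources in the namespace.
network_state() {
  kubectl get svc -n "$NAMESPACE"
  kubectl get endpoints -n "$NAMESPACE"
  kubectl get ingress -n "$NAMESPACE"
}
```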
- Ingress misconfiguration: The Ingress resource exists but the controller is not picking it up. Verify that `ingress.className` matches the class of your installed ingress controller. Also check that the ingress controller pod itself is running.
- DNS not pointing to the load balancer: The hostname in your Ingress does not resolve to the controller’s external IP. Get the external IP and compare it with your DNS record.
- TLS misconfiguration: A missing or expired TLS secret causes the ingress to fail silently. Verify that the secret exists and is not expired. If using cert-manager, check the Certificate resource status.
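The ingress, DNS, and TLS checks above can be sketched as below. The namespaces `midaz` and `ingress-nginx` are assumptions (your controller may live elsewhere), the helper names are hypothetical, and `tls_cert_expiry` needs `openssl` on the machine running it.

```shell
NAMESPACE="${NAMESPACE:-midaz}"                # assumed namespace
INGRESS_NS="${INGRESS_NS:-ingress-nginx}"      # assumed controller namespace

# Installed ingress classes, to compare with ingress.className.
ingress_classes() {
  kubectl get ingressclass
}

# Controller pods and services; the EXTERNAL-IP column is what DNS should target.
controller_state() {
  kubectl get pods,svc -n "$INGRESS_NS"
}

# Decode the certificate stored in a TLS secret ($1).
tls_cert_pem() {
  kubectl get secret "$1" -n "$NAMESPACE" -o jsonpath='{.data.tls\.crt}' | base64 -d
}

# Print the certificate's expiry date.
tls_cert_expiry() {
  tls_cert_pem "$1" | openssl x509 -noout -enddate
}

# cert-manager only: list Certificate resources and their READY status.
cert_status() {
  kubectl get certificate -n "$NAMESPACE"
}
```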
PVC stuck in Pending
Symptom: A PersistentVolumeClaim remains in `Pending` state and the dependent pod cannot start.
Diagnostic commands:
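A sketch, assuming the namespace `midaz` (an assumption) and hypothetical helper names: list the claims alongside the StorageClasses the cluster offers, then read the provisioning events for the claim that is stuck.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace

# Claims in the namespace and the StorageClasses available in the cluster.
# The default class is marked "(default)" next to its name.
pvc_state() {
  kubectl get pvc -n "$NAMESPACE"
  kubectl get storageclass
}

# Provisioning events for a single claim ($1).
pvc_events() {
  kubectl describe pvc "$1" -n "$NAMESPACE"
}
```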
- No default StorageClass: No StorageClass is marked as default in the cluster. If none shows `(default)` in the output of `kubectl get storageclass`, either create a StorageClass or explicitly set one in your `values.yaml` for the affected dependency (e.g., `postgresql.primary.persistence.storageClass`).
- Wrong access mode: The StorageClass does not support the access mode requested by the PVC (e.g., `ReadWriteMany` on a storage driver that only supports `ReadWriteOnce`). Check the `Events` section of `kubectl describe pvc`. Adjust `accessModes` in your `values.yaml` to match what your StorageClass supports.
- Volume binding mode is `WaitForFirstConsumer`: Some StorageClasses use delayed binding. The PVC will stay `Pending` until a pod consuming it is scheduled. This is normal behavior; wait for the pod to be scheduled.
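If a suitable StorageClass already exists but is not the default, it can be marked as such with the standard `storageclass.kubernetes.io/is-default-class` annotation. The sketch below also prints each class's binding mode; the helper names are hypothetical.

```shell
# Mark an existing StorageClass ($1) as the cluster default.
set_default_storageclass() {
  kubectl patch storageclass "$1" -p \
    '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
}

# Show each class's binding mode to spot WaitForFirstConsumer.
binding_modes() {
  kubectl get storageclass \
    -o 'custom-columns=NAME:.metadata.name,MODE:.volumeBindingMode'
}
```

Only one class should carry the default annotation at a time, so clear it on the previous default if there was one.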
Pod eviction / OOMKilled
Symptom: Pods are repeatedly evicted or show `OOMKilled` in their last state.
Diagnostic commands:
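A sketch, assuming the namespace `midaz` (an assumption) and hypothetical helper names. `kubectl top` only works when metrics-server is installed in the cluster.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace

# Current per-container CPU and memory usage (requires metrics-server).
pod_usage() {
  kubectl top pods -n "$NAMESPACE" --containers
}

# Pod name and last termination reason; look for OOMKilled.
termination_reasons() {
  kubectl get pods -n "$NAMESPACE" -o \
    'custom-columns=POD:.metadata.name,REASON:.status.containerStatuses[*].lastState.terminated.reason'
}

# Conditions reported by a node ($1), such as MemoryPressure.
node_conditions() {
  kubectl describe node "$1"
}
```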
- Memory limits set too low: The container’s `resources.limits.memory` is below what the service actually needs under load. Review the current memory usage with `kubectl top pods`, then increase the limit in your `values.yaml`.
- Node under memory pressure: The node itself is under pressure and the kubelet is evicting lower-priority pods. Check the node conditions with `kubectl describe node`. Consider adding nodes or enabling cluster autoscaler. You can also assign a `PriorityClass` to Midaz pods so that lower-priority pods are evicted before them.
RabbitMQ definitions not loaded
Symptom: Midaz services start but transactions fail, queues are missing, or messages are not being processed. Logs may show AMQP connection errors or missing exchanges/queues.
Diagnostic commands:
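A sketch of two checks: whether a bootstrap job ran, and which queues the broker actually has. The namespace `midaz` and the helper names are assumptions; the job name is not fixed here, so take it from the `list_jobs` output. `list_queues` uses the RabbitMQ management HTTP API and must be run from somewhere that can reach the management port.

```shell
NAMESPACE="${NAMESPACE:-midaz}"   # assumed namespace

# Jobs in the namespace and their completion status.
list_jobs() {
  kubectl get jobs -n "$NAMESPACE"
}

# Logs of one job ($1), using a name reported by list_jobs.
job_logs() {
  kubectl logs "job/$1" -n "$NAMESPACE"
}

# Queues known to the broker, as JSON from the management API.
# $1 = management URL (scheme, host, port), $2 = user, $3 = password
list_queues() {
  curl -fsS -u "$2:$3" "$1/api/queues"
}
```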
- External RabbitMQ missing `load_definitions.json`: When using an external RabbitMQ instance, the required queues, exchanges, and bindings are not present. Enable the bootstrap job in your `values.yaml`, or apply the definitions manually. The `load_definitions.json` file is at `charts/midaz/files/rabbitmq/load_definitions.json` in the Helm repository.
- Bootstrap job failed silently: The job ran but encountered an error (wrong credentials, network timeout, wrong port). Verify the `rabbitmqAdminLogin` credentials and that the management port (default `15672`) is reachable from within the cluster.
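One way to apply the definitions manually is the RabbitMQ management API's `/api/definitions` endpoint. This is a sketch under stated assumptions, not necessarily the procedure the chart's bootstrap job follows; `import_definitions` is a hypothetical helper, and the URL and credentials are placeholders you supply.

```shell
# Import a definitions file through the RabbitMQ management API.
# $1 = management URL (scheme, host, port), $2 = user, $3 = password,
# $4 = path to load_definitions.json
import_definitions() {
  curl -fsS -u "$2:$3" -X POST \
    -H "Content-Type: application/json" \
    --data-binary "@$4" "$1/api/definitions"
}
```

The user needs administrator rights on the broker; a `401` response points at the credentials, while a connection error points at the host or port.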
Related resources
- Deploy Midaz using Helm — Initial installation guide
- Upgrading Midaz and plugins via Helm — Upgrade procedures and rollback
- Upgrading Helm — Breaking changes and migration paths between major versions
- Version compatibility — Version mapping reference
- Helm repository — Source code and release notes

