Flowker emits traces, metrics, and structured logs using the OpenTelemetry standard. This guide explains what’s available, how to enable it, and how to interpret the data in your observability stack.

Overview


Flowker’s telemetry is built on three signals:
| Signal | Backend | What it covers |
| --- | --- | --- |
| Traces | Tempo | Distributed spans across workflow executions and steps |
| Metrics | Prometheus | HTTP request rates, latency, and system resource usage |
| Logs | Loki | Structured JSON logs for every operation |

All signals are exported via OTLP (OpenTelemetry Protocol) to a collector of your choice.

Configuration


Telemetry is controlled by environment variables.
# Enable telemetry (required to activate OTLP export)
ENABLE_TELEMETRY=true

# OTLP collector endpoint (required when ENABLE_TELEMETRY=true)
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

# Service identity
OTEL_RESOURCE_SERVICE_NAME=flowker
OTEL_RESOURCE_SERVICE_VERSION=1.0.0
OTEL_RESOURCE_DEPLOYMENT_ENVIRONMENT=production
OTEL_LIBRARY_NAME=flowker

# Log verbosity: debug | info | warn | error
LOG_LEVEL=info
If ENABLE_TELEMETRY=true is set without OTEL_EXPORTER_OTLP_ENDPOINT, Flowker will fail to start.
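
The startup rule above can be checked before a deployment reaches the cluster. The following is a minimal Python sketch, not Flowker's own validation logic: validate_telemetry_env is a hypothetical helper, and the defaults it assumes for unset variables (false for ENABLE_TELEMETRY, info for LOG_LEVEL) and its case-insensitive parsing are assumptions rather than documented behavior.

```python
VALID_LOG_LEVELS = {"debug", "info", "warn", "error"}


def validate_telemetry_env(env):
    """Return a list of configuration problems; an empty list means none were found.

    `env` is any mapping of variable names to strings, for example os.environ.
    Defaults for unset variables are assumptions made for this sketch.
    """
    problems = []

    # Assumption: only the literal value "true" (any case) enables telemetry.
    enabled = env.get("ENABLE_TELEMETRY", "false").strip().lower() == "true"
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if enabled and not endpoint:
        problems.append("ENABLE_TELEMETRY=true requires OTEL_EXPORTER_OTLP_ENDPOINT")

    # LOG_LEVEL accepts the four values listed in the configuration block.
    level = env.get("LOG_LEVEL", "info").strip().lower()
    if level not in VALID_LOG_LEVELS:
        problems.append("LOG_LEVEL must be one of: debug, info, warn, error")

    return problems
```

Calling validate_telemetry_env(os.environ) in a deployment pre-check would surface the missing endpoint before the service attempts to start.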

Distributed tracing


Every HTTP request and internal operation creates an OpenTelemetry span. Spans are propagated through the full execution chain, so a single workflow run produces a connected trace from the HTTP handler down to individual executor steps.

Span naming convention

Spans follow a <layer>.<resource>.<operation> pattern.

Execution spans

| Span name | Description |
| --- | --- |
| command.execution.execute | Root span for a workflow execution |
| command.execution.execute_executor_node | Span for each executor node processed |
| command.execution.execute_with_provider_config | Span for a node resolved with a specific provider config |
| command.execution.recover | Span for incomplete execution recovery at startup |

Workflow command spans

| Span name | Description |
| --- | --- |
| command.workflow.create | Create a new workflow |
| command.workflow.update | Update an existing workflow |
| command.workflow.activate | Activate a workflow |
| command.workflow.deactivate | Deactivate a workflow |
| command.workflow.clone | Clone a workflow |
| command.workflow.delete | Delete a workflow |

Executor configuration spans

| Span name | Description |
| --- | --- |
| command.executor_config.create | Create executor configuration |
| command.executor_config.update | Update executor configuration |
| command.executor_config.activate | Activate executor configuration |
| command.executor_config.enable | Enable executor configuration |
| command.executor_config.disable | Disable executor configuration |
| command.executor_config.mark_configured | Mark executor as configured |
| command.executor_config.mark_tested | Mark executor as tested |
| command.executor_config.test_connectivity | Test executor connectivity |
| command.executor_config.delete | Delete executor configuration |

Provider configuration spans

| Span name | Description |
| --- | --- |
| command.provider_config.create | Create provider configuration |
| command.provider_config.update | Update provider configuration |
| command.provider_config.enable | Enable provider configuration |
| command.provider_config.disable | Disable provider configuration |
| command.provider_config.test_connectivity | Test provider connectivity |
| command.provider_config.delete | Delete provider configuration |

Query spans

| Span name | Description |
| --- | --- |
| query.execution.get | Get execution by ID |
| query.execution.list | List executions |
| query.execution.get_results | Get execution results |
| query.workflow.get | Get workflow by ID |
| query.workflow.get_by_name | Get workflow by name |
| query.workflow.list | List workflows |
| query.executor_config.get | Get executor config by ID |
| query.executor_config.get_by_name | Get executor config by name |
| query.executor_config.list | List executor configs |
| query.executor_config.exists | Check executor config existence |
| query.executor_config.exists_by_name | Check executor config existence by name |
| query.provider_config.get | Get provider config by ID |
| query.provider_config.list | List provider configs |

In Grafana Tempo, search by service name (flowker) and filter by span name to isolate specific operations. Use command.execution.execute as the entry point to see a full workflow trace.
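
When post-processing exported traces, it can help to split span names back into the three parts of the naming convention, for example to group spans by resource. The following is a minimal Python sketch under that assumption; SpanName and parse_span_name are hypothetical names introduced for illustration and are not part of Flowker.

```python
from typing import NamedTuple


class SpanName(NamedTuple):
    layer: str      # "command" or "query" in the span lists above
    resource: str   # e.g. "execution", "workflow", "executor_config"
    operation: str  # e.g. "execute", "get_by_name"


def parse_span_name(name: str) -> SpanName:
    """Split a span name of the form <layer>.<resource>.<operation>."""
    parts = name.split(".")
    # Every span listed above has exactly three non-empty, dot-separated parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <layer>.<resource>.<operation> span name: {name!r}")
    return SpanName(*parts)
```

For example, parse_span_name("command.executor_config.test_connectivity").resource yields "executor_config", which can then serve as a grouping key.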

Metrics


Flowker exposes HTTP and system metrics automatically via the OpenTelemetry SDK. No additional configuration is needed beyond enabling telemetry.

HTTP metrics (via otelfiber)

Collected per route by the otelfiber middleware:
| Metric | Type | Description |
| --- | --- | --- |
| http.server.duration | Histogram | Request duration in milliseconds |
| http.server.request.size | Histogram | Request payload size in bytes |
| http.server.response.size | Histogram | Response payload size in bytes |
| http.server.active_requests | UpDownCounter | Number of in-flight requests |

Each metric carries labels: http.method, http.route, http.status_code.

System metrics

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| system.cpu.usage | Gauge | percentage | CPU usage of the process host |
| system.mem.usage | Gauge | percentage | Memory usage of the process host |

Histogram buckets

Latency histograms use the following bucket boundaries (in seconds):
0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10

Flowker does not expose a Prometheus scrape endpoint (/metrics) directly. Metrics are exported via OTLP to your collector, which then forwards them to Prometheus. Configure your OpenTelemetry Collector to include a prometheusremotewrite exporter.
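
The bucket boundaries listed above determine how precisely percentiles can be estimated from histogram data. The following is a minimal Python sketch of bucket-based quantile estimation using linear interpolation within a bucket, which is the general approach Prometheus's histogram_quantile takes. It is an illustration only: bucket_counts and estimate_quantile are hypothetical helpers, and the sketch treats the boundaries as plain numbers without assuming anything further about Flowker's exporter.

```python
from bisect import bisect_left

# Bucket upper bounds as listed in this guide.
BOUNDS = [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]


def bucket_counts(observations):
    """Return cumulative counts per upper bound, with a final +Inf bucket."""
    counts = [0] * (len(BOUNDS) + 1)
    for value in observations:
        # bisect_left puts a value equal to a bound into that bound's bucket.
        counts[bisect_left(BOUNDS, value)] += 1
    cumulative, running = [], 0
    for count in counts:
        running += count
        cumulative.append(running)
    return cumulative


def estimate_quantile(q, cumulative):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative bucket counts."""
    total = cumulative[-1]
    if total == 0:
        return None
    rank = q * total
    for i, count in enumerate(cumulative):
        if count >= rank:
            if i == len(BOUNDS):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return BOUNDS[-1]
            lower = BOUNDS[i - 1] if i > 0 else 0.0
            previous = cumulative[i - 1] if i > 0 else 0
            in_bucket = count - previous
            if in_bucket == 0:
                return lower
            # Assume observations are spread evenly inside the bucket.
            return lower + (BOUNDS[i] - lower) * (rank - previous) / in_bucket
    return BOUNDS[-1]
```

Because the estimate is interpolated, its resolution is limited by the bucket width: any percentile that falls between 0.25 and 0.5 is reported as a point on that interval regardless of where the underlying observations actually sit.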

Structured logging


Flowker uses structured JSON logging via Zap. Every log entry is enriched with contextual fields that can be indexed and queried in Loki.

Log fields reference

| Field | Description | Example |
| --- | --- | --- |
| operation | Span/operation name | command.execution.execute |
| workflow.id | Workflow identifier | wf_abc123 |
| execution.id | Execution identifier | exec_xyz789 |
| node.id | Node identifier within a workflow | node-payment |
| executor.id | Executor identifier | exec_cfg_001 |
| error.message | Error description when applicable | database ping failed: ... |

Log levels

| Level | When used |
| --- | --- |
| debug | Detailed internal state, for development only |
| info | Normal operation milestones (execution started, recovered, etc.) |
| warn | Recoverable issues or unexpected but non-fatal conditions |
| error | Operation failures that require attention |

Set the LOG_LEVEL environment variable to control verbosity.
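
As an illustration of how a level threshold typically behaves, the sketch below assumes that LOG_LEVEL acts as a minimum severity, so entries at or above the configured level are emitted and lower ones are dropped. That threshold behavior is an assumption based on common logger conventions, and is_emitted is a hypothetical helper, not a Flowker function.

```python
# Levels ordered from least to most severe, as listed in the table above.
LEVELS = ["debug", "info", "warn", "error"]


def is_emitted(entry_level, configured_level):
    """Return True if an entry at entry_level passes the configured threshold."""
    return LEVELS.index(entry_level) >= LEVELS.index(configured_level)
```

Under this assumption, LOG_LEVEL=info keeps info, warn, and error entries and drops debug entries.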

Example log entries

Workflow execution started:
{
  "level": "info",
  "operation": "command.execution.execute",
  "workflow.id": "wf_abc123",
  "message": "Starting workflow execution"
}
Incomplete execution recovery:
{
  "level": "info",
  "operation": "command.execution.recover",
  "count": 3,
  "message": "Recovering incomplete executions"
}
Execution failed:
{
  "level": "error",
  "execution.id": "exec_xyz789",
  "workflow.id": "wf_abc123",
  "execution.status": "failed",
  "error.message": "executor node missing providerConfigId",
  "message": "Workflow execution failed"
}
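
Entries like the ones above can be filtered programmatically once exported. The following is a minimal Python sketch that extracts failed executions from a stream of log lines. It assumes each entry arrives as a single-line JSON object (the examples above are pretty-printed for readability), and failed_executions is a hypothetical helper introduced for illustration.

```python
import json


def failed_executions(lines):
    """Return (execution.id, error.message) pairs for failed executions.

    Assumes one JSON object per line; lines that are not valid JSON
    objects are skipped.
    """
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        # Field names match the log fields reference and the example entries.
        if entry.get("level") == "error" and entry.get("execution.status") == "failed":
            failures.append((entry.get("execution.id"), entry.get("error.message")))
    return failures
```

Applied to the "Execution failed" example, this returns a single pair containing exec_xyz789 and the message "executor node missing providerConfigId".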

Health probes


Flowker exposes Kubernetes-compatible liveness and readiness probes for operational monitoring. Liveness signals whether the process is running; readiness signals whether dependencies (notably the database) are reachable. Configure both at the cluster level as part of your deployment manifests so that orchestration can restart unhealthy pods and remove degraded instances from load balancers.

Grafana dashboards


Flowker’s telemetry integrates directly with the Lerian observability stack. Pre-configured dashboards are available through the Lerian-managed Grafana instance.

Request throughput
  • Query: sum(rate(http_server_duration_count{service_name="flowker"}[5m])) by (http_route)
  • Shows requests per second, broken down by route
P95 latency
  • Query: histogram_quantile(0.95, sum(rate(http_server_duration_bucket{service_name="flowker"}[5m])) by (le, http_route))
  • Shows the 95th percentile response time per route
Error rate
  • Query: sum(rate(http_server_duration_count{service_name="flowker", http_status_code=~"5.."}[5m])) / sum(rate(http_server_duration_count{service_name="flowker"}[5m]))
  • Shows the ratio of 5xx responses
Active executions (via logs)
  • Loki query: count_over_time({service_name="flowker"} |= "Starting workflow execution" [1m])
  • Shows the number of executions started in each one-minute window
For full observability stack setup, see Platform → Observability.