Errors
Kestra provides multiple ways to handle errors, helping you both identify issues and decide whether your flows should stop or continue running after an error.
`errors` Component
`errors` is a list of tasks, set at the flow level, that run when an error occurs. You can add multiple tasks, and they run sequentially. This is useful for sending alerts when a flow fails.
The example below sends a flow-level failure alert to Slack with the `SlackIncomingWebhook` task, defined under the `errors` property.
id: errors
namespace: company.team

description: This will always fail

tasks:
  - id: failed_task
    type: io.kestra.plugin.core.execution.Fail

errors:
  - id: alert_on_failure
    type: io.kestra.plugin.notifications.slack.SlackIncomingWebhook
    url: "{{ secret('SLACK_WEBHOOK') }}"
    messageText: "Failure alert for flow {{ flow.namespace }}.{{ flow.id }} with ID {{ execution.id }}"
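Because `errors` accepts several tasks that run in order, a handler can, for instance, write a log entry before sending the alert. The flow below is a minimal sketch of that idea, not a reference configuration: the flow ID `errors_sequential` and the task ID `log_failure` are hypothetical, and it assumes the `Log` task's `level` property accepts `ERROR`.

```yaml
id: errors_sequential          # hypothetical flow ID
namespace: company.team

tasks:
  - id: failed_task
    type: io.kestra.plugin.core.execution.Fail

errors:
  # First handler task: record the failure in the execution logs.
  - id: log_failure            # hypothetical task ID
    type: io.kestra.plugin.core.log.Log
    message: "Flow {{ flow.namespace }}.{{ flow.id }} failed in execution {{ execution.id }}"
    level: ERROR               # assumed to be an accepted log level

  # Second handler task: runs after log_failure and sends the Slack alert.
  - id: alert_on_failure
    type: io.kestra.plugin.notifications.slack.SlackIncomingWebhook
    url: "{{ secret('SLACK_WEBHOOK') }}"
    messageText: "Failure alert for flow {{ flow.namespace }}.{{ flow.id }} with ID {{ execution.id }}"
```

Placing the log task first keeps a record of the failure in the execution itself before the external alert is attempted.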
Two kinds of error handlers can be defined:
- Global: defined at the root of the flow; handles errors for the flow as a whole
- Local: defined on a Flowable Task; handles errors for that flowable task and its children
Global Error Handler
This example shows a global error handler. The first task fails immediately, triggering the handler, which then logs the ID of the failed task using the `errorLogs()` function.
id: errors
namespace: company.team

tasks:
  - id: failed
    type: io.kestra.plugin.core.execution.Fail

errors:
  - id: 2nd
    type: io.kestra.plugin.core.log.Log
    # Because errorLogs() is an array, the first taskId to fail is retrieved.
    message: "I'm failing {{ errorLogs()[0]['taskId'] }}"
    level: INFO
Local Error Handler
This example demonstrates a local error handler that applies only to the children of `t2`. Errors from other tasks, like `t1`, are not handled here.
This is useful for restricting error handling to a specific part of the flow and running targeted tasks there, such as resource cleanup.
id: errors
namespace: company.team

tasks:
  - id: parent-seq
    type: io.kestra.plugin.core.flow.Sequential
    tasks:
      - id: t1
        type: io.kestra.plugin.core.debug.Return
        format: "{{task.id}} > {{taskrun.startDate}}"
      - id: t2
        type: io.kestra.plugin.core.flow.Sequential
        tasks:
          - id: t2-t1
            type: io.kestra.plugin.core.execution.Fail
        errors:
          - id: error-t1
            type: io.kestra.plugin.core.debug.Return
            format: "Error Trigger ! {{task.id}}"
`allowFailure` and `allowWarning` Properties
When a task fails during an execution, downstream tasks are not executed. This is not always desirable, especially for non-critical tasks. Adding the `allowFailure` property to a task lets downstream tasks continue despite its failure. In this case, the execution finishes in a `WARNING` state.
id: allow_failure
namespace: company.team

description: This flow will allow a failure of a task (imagine a flaky unit test) and will continue processing downstream tasks, but the execution will finish in a `WARNING` state.

tasks:
  - id: first
    type: io.kestra.plugin.core.debug.Return
    format: "{{ task.id }} > {{ taskrun.startDate }}"

  - id: allow_failure
    type: io.kestra.plugin.core.execution.Fail
    allowFailure: true

  - id: last
    type: io.kestra.plugin.core.debug.Return
    format: "{{ task.id }} > {{ taskrun.startDate }}"
There’s also the `allowWarning` property, which behaves similarly to `allowFailure`, except that the execution finishes in a `SUCCESS` state even if warnings occurred.
id: allow_warning
namespace: company.team

description: This flow will allow a warning of a task (imagine a notification task) and will continue processing downstream tasks, with the execution finishing in a `SUCCESS` state even if warnings occurred.

tasks:
  - id: first
    type: io.kestra.plugin.core.debug.Return
    format: "{{ task.id }} > {{ taskrun.startDate }}"

  - id: allow_warning
    type: io.kestra.plugin.scripts.python.Script
    allowWarning: true
    beforeCommands:
      - pip install kestra
    script: |
      from kestra import Kestra

      logger = Kestra.logger()
      logger.warning("WARNING signals something unexpected.")
Best practices for error handling
- Use global handlers for alerts and monitoring across the whole flow.
- Use local handlers for targeted cleanup or retries.
- Add `allowFailure` to non-critical tasks that shouldn’t block execution.
- Use `allowWarning` when a task's warnings should not prevent the execution from finishing in a `SUCCESS` state.
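The practices above can be combined in a single flow. The sketch below is illustrative only: the flow ID and all task IDs are hypothetical, the tasks are placeholders for real work, and it assumes the `Log` task's `level` property accepts `ERROR`. How the global and local handlers interact at runtime is not covered here, so verify the behavior on your Kestra version.

```yaml
id: error_handling_sketch      # hypothetical flow ID
namespace: company.team

tasks:
  # Non-critical step: a failure here does not block the tasks below.
  - id: optional_check
    type: io.kestra.plugin.core.execution.Fail
    allowFailure: true

  # Local handler: scoped to this Sequential group and its children.
  - id: load_group
    type: io.kestra.plugin.core.flow.Sequential
    tasks:
      - id: load
        type: io.kestra.plugin.core.debug.Return
        format: "{{ task.id }} > {{ taskrun.startDate }}"
    errors:
      - id: cleanup
        type: io.kestra.plugin.core.log.Log
        message: "Cleaning up after a failure in load_group"
        level: INFO

# Global handler: flow-level alerting and monitoring.
errors:
  - id: alert
    type: io.kestra.plugin.core.log.Log
    message: "Flow {{ flow.namespace }}.{{ flow.id }} failed, execution {{ execution.id }}"
    level: ERROR               # assumed to be an accepted log level
```

In a real flow, the `alert` task would typically be a notification task such as `SlackIncomingWebhook`, and `cleanup` would release whatever resources the group acquired.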