Building a Custom Docker Image

Build a custom Docker image for your script tasks.

You can bake all dependencies needed for your script tasks directly into the Kestra’s base image. Here is an example installing Python dependencies:

FROM kestra/kestra:latest
USER root
RUN apt-get update -y && apt-get install pip -y
RUN pip install --no-cache-dir pandas requests boto3

Then, point to that Dockerfile in your docker-compose.yml file:

services:
kestra:
build:
context: .
dockerfile: Dockerfile
image: kestra-python:latest

Once you start Kestra containers using docker compose up -d, you can create a flow that directly runs Python tasks with your custom dependencies using the PROCESS runner:

id: python_process
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
runner: PROCESS
script: |
import pandas as pd
import requests
import boto3
print(f"Pandas version: {pd.__version__}")
print(f"Requests version: {requests.__version__}")
print(f"Boto3 version: {boto3.__version__}")

Building a custom Docker image for your script tasks

Imagine you use the following flow:

id: zip_to_python
namespace: company.team
variables:
file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
- id: get_zipfile
type: io.kestra.plugin.core.http.Download
uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip
type: io.kestra.plugin.compress.ArchiveDecompress
algorithm: ZIP
from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
env:
FILE_ID: "{{ render(vars.file_id) }}"
inputFiles: "{{ outputs.unzip.files }}"
script: |
import os
import pandas as pd
file_id = os.environ["FILE_ID"]
file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file)
df.to_parquet(f"{file_id}.parquet")
outputFiles:
- "*.parquet"

The Python task requires pandas to be installed. Pandas is a large library, and it’s not included in the default python image. In this case, you have the following options:

  1. Install pandas in the beforeCommands property of the Python task.
  2. Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
  3. Build your own custom Docker image that includes pandas.

1) Installing pandas in the beforeCommands property

id: install_pandas_at_runtime
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.core.runner.Process
beforeCommands:
- pip install pyarrow pandas
script: |
import pandas as pd
print(f"Pandas version: {pd.__version__}")

2) Using one of our pre-built images

id: use_prebuilt_image
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
script: |
import pandas as pd
print(f"Pandas version: {pd.__version__}")

3) Building a custom Docker image

If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:

FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion

Then, build the image:

docker build -t kestra-custom:latest .

Finally, use that image in your flow:

id: zip_to_python
namespace: company.team
variables:
file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
- id: get_zipfile
type: io.kestra.plugin.core.http.Download
uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip
type: io.kestra.plugin.compress.ArchiveDecompress
algorithm: ZIP
from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from DockerHub
containerImage: kestra-custom:latest # ⚡️ Use your custom image here
env:
FILE_ID: "{{ render(vars.file_id) }}"
inputFiles: "{{ outputs.unzip.files }}"
script: |
import os
import pandas as pd
file_id = os.environ["FILE_ID"]
file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file)
df.to_parquet(f"{file_id}.parquet")
outputFiles:
- "*.parquet"

Note how the pullPolicy: NEVER property is used to make sure that Kestra uses the local image instead of trying to pull it from DockerHub.