IonToParquet

```yaml
type: "io.kestra.plugin.serdes.parquet.IonToParquet"
```
```yaml
id: ion_to_parquet
namespace: company.team

tasks:
  - id: download_csv
    type: io.kestra.plugin.core.http.Download
    description: salaries of data professionals from 2020 to 2023 (source ai-jobs.net)
    uri: https://huggingface.co/datasets/kestra/datasets/raw/main/csv/salaries.csv

  - id: avg_salary_by_job_title
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      data.csv: "{{ outputs.download_csv.uri }}"
    sql: |
      SELECT
        job_title,
        ROUND(AVG(salary),2) AS avg_salary
      FROM read_csv_auto('{{ workingDir }}/data.csv', header=True)
      GROUP BY job_title
      HAVING COUNT(job_title) > 10
      ORDER BY avg_salary DESC;
    store: true

  - id: result
    type: io.kestra.plugin.serdes.parquet.IonToParquet
    from: "{{ outputs.avg_salary_by_job_title.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Salary",
        "namespace": "com.example.salary",
        "fields": [
          {"name": "job_title", "type": "string"},
          {"name": "avg_salary", "type": "double"}
        ]
      }
```
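If the Ion input can contain null values, the Avro schema has to declare the affected fields as nullable, because a plain `"double"` field does not accept null. The sketch below is a variant of the `result` task, assuming (hypothetically) that `avg_salary` may be null for some rows. Avro expresses a nullable field as a union with `"null"`; when `"null"` is the first branch of the union, the field's default can be `null`.

```yaml
- id: result
  type: io.kestra.plugin.serdes.parquet.IonToParquet
  from: "{{ outputs.avg_salary_by_job_title.uri }}"
  # Assumption: avg_salary may be null, so it is declared as the Avro union ["null", "double"].
  # Comments stay outside the block scalar because the schema itself is JSON.
  schema: |
    {
      "type": "record",
      "name": "Salary",
      "namespace": "com.example.salary",
      "fields": [
        {"name": "job_title", "type": "string"},
        {"name": "avg_salary", "type": ["null", "double"], "default": null}
      ]
    }
```

Only the nullable field changes; `job_title` stays a non-null `"string"`, so a null job title would still be rejected.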
Properties
Default GZIP; possible values: UNCOMPRESSED, SNAPPY, GZIP, ZSTD
Default yyyy-MM-dd[XXX]
Default yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]
Default "."
Default 1048576
SubType string; default ["f","false","disabled","0","off","no",""]
Default false
SubType string; default ["","#N/A","#N/A N/A","#NA","-1.#IND","-1.#QNAN","-NaN","1.#IND","1.#QNAN","NA","n/a","nan","null"]
Default 100
Default 1048576
Default V2; possible values: V1, V2
Default 134217728
Default false
Default HH:mm[:ss][.SSSSSS][XXX]
Default Etc/UTC
SubType string; default ["t","true","enabled","1","on","yes"]
Format uri
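The defaults listed above can be overridden on the task itself. The sketch below shows the `result` task from the example with the Parquet writer settings made explicit. This page lists only the defaults, not the property names, so `compressionCodec`, `parquetVersion`, `rowGroupSize`, and `pageSize` are assumed names (matched to the GZIP, V2, 134217728, and 1048576 defaults); check them against the plugin reference before relying on them. The size values are also assumed to be in bytes, which makes 134217728 equal to 128 MiB and 1048576 equal to 1 MiB.

```yaml
- id: result
  type: io.kestra.plugin.serdes.parquet.IonToParquet
  from: "{{ outputs.avg_salary_by_job_title.uri }}"
  # Assumed property names; values are taken from the defaults and possible values listed above.
  compressionCodec: ZSTD      # UNCOMPRESSED, SNAPPY, GZIP or ZSTD (default GZIP)
  parquetVersion: V2          # V1 or V2 (default V2)
  rowGroupSize: 67108864      # 64 MiB, assuming bytes (default 134217728 = 128 MiB)
  pageSize: 1048576           # 1 MiB, assuming bytes (same as the default)
  schema: |
    {
      "type": "record",
      "name": "Salary",
      "namespace": "com.example.salary",
      "fields": [
        {"name": "job_title", "type": "string"},
        {"name": "avg_salary", "type": "double"}
      ]
    }
```

Properties left out of the task keep the defaults shown above.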