Ubuntu TechHive
Rust Data Pipelines: From Files to Clean Databases and Web Dashboards

Introduction

We are building a small environmental data pipeline. Raw water-quality monitoring files arrive as CSV. Our Rust tool validates them, cleans bad records, fills safe gaps, stores trusted measurements, and powers a dashboard.

Data Pipeline

About the dataset used

The dataset1 contains raw water-quality monitoring data from Cork Harbour, Moy Killala, and 15 other coastal locations in Ireland. The raw extracted dataset has over 1.27 million entries, and the repository also includes a transformed/pivoted version with 29,159 rows across 11 water-quality parameters. The files are CSV, so they are easy to use for the “files → clean database → dashboard” flow.

Tools and libraries

We implement the data pipeline in Rust2, using Polars3 for the DataFrame work.

DataFrame

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql"] }
  //! ```

  use chrono::NaiveDate;
  use polars::{df, error::PolarsError, frame::DataFrame};

  fn main() -> Result<(), PolarsError> {
      // Build a small in-memory DataFrame. `df!` returns a Result,
      // so `?` propagates a construction error instead of panicking.
      let df: DataFrame = df!(
          "name" => ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
          "birthdate" => [
              NaiveDate::from_ymd_opt(1997, 1, 10).unwrap(),
              NaiveDate::from_ymd_opt(1985, 2, 15).unwrap(),
              NaiveDate::from_ymd_opt(1997, 3, 22).unwrap(),
              NaiveDate::from_ymd_opt(1997, 4, 30).unwrap(),
          ],
          "weight" => [57.9, 72.5, 54.6, 83.1], // (kg)
          "height" => [1.56, 1.77, 1.65, 1.75], // (m)
      )?;
      println!("Data:");
      println!("{df}");

      // `head(Some(2))` returns the first two rows as a new DataFrame.
      let head = df.head(Some(2));
      println!("Head:");
      println!("{head}");

      Ok(())
  }
Data:
shape: (4, 4)
┌────────────────┬────────────┬────────┬────────┐
│ name           ┆ birthdate  ┆ weight ┆ height │
│ ---            ┆ ---        ┆ ---    ┆ ---    │
│ str            ┆ date       ┆ f64    ┆ f64    │
╞════════════════╪════════════╪════════╪════════╡
│ Alice Archer   ┆ 1997-01-10 ┆ 57.9   ┆ 1.56   │
│ Ben Brown      ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
│ Chloe Cooper   ┆ 1997-03-22 ┆ 54.6   ┆ 1.65   │
│ Daniel Donovan ┆ 1997-04-30 ┆ 83.1   ┆ 1.75   │
└────────────────┴────────────┴────────┴────────┘
Head:
shape: (2, 4)
┌──────────────┬────────────┬────────┬────────┐
│ name         ┆ birthdate  ┆ weight ┆ height │
│ ---          ┆ ---        ┆ ---    ┆ ---    │
│ str          ┆ date       ┆ f64    ┆ f64    │
╞══════════════╪════════════╪════════╪════════╡
│ Alice Archer ┆ 1997-01-10 ┆ 57.9   ┆ 1.56   │
│ Ben Brown    ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
└──────────────┴────────────┴────────┴────────┘

Selecting columns

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql"] }
  //! ```

  use chrono::NaiveDate;
  use polars::{
      df,
      error::PolarsError,
      frame::DataFrame,
      prelude::{IntoLazy, col},
  };


  fn main() -> Result<(), PolarsError> {
      let df: DataFrame = df!(
            "name" => ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
            "birthdate" => [
                NaiveDate::from_ymd_opt(1997, 1, 10).unwrap(),
                NaiveDate::from_ymd_opt(1985, 2, 15).unwrap(),
                NaiveDate::from_ymd_opt(1997, 3, 22).unwrap(),
                NaiveDate::from_ymd_opt(1997, 4, 30).unwrap(),
            ],
            "weight" => [57.9, 72.5, 54.6, 83.1], // (kg)
            "height" => [1.56, 1.77, 1.65, 1.75], // (m)
        )
        .unwrap();

        let result = df
            .clone()
            .lazy()
            .select([
                col("name"),
                col("birthdate").dt().year().alias("birth_year"),
                (col("weight") / col("height").pow(2)).alias("bmi"),
            ])
            .collect()?;
        println!("Column selection:");
        print!("{result}\n");

      Ok(())
  }
Column selection:
shape: (4, 3)
┌────────────────┬────────────┬───────────┐
│ name           ┆ birth_year ┆ bmi       │
│ ---            ┆ ---        ┆ ---       │
│ str            ┆ i32        ┆ f64       │
╞════════════════╪════════════╪═══════════╡
│ Alice Archer   ┆ 1997       ┆ 23.791913 │
│ Ben Brown      ┆ 1985       ┆ 23.141498 │
│ Chloe Cooper   ┆ 1997       ┆ 20.055096 │
│ Daniel Donovan ┆ 1997       ┆ 27.134694 │
└────────────────┴────────────┴───────────┘

Adding columns

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql"] }
  //! ```

  use chrono::NaiveDate;
  use polars::{
      df,
      error::PolarsError,
      frame::DataFrame,
      prelude::{IntoLazy, col},
  };


  fn main() -> Result<(), PolarsError> {
      let df: DataFrame = df!(
            "name" => ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
            "birthdate" => [
                NaiveDate::from_ymd_opt(1997, 1, 10).unwrap(),
                NaiveDate::from_ymd_opt(1985, 2, 15).unwrap(),
                NaiveDate::from_ymd_opt(1997, 3, 22).unwrap(),
                NaiveDate::from_ymd_opt(1997, 4, 30).unwrap(),
            ],
            "weight" => [57.9, 72.5, 54.6, 83.1], // (kg)
            "height" => [1.56, 1.77, 1.65, 1.75], // (m)
        )
        .unwrap();

        let result = df
            .clone()
            .lazy()
            .with_columns([
                col("birthdate").dt().year().alias("birth_year"),
                (col("weight") / col("height").pow(2)).alias("bmi"),
            ])
            .collect()?;
        println!("With added columns:");
        print!("{result}\n");

      Ok(())
  }
With added columns:
shape: (4, 6)
┌────────────────┬────────────┬────────┬────────┬────────────┬───────────┐
│ name           ┆ birthdate  ┆ weight ┆ height ┆ birth_year ┆ bmi       │
│ ---            ┆ ---        ┆ ---    ┆ ---    ┆ ---        ┆ ---       │
│ str            ┆ date       ┆ f64    ┆ f64    ┆ i32        ┆ f64       │
╞════════════════╪════════════╪════════╪════════╪════════════╪═══════════╡
│ Alice Archer   ┆ 1997-01-10 ┆ 57.9   ┆ 1.56   ┆ 1997       ┆ 23.791913 │
│ Ben Brown      ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   ┆ 1985       ┆ 23.141498 │
│ Chloe Cooper   ┆ 1997-03-22 ┆ 54.6   ┆ 1.65   ┆ 1997       ┆ 20.055096 │
│ Daniel Donovan ┆ 1997-04-30 ┆ 83.1   ┆ 1.75   ┆ 1997       ┆ 27.134694 │
└────────────────┴────────────┴────────┴────────┴────────────┴───────────┘

Expression expansion

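`lit` stands for literal: it wraps a plain Rust value such as `0.95` in an expression so that it can be combined with column expressions. It is part of the lazy expression API, which is enabled by Polars3's `lazy` feature. In the example below, `cols(["weight", "height"])` expands a single expression to both columns, and `.name().suffix("-5%")` renames each result.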

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql"] }
  //! ```

  use chrono::NaiveDate;
  use polars::{
      df,
      error::PolarsError,
      frame::DataFrame,
      prelude::{IntoLazy, col, cols, lit, RoundMode},
  };


  fn main() -> Result<(), PolarsError> {
      let df: DataFrame = df!(
            "name" => ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
            "birthdate" => [
                NaiveDate::from_ymd_opt(1997, 1, 10).unwrap(),
                NaiveDate::from_ymd_opt(1985, 2, 15).unwrap(),
                NaiveDate::from_ymd_opt(1997, 3, 22).unwrap(),
                NaiveDate::from_ymd_opt(1997, 4, 30).unwrap(),
            ],
            "weight" => [57.9, 72.5, 54.6, 83.1], // (kg)
            "height" => [1.56, 1.77, 1.65, 1.75], // (m)
        )
        .unwrap();

        let result = df
            .clone()
            .lazy()
            .select([
                col("name"),
                (cols(["weight", "height"]).as_expr() * lit(0.95))
                    .round(2, RoundMode::default())
                    .name()
                    .suffix("-5%"),
            ])
            .collect()?;
        println!("Transform:");
        print!("{result}\n");

      Ok(())
  }
Transform:
shape: (4, 3)
┌────────────────┬───────────┬───────────┐
│ name           ┆ weight-5% ┆ height-5% │
│ ---            ┆ ---       ┆ ---       │
│ str            ┆ f64       ┆ f64       │
╞════════════════╪═══════════╪═══════════╡
│ Alice Archer   ┆ 55.0      ┆ 1.48      │
│ Ben Brown      ┆ 68.88     ┆ 1.68      │
│ Chloe Cooper   ┆ 51.87     ┆ 1.57      │
│ Daniel Donovan ┆ 78.94     ┆ 1.66      │
└────────────────┴───────────┴───────────┘

Filtering rows

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "is_between", "sql"] }
  //! ```

  use chrono::NaiveDate;
  use polars::{
      df,
      error::PolarsError,
      frame::DataFrame,
      prelude::{IntoLazy, col, lit, ClosedInterval},
  };


  fn main() -> Result<(), PolarsError> {
      let df: DataFrame = df!(
            "name" => ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
            "birthdate" => [
                NaiveDate::from_ymd_opt(1997, 1, 10).unwrap(),
                NaiveDate::from_ymd_opt(1985, 2, 15).unwrap(),
                NaiveDate::from_ymd_opt(1997, 3, 22).unwrap(),
                NaiveDate::from_ymd_opt(1997, 4, 30).unwrap(),
            ],
            "weight" => [57.9, 72.5, 54.6, 83.1], // (kg)
            "height" => [1.56, 1.77, 1.65, 1.75], // (m)
        )
        .unwrap();

        let result = df
            .clone()
            .lazy()
            .filter(col("birthdate").dt().year().lt(lit(1990)))
            .collect()?;
        println!("With row filtering:");
        print!("{result}\n");

        let result = df
              .clone()
              .lazy()
              .filter(
                  col("birthdate")
                      .is_between(
                          lit(NaiveDate::from_ymd_opt(1982, 12, 31).unwrap()),
                          lit(NaiveDate::from_ymd_opt(1996, 1, 1).unwrap()),
                          ClosedInterval::Both,
                      )
                      .and(col("height").gt(lit(1.7))),
              )
              .collect()?;
        println!("With complex row filtering:");
        print!("{result}\n");

      Ok(())
  }
With row filtering:
shape: (1, 4)
┌───────────┬────────────┬────────┬────────┐
│ name      ┆ birthdate  ┆ weight ┆ height │
│ ---       ┆ ---        ┆ ---    ┆ ---    │
│ str       ┆ date       ┆ f64    ┆ f64    │
╞═══════════╪════════════╪════════╪════════╡
│ Ben Brown ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
└───────────┴────────────┴────────┴────────┘
With complex row filtering:
shape: (1, 4)
┌───────────┬────────────┬────────┬────────┐
│ name      ┆ birthdate  ┆ weight ┆ height │
│ ---       ┆ ---        ┆ ---    ┆ ---    │
│ str       ┆ date       ┆ f64    ┆ f64    │
╞═══════════╪════════════╪════════╪════════╡
│ Ben Brown ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
└───────────┴────────────┴────────┴────────┘
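
The second filter combines two predicates: a date range that includes both of its bounds (`ClosedInterval::Both`) and a height threshold. As a rough illustration of that logic only, here is a minimal sketch using the standard library instead of Polars. `Person`, `is_between_closed`, and `filter_people` are hypothetical names introduced for this example, and dates are modelled as `(year, month, day)` tuples.

```rust
/// Calendar date as (year, month, day). Tuples compare field by field,
/// so their ordering matches chronological order.
type Date = (i32, u32, u32);

struct Person {
    name: &'static str,
    birthdate: Date,
    height: f64, // metres
}

/// True when `d` lies in [lower, upper], with both bounds included.
fn is_between_closed(d: Date, lower: Date, upper: Date) -> bool {
    lower <= d && d <= upper
}

/// Keep the names of people born inside the closed date range
/// who are also taller than `min_height`.
fn filter_people(
    people: &[Person],
    lower: Date,
    upper: Date,
    min_height: f64,
) -> Vec<&'static str> {
    people
        .iter()
        .filter(|p| is_between_closed(p.birthdate, lower, upper) && p.height > min_height)
        .map(|p| p.name)
        .collect()
}

fn main() {
    let people = [
        Person { name: "Alice Archer", birthdate: (1997, 1, 10), height: 1.56 },
        Person { name: "Ben Brown", birthdate: (1985, 2, 15), height: 1.77 },
        Person { name: "Chloe Cooper", birthdate: (1997, 3, 22), height: 1.65 },
        Person { name: "Daniel Donovan", birthdate: (1997, 4, 30), height: 1.75 },
    ];
    let kept = filter_people(&people, (1982, 12, 31), (1996, 1, 1), 1.7);
    println!("{kept:?}"); // ["Ben Brown"]
}
```

Only Ben Brown passes both conditions, which matches the Polars output above.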

Grouping by

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql"] }
  //! ```

  use chrono::NaiveDate;
  use polars::{
      df,
      error::PolarsError,
      frame::DataFrame,
      prelude::{IntoLazy, col, lit, len, RoundMode},
  };


  fn main() -> Result<(), PolarsError> {
      let df: DataFrame = df!(
            "name" => ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
            "birthdate" => [
                NaiveDate::from_ymd_opt(1997, 1, 10).unwrap(),
                NaiveDate::from_ymd_opt(1985, 2, 15).unwrap(),
                NaiveDate::from_ymd_opt(1997, 3, 22).unwrap(),
                NaiveDate::from_ymd_opt(1997, 4, 30).unwrap(),
            ],
            "weight" => [57.9, 72.5, 54.6, 83.1], // (kg)
            "height" => [1.56, 1.77, 1.65, 1.75], // (m)
        )
        .unwrap();

        let result = df
            .clone()
            .lazy()
            .group_by([(col("birthdate").dt().year() / lit(10) * lit(10)).alias("decade")])
            .agg([len()])
            .collect()?;
        println!("Grouping by birth decade:");
        print!("{result}\n");

        let result = df
            .clone()
            .lazy()
            .group_by([(col("birthdate").dt().year() / lit(10) * lit(10)).alias("decade")])
            .agg([
                len().alias("sample_size"),
                col("weight")
                    .mean()
                    .round(2, RoundMode::default())
                    .alias("avg_weight"),
                col("height").max().alias("tallest"),
            ])
            .collect()?;
        println!("Grouping by derived features:");
        println!("{result}");

      Ok(())
  }
Grouping by birth decade:
shape: (2, 2)
┌────────┬─────┐
│ decade ┆ len │
│ ---    ┆ --- │
│ i32    ┆ u32 │
╞════════╪═════╡
│ 1990   ┆ 3   │
│ 1980   ┆ 1   │
└────────┴─────┘
Grouping by derived features:
shape: (2, 4)
┌────────┬─────────────┬────────────┬─────────┐
│ decade ┆ sample_size ┆ avg_weight ┆ tallest │
│ ---    ┆ ---         ┆ ---        ┆ ---     │
│ i32    ┆ u32         ┆ f64        ┆ f64     │
╞════════╪═════════════╪════════════╪═════════╡
│ 1980   ┆ 1           ┆ 72.5       ┆ 1.77    │
│ 1990   ┆ 3           ┆ 65.2       ┆ 1.75    │
└────────┴─────────────┴────────────┴─────────┘
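
The grouping key `year / 10 * 10` relies on integer division: dividing an integer year by 10 drops the last digit, and multiplying by 10 restores the scale, which is why the `decade` column above is `i32` with values such as 1990. The two outputs also list the groups in different orders, so it seems safer not to rely on group order. Below is a minimal sketch of the same bucketing with the standard library only, not Polars code. `decade` and `count_by_decade` are names made up for this illustration, and a `BTreeMap` is used so that the result comes out sorted by key.

```rust
use std::collections::BTreeMap;

/// Truncate a year to the first year of its decade using integer division.
fn decade(year: i32) -> i32 {
    year / 10 * 10
}

/// Count how many years fall into each decade.
/// A BTreeMap keeps the keys sorted, so iteration order is deterministic.
fn count_by_decade(years: &[i32]) -> BTreeMap<i32, usize> {
    let mut counts = BTreeMap::new();
    for &year in years {
        *counts.entry(decade(year)).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Birth years taken from the example DataFrame.
    let birth_years = [1997, 1985, 1997, 1997];
    for (d, n) in count_by_decade(&birth_years) {
        println!("{d}: {n}");
    }
    // prints:
    // 1980: 1
    // 1990: 3
}
```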

Data Analysis

When we receive a new dataset, the goal is not to immediately build charts or run models. The first goal is to understand whether the data can be trusted.

Inspect the raw data:

Download the data, load it with Polars3, and then print the DataFrame. For a long table, Polars prints only the first and last rows.

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql", "csv"] }
  //! ```

  use polars::{
      error::PolarsError,
      prelude::{CsvParseOptions, CsvReadOptions, SerReader},
  };

  fn main() -> Result<(), PolarsError> {
      let df_csv = CsvReadOptions::default()
          .with_has_header(true)
          .with_parse_options(CsvParseOptions::default().with_try_parse_dates(true))
          .try_into_reader_with_file_path(Some(
              "data/Water Quality Monitoring Dataset_ Ireland.csv".into(),
          ))?
          .finish()?;
      println!("{df_csv}");
      Ok(())
  }
rust-script failed with exit code 1

[stderr]
Error: ComputeError(ErrString("could not parse `50.5` as dtype `i64` at column 'Alkalinity-total (as CaCO3)' (column number 4)\n\nThe current offset in the file is 7606 bytes.\n\nYou might want to try:\n- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),\n- specifying correct dtype with the `schema_overrides` argument\n- setting `ignore_errors` to `True`,\n- adding `50.5` to the `null_values` list.\n\nOriginal error: ```invalid primitive value found during CSV parsing```"))

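Polars3 inferred the wrong type for some of the columns. By default it infers the schema from only the first 100 rows, so a column that holds whole numbers there is typed as `i64`, and parsing fails at the first decimal value further down. Passing `None` to `with_infer_schema_length` makes it scan the whole file instead.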

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql", "csv"] }
  //! ```

  use polars::{
      error::PolarsError,
      prelude::{CsvParseOptions, CsvReadOptions, SerReader},
  };

  fn main() -> Result<(), PolarsError> {
      let df_csv = CsvReadOptions::default()
          .with_has_header(true)
          .with_infer_schema_length(None)
          .with_parse_options(CsvParseOptions::default().with_try_parse_dates(true))
          .try_into_reader_with_file_path(Some(
              "data/Water Quality Monitoring Dataset_ Ireland.csv".into(),
          ))?
          .finish()?;
      println!("{df_csv}");
      Ok(())
  }
shape: (29_159, 14)
┌──────────────┬───────┬────────────┬──────────────┬───┬──────┬─────────────┬─────────────┬────────┐
│ WaterbodyNam ┆ Years ┆ SampleDate ┆ Alkalinity-t ┆ … ┆ pH   ┆ Temperature ┆ Total       ┆ True   │
│ e            ┆ ---   ┆ ---        ┆ otal (as     ┆   ┆ ---  ┆ ---         ┆ Hardness    ┆ Colour │
│ ---          ┆ i64   ┆ str        ┆ CaCO3)       ┆   ┆ f64  ┆ f64         ┆ (as CaCO3)  ┆ ---    │
│ str          ┆       ┆            ┆ ---          ┆   ┆      ┆             ┆ ---         ┆ f64    │
│              ┆       ┆            ┆ f64          ┆   ┆      ┆             ┆ f64         ┆        │
╞══════════════╪═══════╪════════════╪══════════════╪═══╪══════╪═════════════╪═════════════╪════════╡
│ ABBEYTOWN_01 ┆ 2023  ┆ Feb        ┆ 314.0        ┆ … ┆ 7.8  ┆ 10.4        ┆ 370.0       ┆ 24.0   │
│ 0            ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ Allua        ┆ 2007  ┆ Aug        ┆ 14.0         ┆ … ┆ 7.42 ┆ 17.8        ┆ 13.4        ┆ 35.0   │
│ Allua        ┆ 2007  ┆ Aug        ┆ 17.0         ┆ … ┆ 7.67 ┆ 18.1        ┆ 15.8        ┆ 29.0   │
│ Allua        ┆ 2007  ┆ Aug        ┆ 18.0         ┆ … ┆ 7.63 ┆ 17.8        ┆ 15.9        ┆ 31.0   │
│ Allua        ┆ 2007  ┆ Sep        ┆ 19.0         ┆ … ┆ 7.33 ┆ 20.1        ┆ 15.4        ┆ 23.0   │
│ …            ┆ …     ┆ …          ┆ …            ┆ … ┆ …    ┆ …           ┆ …           ┆ …      │
│ SULLANE_060  ┆ 2022  ┆ Sep        ┆ 31.0         ┆ … ┆ 7.1  ┆ 14.9        ┆ 45.0        ┆ 27.0   │
│ SULLANE_060  ┆ 2022  ┆ Nov        ┆ 22.0         ┆ … ┆ 6.9  ┆ 12.3        ┆ 34.0        ┆ 58.0   │
│ SULLANE_060  ┆ 2023  ┆ Mar        ┆ 36.0         ┆ … ┆ 7.2  ┆ 7.1         ┆ 44.0        ┆ 20.0   │
│ TWO POT      ┆ 2023  ┆ Feb        ┆ 81.0         ┆ … ┆ 7.4  ┆ 8.6         ┆ 120.0       ┆ 9.0    │
│ (Cork        ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ City)_010    ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ TWO POT      ┆ 2023  ┆ Feb        ┆ 82.0         ┆ … ┆ 7.8  ┆ 8.1         ┆ 121.0       ┆ 5.0    │
│ (Cork        ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ City)_010    ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
└──────────────┴───────┴────────────┴──────────────┴───┴──────┴─────────────┴─────────────┴────────┘
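
The difference between the failing read and the working one comes down to how many rows are sampled before a column type is fixed. The sketch below imitates that idea with the standard library only. It is a simplified model, not Polars' actual inference code: `ColumnType`, `infer_type`, and `parse_as_int` are hypothetical names, and the sample values are made up apart from the `50.5` taken from the error message.

```rust
/// Column types this sketch can infer (far simpler than Polars' own dtypes).
#[derive(Debug, PartialEq, Clone, Copy)]
enum ColumnType {
    Int,
    Float,
    Text,
}

/// Infer a column type from at most `sample_len` leading values.
/// `None` means "look at every value", in the spirit of
/// `with_infer_schema_length(None)`.
fn infer_type(values: &[&str], sample_len: Option<usize>) -> ColumnType {
    let n = sample_len.unwrap_or(values.len()).min(values.len());
    let mut inferred = ColumnType::Int;
    for value in &values[..n] {
        let value = value.trim();
        if value.parse::<i64>().is_ok() {
            continue; // compatible with both Int and Float
        }
        if value.parse::<f64>().is_ok() {
            inferred = ColumnType::Float; // widen Int to Float
        } else {
            return ColumnType::Text;
        }
    }
    inferred
}

/// Parse the whole column as integers, failing on the first bad value.
fn parse_as_int(values: &[&str]) -> Result<Vec<i64>, String> {
    values
        .iter()
        .map(|v| {
            v.trim()
                .parse::<i64>()
                .map_err(|_| format!("could not parse `{v}` as i64"))
        })
        .collect()
}

fn main() {
    // The first decimal value appears only after the sampled prefix.
    let column = ["314", "14", "17", "50.5"];

    println!("{:?}", infer_type(&column, Some(3))); // Int
    println!("{:?}", infer_type(&column, None)); // Float

    // A type chosen from the short sample breaks on the full column.
    println!("{:?}", parse_as_int(&column));
}
```

A larger sample costs more reading up front but lowers the chance that a late value contradicts the inferred type.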

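Scanning the whole file for inference gets slower as files grow, so let us have Polars4 infer the column types from the first 10,000 rows instead. The output shows that this is enough for this file.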

  //! ```cargo
  //! [dependencies]
  //! chrono = "0.4.45"
  //! polars = { version = "0.54.4", features = ["lazy", "temporal", "sql", "csv"] }
  //! ```

  use polars::{
      error::PolarsError,
      prelude::{CsvParseOptions, CsvReadOptions, SerReader},
  };

  fn main() -> Result<(), PolarsError> {
      let df_csv = CsvReadOptions::default()
          .with_has_header(true)
          .with_infer_schema_length(Some(10_000))
          .with_parse_options(CsvParseOptions::default().with_try_parse_dates(true))
          .try_into_reader_with_file_path(Some(
              "data/Water Quality Monitoring Dataset_ Ireland.csv".into(),
          ))?
          .finish()?;
      println!("{df_csv}");
      Ok(())
  }
shape: (29_159, 14)
┌──────────────┬───────┬────────────┬──────────────┬───┬──────┬─────────────┬─────────────┬────────┐
│ WaterbodyNam ┆ Years ┆ SampleDate ┆ Alkalinity-t ┆ … ┆ pH   ┆ Temperature ┆ Total       ┆ True   │
│ e            ┆ ---   ┆ ---        ┆ otal (as     ┆   ┆ ---  ┆ ---         ┆ Hardness    ┆ Colour │
│ ---          ┆ i64   ┆ str        ┆ CaCO3)       ┆   ┆ f64  ┆ f64         ┆ (as CaCO3)  ┆ ---    │
│ str          ┆       ┆            ┆ ---          ┆   ┆      ┆             ┆ ---         ┆ f64    │
│              ┆       ┆            ┆ f64          ┆   ┆      ┆             ┆ f64         ┆        │
╞══════════════╪═══════╪════════════╪══════════════╪═══╪══════╪═════════════╪═════════════╪════════╡
│ ABBEYTOWN_01 ┆ 2023  ┆ Feb        ┆ 314.0        ┆ … ┆ 7.8  ┆ 10.4        ┆ 370.0       ┆ 24.0   │
│ 0            ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ Allua        ┆ 2007  ┆ Aug        ┆ 14.0         ┆ … ┆ 7.42 ┆ 17.8        ┆ 13.4        ┆ 35.0   │
│ Allua        ┆ 2007  ┆ Aug        ┆ 17.0         ┆ … ┆ 7.67 ┆ 18.1        ┆ 15.8        ┆ 29.0   │
│ Allua        ┆ 2007  ┆ Aug        ┆ 18.0         ┆ … ┆ 7.63 ┆ 17.8        ┆ 15.9        ┆ 31.0   │
│ Allua        ┆ 2007  ┆ Sep        ┆ 19.0         ┆ … ┆ 7.33 ┆ 20.1        ┆ 15.4        ┆ 23.0   │
│ …            ┆ …     ┆ …          ┆ …            ┆ … ┆ …    ┆ …           ┆ …           ┆ …      │
│ SULLANE_060  ┆ 2022  ┆ Sep        ┆ 31.0         ┆ … ┆ 7.1  ┆ 14.9        ┆ 45.0        ┆ 27.0   │
│ SULLANE_060  ┆ 2022  ┆ Nov        ┆ 22.0         ┆ … ┆ 6.9  ┆ 12.3        ┆ 34.0        ┆ 58.0   │
│ SULLANE_060  ┆ 2023  ┆ Mar        ┆ 36.0         ┆ … ┆ 7.2  ┆ 7.1         ┆ 44.0        ┆ 20.0   │
│ TWO POT      ┆ 2023  ┆ Feb        ┆ 81.0         ┆ … ┆ 7.4  ┆ 8.6         ┆ 120.0       ┆ 9.0    │
│ (Cork        ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ City)_010    ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ TWO POT      ┆ 2023  ┆ Feb        ┆ 82.0         ┆ … ┆ 7.8  ┆ 8.1         ┆ 121.0       ┆ 5.0    │
│ (Cork        ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
│ City)_010    ┆       ┆            ┆              ┆   ┆      ┆             ┆             ┆        │
└──────────────┴───────┴────────────┴──────────────┴───┴──────┴─────────────┴─────────────┴────────┘
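
With the column types settled, we hand the DataFrame to an `inspect_raw_data` helper, whose body is not shown here, and look at what it prints.

  use polars::{
      error::PolarsResult,
      prelude::{CsvParseOptions, CsvReadOptions, SerReader},
  };

  fn main() -> PolarsResult<()> {
      let df = CsvReadOptions::default()
          .with_has_header(true)
          // Discovery step: sample the first 10,000 rows because we do not know the column types yet.
          .with_infer_schema_length(Some(10_000))
          .with_parse_options(CsvParseOptions::default().with_try_parse_dates(true))
          .try_into_reader_with_file_path(Some(
              "data/Water Quality Monitoring Dataset_ Ireland.csv".into(),
          ))?
          .finish()?;

      // `inspect_raw_data` is the article's own helper; its definition is not part of this listing.
      inspect_raw_data(df.clone())?;

      Ok(())
  }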
rows: 29159
columns: 14

columns and types:
WaterbodyName: String (text or mixed)
Years: Int64 (number)
SampleDate: String (text or mixed)
Alkalinity-total (as CaCO3): Float64 (number)
Ammonia-Total (as N): Float64 (number)
BOD - 5 days (Total): Float64 (number)
Chloride: Float64 (number)
Conductivity @25°C: Float64 (number)
Dissolved Oxygen: Float64 (number)
ortho-Phosphate (as P) - unspecified: Float64 (number)
pH: Float64 (number)
Temperature: Float64 (number)
Total Hardness (as CaCO3): Float64 (number)
True Colour: Float64 (number)

one raw row:
shape: (1, 14)
┌───────────────┬───────┬────────────┬─────────────────────┬───┬─────┬─────────────┬────────────────────┬─────────────┐
│ WaterbodyName ┆ Years ┆ SampleDate ┆ Alkalinity-total    ┆ … ┆ pH  ┆ Temperature ┆ Total Hardness (as ┆ True Colour │
│ ---           ┆ ---   ┆ ---        ┆ (as CaCO3)          ┆   ┆ --- ┆ ---         ┆ CaCO3)             ┆ ---         │
│ str           ┆ i64   ┆ str        ┆ ---                 ┆   ┆ f64 ┆ f64         ┆ ---                ┆ f64         │
│               ┆       ┆            ┆ f64                 ┆   ┆     ┆             ┆ f64                ┆             │
╞═══════════════╪═══════╪════════════╪═════════════════════╪═══╪═════╪═════════════╪════════════════════╪═════════════╡
│ ABBEYTOWN_010 ┆ 2023  ┆ Feb        ┆ 314.0               ┆ … ┆ 7.8 ┆ 10.4        ┆ 370.0              ┆ 24.0        │
└───────────────┴───────┴────────────┴─────────────────────┴───┴─────┴─────────────┴────────────────────┴─────────────┘

location/date columns: ["WaterbodyName", "Years", "SampleDate"]
measurement columns: ["Alkalinity-total (as CaCO3)", "Ammonia-Total (as N)", "BOD - 5 days (Total)", "Chloride", "Conductivity @25°C", "Dissolved Oxygen", "ortho-Phosphate (as P) - unspecified", "pH", "Temperature", "Total Hardness (as CaCO3)", "True Colour"]

long water-quality shape:
shape: (10, 7)
┌───────────────┬───────┬────────────┬─────────────────────────────┬───────────────────┬──────────────────┬──────────┐
│ WaterbodyName ┆ Years ┆ SampleDate ┆ source_column               ┆ measurement_value ┆ parameter        ┆ unit     │
│ ---           ┆ ---   ┆ ---        ┆ ---                         ┆ ---               ┆ ---              ┆ ---      │
│ str           ┆ i64   ┆ str        ┆ str                         ┆ f64               ┆ str              ┆ str      │
╞═══════════════╪═══════╪════════════╪═════════════════════════════╪═══════════════════╪══════════════════╪══════════╡
│ ABBEYTOWN_010 ┆ 2023  ┆ Feb        ┆ Alkalinity-total (as CaCO3) ┆ 314.0             ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2007  ┆ Aug        ┆ Alkalinity-total (as CaCO3) ┆ 14.0              ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2007  ┆ Aug        ┆ Alkalinity-total (as CaCO3) ┆ 17.0              ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2007  ┆ Aug        ┆ Alkalinity-total (as CaCO3) ┆ 18.0              ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2007  ┆ Sep        ┆ Alkalinity-total (as CaCO3) ┆ 19.0              ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2007  ┆ Sep        ┆ Alkalinity-total (as CaCO3) ┆ 19.0              ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2007  ┆ Sep        ┆ Alkalinity-total (as CaCO3) ┆ 18.0              ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2008  ┆ Jan        ┆ Alkalinity-total (as CaCO3) ┆ 8.0               ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2008  ┆ Jan        ┆ Alkalinity-total (as CaCO3) ┆ 9.0               ┆ Alkalinity-total ┆ as CaCO3 │
│ Allua         ┆ 2008  ┆ Jan        ┆ Alkalinity-total (as CaCO3) ┆ 10.0              ┆ Alkalinity-total ┆ as CaCO3 │
└───────────────┴───────┴────────────┴─────────────────────────────┴───────────────────┴──────────────────┴──────────┘
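
The long layout above can be read as two small steps: turn each measurement column of a wide row into its own record, and split headers such as `Alkalinity-total (as CaCO3)` into a parameter and a unit. The sketch below shows one way to do that with the standard library only. It is not the `inspect_raw_data` helper used in this article: `LongRow`, `split_parameter_unit`, and `to_long` are hypothetical names, the identifier columns are left out for brevity, and the rule "text in trailing parentheses is the unit" is an assumption based only on the rows shown above.

```rust
/// One record of the long layout; field names follow the output above.
#[derive(Debug, PartialEq)]
struct LongRow {
    source_column: String,
    measurement_value: f64,
    parameter: String,
    unit: Option<String>,
}

/// Split "Alkalinity-total (as CaCO3)" into ("Alkalinity-total", Some("as CaCO3")).
/// Headers that do not end in "(...)" keep their full name and get no unit.
fn split_parameter_unit(header: &str) -> (String, Option<String>) {
    let trimmed = header.trim();
    if trimmed.ends_with(')') {
        if let Some(open) = trimmed.rfind('(') {
            let parameter = trimmed[..open].trim().to_string();
            let unit = trimmed[open + 1..trimmed.len() - 1].trim().to_string();
            return (parameter, Some(unit));
        }
    }
    (trimmed.to_string(), None)
}

/// Turn one wide row (parallel slices of headers and values)
/// into one long record per measurement column.
fn to_long(headers: &[&str], values: &[f64]) -> Vec<LongRow> {
    headers
        .iter()
        .zip(values)
        .map(|(&header, &value)| {
            let (parameter, unit) = split_parameter_unit(header);
            LongRow {
                source_column: header.to_string(),
                measurement_value: value,
                parameter,
                unit,
            }
        })
        .collect()
}

fn main() {
    // Three measurements from the ABBEYTOWN_010 row shown earlier.
    let headers = ["Alkalinity-total (as CaCO3)", "pH", "Temperature"];
    let values = [314.0, 7.8, 10.4];
    for row in to_long(&headers, &values) {
        println!("{row:?}");
    }
}
```

Headers such as `ortho-Phosphate (as P) - unspecified` do not end in a parenthesis, so this sketch leaves them unsplit; a real pipeline would need an explicit rule for them.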

Profile the data

Footnotes