Ubuntu TechHive
discover-clickhouse-a-highperformance-columnoriented-dbms-for-olap-and-realtime-analytics.md
Discover Clickhouse: A high-performance Column-Oriented DBMS for OLAP and Real-Time Analytics
article.detail

Discover Clickhouse: A high-performance Column-Oriented DBMS for OLAP and Real-Time Analytics

reading.progress 15 min read

Description of Discover Clickhouse

Discover Clickhouse: high-performance Column-Oriented DBMS for OLAP(Online Analytic Processing)

Introduction to Databases

What is a Database?

A database is a system used to store, organize, and access information. It allows users to input, manage, and retrieve data efficiently and securely. Databases are structured to handle large amounts of information by allowing data to be stored in tables that can be searched, manipulated, and maintained easily.

FR
Une base de donnĂ©es est un systĂšme utilisĂ© pour stocker, organiser et accĂ©der Ă  des informations. Elle permet aux utilisateurs de saisir, gĂ©rer et rĂ©cupĂ©rer des donnĂ©es de maniĂšre efficace et sĂ©curisĂ©e. Les bases de donnĂ©es sont structurĂ©es pour gĂ©rer de grandes quantitĂ©s d'informations en permettant que les donnĂ©es soient stockĂ©es dans des tables qui peuvent ĂȘtre recherchĂ©es, manipulĂ©es et entretenues facilement.

Types of Databases

Relational Store data in structured tables with rows and columns, allowing complex queries using SQL. Example: HR, and CRM database

non-relational Handle unstructured and semi-structured data without a fixed table structure, using formats like key-value, documents, and graphs. Example: E-commerce platforms, content management, IoT and real-time analytics

FR
Relationnelle Stocke les donnĂ©es dans des tables structurĂ©es avec des lignes et des colonnes, permettant des requĂȘtes complexes en utilisant SQL. Exemple : bases de donnĂ©es RH et CRM.

Non relationnelle GÚre des données non structurées et semi-structurées sans structure de table fixe, en utilisant des formats tels que clé-valeur, documents et graphes. Exemple : plateformes de commerce électronique, gestion de contenu, IoT et analyses en temps réel.

Introduction to Database Management Systems (DBMS)

Software that allows users to define, create, maintain, and control access to the database. It manages data storage, manipulation, and retrieval, facilitating multiple user interactions with various databases

FR
Logiciel permettant aux utilisateurs de définir, créer, maintenir et contrÎler l'accÚs à la base de données. Il gÚre le stockage, la manipulation et la récupération des données, facilitant les interactions de plusieurs utilisateurs avec diverses bases de données.

Fundamentals of Column-Oriented Databases

Row-Oriented vs. Column-Oriented Storage

Row-Oriented Databases:

  • Structure: Stores data by rows, making it efficient for writing operations and transactional processing where access to complete records is common.
  • Benefits: Optimal for OLTP systems (Online Transaction Processing) like customer relationship management where fast updates and inserts are crucial.

Column-Oriented Databases:

  • Structure: Stores data by columns, enhancing read operations and data analytics where aggregations over large datasets are required.
  • Benefits: Ideal for OLAP systems (Online Analytical Processing) such as data warehousing and big data analytics, offering high performance for queries accessing only a subset of columns.

Advantages:

  • Row-Oriented: Quick data retrieval by complete row, simpler implementation, efficient for entry and updating of records.
  • Column-Oriented: Better compression, efficient storage leading to faster read of column data, optimized for querying large datasets, which reduces I/O operations.

FR
Bases de données orientées ligne :

  • Structure : Stocke les donnĂ©es par lignes, ce qui est efficace pour les opĂ©rations d'Ă©criture et le traitement transactionnel oĂč l'accĂšs aux enregistrements complets est courant.
  • Avantages : Optimal pour les systĂšmes OLTP (Traitement des Transactions en Ligne) comme la gestion de la relation client oĂč des mises Ă  jour et des insertions rapides sont cruciales.

Bases de données orientées colonne :

  • Structure : Stocke les donnĂ©es par colonnes, amĂ©liorant les opĂ©rations de lecture et l'analyse de donnĂ©es oĂč des agrĂ©gations sur de grands ensembles de donnĂ©es sont nĂ©cessaires.
  • Avantages : IdĂ©al pour les systĂšmes OLAP (Traitement Analytique en Ligne) tels que l'entreposage de donnĂ©es et l'analyse de grandes donnĂ©es, offrant de hautes performances pour les requĂȘtes accĂ©dant uniquement Ă  un sous-ensemble de colonnes.

Avantages :

  • OrientĂ© ligne : RĂ©cupĂ©ration rapide des donnĂ©es par ligne complĂšte, mise en Ɠuvre plus simple, efficace pour l'entrĂ©e et la mise Ă  jour des enregistrements.
  • OrientĂ© colonne : Meilleure compression, stockage efficace conduisant Ă  une lecture rapide des donnĂ©es de colonne, optimisĂ© pour l'interrogation de grands ensembles de donnĂ©es, ce qui rĂ©duit les opĂ©rations d'entrĂ©e/sortie.

Column-Oriented Database

graph TD
    subgraph "Column-Oriented Database"
        subgraph "ID Column"
            ID1[1]
            ID2[2]
            ID3[3]
        end
        subgraph "Name Column"
            Name1[John]
            Name2[Alice]
            Name3[Bob]
        end
        subgraph "Age Column"
            Age1[30]
            Age2[25]
            Age3[35]
        end
        subgraph "City Column"
            City1[New York]
            City2[Los Angeles]
            City3[San Francisco]
        end
    end

Row Oriented Database

graph TD
    subgraph "Row-Oriented Storage"
        A["Row 1: ID | Name | Age | City"]
        B["Row 2: ID | Name | Age | City"]
        C["Row 3: ID | Name | Age | City"]
    end

Introduction to ClickHouse

What is ClickHouse?

ClickHouse is an open-source column-oriented database management system (DBMS) designed for online analytical processing (OLAP). It allows users to generate analytical reports using SQL queries in real-time, offering high performance and scalability for processing large volumes of data. Key features include:

  • Column-oriented storage
  • Data compression
  • Distributed query processing
  • SQL support
  • High performance for analytical queries
  • Ability to handle petabytes of data
  • Fault tolerance through replication

ClickHouse is particularly well-suited for applications requiring fast analytics on massive datasets, such as business intelligence, observability, and real-time data warehousing.

FR
ClickHouse est un systĂšme de gestion de base de donnĂ©es (SGBD) orientĂ© colonne et open-source, conçu pour le traitement analytique en ligne (OLAP). Il permet aux utilisateurs de gĂ©nĂ©rer des rapports analytiques en temps rĂ©el Ă  l'aide de requĂȘtes SQL, offrant des performances Ă©levĂ©es et une scalabilitĂ© pour le traitement de grands volumes de donnĂ©es. Les caractĂ©ristiques clĂ©s incluent :

  • Stockage orientĂ© colonne
  • Compression des donnĂ©es
  • Traitement distribuĂ© des requĂȘtes
  • Support SQL
  • Haute performance pour les requĂȘtes analytiques
  • CapacitĂ© Ă  gĂ©rer des pĂ©taoctets de donnĂ©es
  • TolĂ©rance aux pannes grĂące Ă  la rĂ©plication

ClickHouse est particuliÚrement adapté pour les applications nécessitant des analyses rapides sur des ensembles de données massifs, comme l'intelligence d'affaires, l'observabilité et l'entreposage de données en temps réel.

Key Features of ClickHouse & Speed

  • Column-Oriented Storage: Often, only a few of the hundreds of columns in source data are needed for a report. Column-oriented storage improves efficiency by only accessing the necessary columns, reducing costly disk reads.
  • Indexes: ClickHouse utilizes in-memory data structures to read only the required columns and specific row ranges, optimizing data retrieval.
  • Data Compression: Grouping the same column's values enhances compression compared to row-oriented systems, as similar values are common in adjacent rows. ClickHouse also employs specialized codecs for even greater data compactness.
  • Vectorized Query Execution: ClickHouse processes data by columns, improving CPU cache efficiency and enabling the use of SIMD CPU instructions for faster operations.
  • Scalability: ClickHouse maximizes resource utilization by leveraging all available CPU cores and disks, not just on a single server but across a cluster, enhancing performance for complex queries. FR \\
  • Stockage orientĂ© colonne : Souvent, seules quelques-unes des centaines de colonnes des donnĂ©es sources sont nĂ©cessaires pour un rapport. Le stockage orientĂ© colonne amĂ©liore l'efficacitĂ© en accĂ©dant uniquement aux colonnes nĂ©cessaires, rĂ©duisant ainsi les coĂ»teuses lectures de disque.
  • Index* : ClickHouse utilise des structures de donnĂ©es en mĂ©moire pour lire uniquement les colonnes requises et les plages de lignes spĂ©cifiques, optimisant ainsi la rĂ©cupĂ©ration des donnĂ©es.
  • Compression des donnĂ©es : Regrouper les valeurs d'une mĂȘme colonne amĂ©liore la compression par rapport aux systĂšmes orientĂ©s ligne, car les valeurs similaires sont courantes dans les lignes adjacentes. ClickHouse utilise Ă©galement des codecs spĂ©cialisĂ©s pour une compacitĂ© encore plus grande des donnĂ©es.
  • ExĂ©cution de requĂȘtes vectorisĂ©es : ClickHouse traite les donnĂ©es par colonnes, amĂ©liorant l'efficacitĂ© du cache CPU et permettant l'utilisation d'instructions CPU SIMD pour des opĂ©rations plus rapides.
  • ScalabilitĂ© : ClickHouse maximise l'utilisation des ressources en exploitant tous les cƓurs de CPU et les disques disponibles, non seulement sur un seul serveur mais sur tout un cluster, amĂ©liorant ainsi les performances pour les requĂȘtes complexes.
Primary Key, Index & granule

Primary Key:

  • Determines the physical sorting order of data on disk
  • Used to create a sparse index

Index:

  • The index is created for every N rows (default 8192), called a granule
  • Stored in the primary.idx file
  • Used for quickly locating relevant data blocks

Granules:

  • A granule is a logical group of rows (default 8192 rows)
  • Data is split into multiple parts, and each part is divided into granules

How they work together:

  • When a query is executed, ClickHouse uses the primary key to search the sparse index
  • The index helps identify which granules potentially contain relevant data
  • ClickHouse then loads only the matching granules into memory
  • Data within the granules is processed to find the exact matching rows

Architecture of ClickHouse

Overview of ClickHouse Architecture

  • Engine ClickHouse engine is a type of table engine that determines how and where data is stored, how queries are executed, and the concurrency and replication capabilities of the database. Each engine is optimized for specific use cases, providing various functionalities such as data partitioning, indexing, and support for different query types. FR \\
  • Moteur : Le moteur ClickHouse est un type de moteur de table qui dĂ©termine comment et oĂč les donnĂ©es sont stockĂ©es, comment les requĂȘtes sont exĂ©cutĂ©es, ainsi que les capacitĂ©s de concurrence et de rĂ©plication de la base de donnĂ©es. Chaque moteur est optimisĂ© pour des cas d'utilisation spĂ©cifiques, offrant diverses fonctionnalitĂ©s telles que le partitionnement des donnĂ©es, l'indexation et le support pour diffĂ©rents types de requĂȘtes.

Clickhouse Engines

MergeTree Family:

  • MergeTree: The most versatile and commonly used engine, designed for high-performance read and write operations, supporting data partitioning, primary keys, and data replication.
  • ReplacingMergeTree: Allows for deduplication of rows during merges based on the sorting key.
  • SummingMergeTree: Sums up numeric columns for rows with the same sorting key during merges.
  • AggregatingMergeTree: Aggregates data during merges using specified aggregate functions.
  • CollapsingMergeTree: Collapses rows with the same sorting key during merges, based on a special 'Sign' column.
  • VersionedCollapsingMergeTree: Extends CollapsingMergeTree with version control for rows.
  • GraphiteMergeTree: Specialized for storing time series data.

Integration Engines:

  • ODBC: Allows querying external databases using ODBC.
  • JDBC: Enables querying external databases using JDBC.
  • MySQL: For querying MySQL databases.
  • MongoDB: Allows querying MongoDB databases.
  • HDFS: For working with data stored in Hadoop Distributed File System.
  • S3: Enables working with data stored in Amazon S3.
  • Kafka: For working with Apache Kafka streams.

FR \\

  • Famille MergeTree :
  • MergeTree : Le moteur le plus polyvalent et le plus utilisĂ©, conçu pour des opĂ©rations de lecture et d'Ă©criture Ă  haute performance, prenant en charge le partitionnement des donnĂ©es, les clĂ©s primaires et la rĂ©plication des donnĂ©es.
  • ReplacingMergeTree : Permet la dĂ©duplication des lignes pendant les fusions basĂ©es sur la clĂ© de tri.
  • SummingMergeTree : Somme les colonnes numĂ©riques pour les lignes ayant la mĂȘme clĂ© de tri pendant les fusions.
  • AggregatingMergeTree : AgrĂšge les donnĂ©es pendant les fusions en utilisant des fonctions d'agrĂ©gation spĂ©cifiĂ©es.
  • CollapsingMergeTree : Effondre les lignes ayant la mĂȘme clĂ© de tri pendant les fusions, en fonction d'une colonne spĂ©ciale 'Signe'.
  • VersionedCollapsingMergeTree : Étend CollapsingMergeTree avec un contrĂŽle de version pour les lignes.
  • GraphiteMergeTree : SpĂ©cialisĂ© pour le stockage des donnĂ©es de sĂ©ries temporelles.

Getting Started with ClickHouse

Installation and Setup - on Mac or Linux

  1. Install library
curl https://clickhouse.com/ | sh
  1. Run the following command to start clickhouse server:
./clickhouse server
  1. Open a new Terminal and connect to the server
clickhouse-client
  1. Query a remote files
--Check the file
SELECT * FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/ontime/csv_by_year/*.csv.gz', CSVWithNames) limit 10;

--Basic Count
SELECT count() FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/ontime/csv_by_year/*.csv.gz', CSVWithNames) limit 10;

--The number of delays by carrier for 2007
SELECT IATA_CODE_Reporting_Airline AS Carrier, count(*)
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/ontime/csv_by_year/*.csv.gz')
WHERE DepDelay>10 AND Year=2007
GROUP BY Carrier
ORDER BY count(*) DESC limit 5;

Results
┌─Carrier─┬─count()─┐

  1. │ WN │ 296293 │ └─────────┮─────────┘ ┌─Carrier─┬─count()─┐
  2. │ AA │ 176203 │
  3. │ MQ │ 145630 │
  4. │ US │ 135987 │
  5. │ UA │ 128174 │ └─────────┮─────────┘

5 rows in set. Elapsed: 420.586 sec. Processed 204.84 million rows, 11.66 GB (487.03 thousand rows/s., 27.72 MB/s.) Peak memory usage: 1.18 GiB.

Running Your First Query

Create table
CREATE TABLE `ontime`
(
    `Year`                            UInt16,
    `Quarter`                         UInt8,
    `Month`                           UInt8,
    `DayofMonth`                      UInt8,
    `DayOfWeek`                       UInt8,
    `FlightDate`                      Date,
    `Reporting_Airline`               LowCardinality(String),
    `DOT_ID_Reporting_Airline`        Int32,
    `IATA_CODE_Reporting_Airline`     LowCardinality(String),
    `Tail_Number`                     LowCardinality(String),
    `Flight_Number_Reporting_Airline` LowCardinality(String),
    `OriginAirportID`                 Int32,
    `OriginAirportSeqID`              Int32,
    `OriginCityMarketID`              Int32,
    `Origin`                          FixedString(5),
    `OriginCityName`                  LowCardinality(String),
    `OriginState`                     FixedString(2),
    `OriginStateFips`                 FixedString(2),
    `OriginStateName`                 LowCardinality(String),
    `OriginWac`                       Int32,
    `DestAirportID`                   Int32,
    `DestAirportSeqID`                Int32,
    `DestCityMarketID`                Int32,
    `Dest`                            FixedString(5),
    `DestCityName`                    LowCardinality(String),
    `DestState`                       FixedString(2),
    `DestStateFips`                   FixedString(2),
    `DestStateName`                   LowCardinality(String),
    `DestWac`                         Int32,
    `CRSDepTime`                      Int32,
    `DepTime`                         Int32,
    `DepDelay`                        Int32,
    `DepDelayMinutes`                 Int32,
    `DepDel15`                        Int32,
    `DepartureDelayGroups`            LowCardinality(String),
    `DepTimeBlk`                      LowCardinality(String),
    `TaxiOut`                         Int32,
    `WheelsOff`                       LowCardinality(String),
    `WheelsOn`                        LowCardinality(String),
    `TaxiIn`                          Int32,
    `CRSArrTime`                      Int32,
    `ArrTime`                         Int32,
    `ArrDelay`                        Int32,
    `ArrDelayMinutes`                 Int32,
    `ArrDel15`                        Int32,
    `ArrivalDelayGroups`              LowCardinality(String),
    `ArrTimeBlk`                      LowCardinality(String),
    `Cancelled`                       Int8,
    `CancellationCode`                FixedString(1),
    `Diverted`                        Int8,
    `CRSElapsedTime`                  Int32,
    `ActualElapsedTime`               Int32,
    `AirTime`                         Int32,
    `Flights`                         Int32,
    `Distance`                        Int32,
    `DistanceGroup`                   Int8,
    `CarrierDelay`                    Int32,
    `WeatherDelay`                    Int32,
    `NASDelay`                        Int32,
    `SecurityDelay`                   Int32,
    `LateAircraftDelay`               Int32,
    `FirstDepTime`                    Int16,
    `TotalAddGTime`                   Int16,
    `LongestAddGTime`                 Int16,
    `DivAirportLandings`              Int8,
    `DivReachedDest`                  Int8,
    `DivActualElapsedTime`            Int16,
    `DivArrDelay`                     Int16,
    `DivDistance`                     Int16,
    `Div1Airport`                     LowCardinality(String),
    `Div1AirportID`                   Int32,
    `Div1AirportSeqID`                Int32,
    `Div1WheelsOn`                    Int16,
    `Div1TotalGTime`                  Int16,
    `Div1LongestGTime`                Int16,
    `Div1WheelsOff`                   Int16,
    `Div1TailNum`                     LowCardinality(String),
    `Div2Airport`                     LowCardinality(String),
    `Div2AirportID`                   Int32,
    `Div2AirportSeqID`                Int32,
    `Div2WheelsOn`                    Int16,
    `Div2TotalGTime`                  Int16,
    `Div2LongestGTime`                Int16,
    `Div2WheelsOff`                   Int16,
    `Div2TailNum`                     LowCardinality(String),
    `Div3Airport`                     LowCardinality(String),
    `Div3AirportID`                   Int32,
    `Div3AirportSeqID`                Int32,
    `Div3WheelsOn`                    Int16,
    `Div3TotalGTime`                  Int16,
    `Div3LongestGTime`                Int16,
    `Div3WheelsOff`                   Int16,
    `Div3TailNum`                     LowCardinality(String),
    `Div4Airport`                     LowCardinality(String),
    `Div4AirportID`                   Int32,
    `Div4AirportSeqID`                Int32,
    `Div4WheelsOn`                    Int16,
    `Div4TotalGTime`                  Int16,
    `Div4LongestGTime`                Int16,
    `Div4WheelsOff`                   Int16,
    `Div4TailNum`                     LowCardinality(String),
    `Div5Airport`                     LowCardinality(String),
    `Div5AirportID`                   Int32,
    `Div5AirportSeqID`                Int32,
    `Div5WheelsOn`                    Int16,
    `Div5TotalGTime`                  Int16,
    `Div5LongestGTime`                Int16,
    `Div5WheelsOff`                   Int16,
    `Div5TailNum`                     LowCardinality(String)
) ENGINE = MergeTree
  ORDER BY (Year, Quarter, Month, DayofMonth, FlightDate, IATA_CODE_Reporting_Airline);
Load the data - loaded 1M records and 2002 only
INSERT INTO ontime SELECT *
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/ontime/csv_by_year/*2001*.csv.gz', CSVWithNames)
WHERE Year=2001 limit 1000000
SETTINGS max_insert_threads = 40;
  • Number of delys by airport
SELECT Origin, count(*) AS c
FROM ontime
WHERE DepDelay>10
GROUP BY Origin
ORDER BY c DESC
LIMIT 10;
Query Local data - File is local

Using dataset NY Complaints.

NYPD COMPLAINTS

describe table file("/Users/[you]/user_files/nypd_complaint.tsv")

select * from file("/Users/[you]/user_files/nypd_complaint.tsv") limit 10
Quick analysis
  • Count of Complaints by Crime type
SELECT IFNULL(NYPD_Complaint_Data_Current_Year_To_Date.`OFNS_DESC`, 'Unknown') AS Crime_Type,
    COUNT(*) AS Complaint_Count
FROM url('https://data.cityofnewyork.us/api/views/5uac-w243/rows.csv?accessType=DOWNLOAD', CSVWithNames)
    AS NYPD_Complaint_Data_Current_Year_To_Date
GROUP BY Crime_Type ORDER BY Complaint_Count DESC limit 10
  • Most Common Crimes at Night
SELECT IFNULL(NYPD_Complaint.`OFNS_DESC`, 'Unknown') AS Crime_Type,
    COUNT(*) AS Complaint_Count
FROM url('https://data.cityofnewyork.us/api/views/5uac-w243/rows.csv?accessType=DOWNLOAD', CSVWithNames)
    AS NYPD_Complaint

WHERE (NYPD_Complaint.`CMPLNT_FR_TM` >= '18:00:00' OR NYPD_Complaint.`CMPLNT_FR_TM` <= '06:00:00')
GROUP BY Crime_Type ORDER BY Complaint_Count DESC limit 10

Getting started with Clickouse cloud

chDB - An Overview

Overview

chDB is an in-process SQL OLAP (Online Analytical Processing) engine powered by ClickHouse. It's designed to be an embedded database that brings the power of ClickHouse to applications without the need for a separate server installation. chDB is developed by ClickHouse, Inc. and open-source contributors.

Key features of chDB include:

  1. Serverless operation - no need to install or run ClickHouse services
  2. Support for multiple programming languages including Python, Go, Rust, NodeJS, and Bun
  3. Ability to work with various data formats like Parquet, CSV, JSON, Arrow, and ORC
  4. Custom User Defined Functions (UDFs)

FR
chDB est un moteur OLAP (Online Analytical Processing) SQL en cours d'exĂ©cution, alimentĂ© par ClickHouse. Il est conçu pour ĂȘtre une base de donnĂ©es embarquĂ©e qui apporte la puissance de ClickHouse aux applications sans nĂ©cessiter d'installation de serveur sĂ©parĂ©e. chDB est dĂ©veloppĂ© par ClickHouse, Inc. et des contributeurs open source.

Les principales caractéristiques de chDB incluent :

  1. Fonctionnement sans serveur - pas besoin d'installer ou d'exécuter des services ClickHouse
  2. Prise en charge de plusieurs langages de programmation, y compris Python, Go, Rust, NodeJS et Bun
  3. Capacité à travailler avec divers formats de données tels que Parquet, CSV, JSON, Arrow et ORC
  4. Fonctions définies par l'utilisateur (UDFs) personnalisées

Use Case - Why use chDB

Based on the information provided, here are some practical use cases for chDB in real-world applications:

  1. Data analysis in Jupyter notebooks: chDB allows users to run ClickHouse queries directly in Jupyter notebooks, making it convenient for data scientists and analysts to work with large datasets without needing a separate ClickHouse server.
  2. Embedded analytics in applications: Developers can embed chDB into their applications to provide fast analytical capabilities without requiring a separate database server.
  3. Processing large datasets locally: chDB enables processing of large datasets (e.g. Parquet, CSV, JSON files) on a local machine without needing to set up a full database system.
  4. Rapid prototyping: Data engineers and analysts can quickly prototype and test queries on large datasets without deploying a full ClickHouse cluster.
  5. Serverless environments: chDB can be used in serverless functions to provide analytical capabilities without managing database infrastructure.
  6. Data pipeline processing: chDB can be integrated into data pipelines for fast data transformations and aggregations.

FR
Basé sur les informations fournies, voici quelques cas d'utilisation pratiques de chDB dans des applications réelles :

  1. Analyse de donnĂ©es dans les notebooks Jupyter : chDB permet aux utilisateurs d'exĂ©cuter des requĂȘtes ClickHouse directement dans des notebooks Jupyter, facilitant ainsi le travail des data scientists et analystes sur de grands ensembles de donnĂ©es sans avoir besoin d'un serveur ClickHouse sĂ©parĂ©.
  2. Analytique embarquée dans les applications : Les développeurs peuvent intégrer chDB dans leurs applications pour fournir des capacités analytiques rapides sans nécessiter un serveur de base de données séparé.
  3. Traitement de grands ensembles de données localement** : chDB permet de traiter de grands ensembles de données (par exemple, Parquet, CSV, fichiers JSON) sur une machine locale sans avoir à configurer un systÚme de base de données complet.
  4. Prototypage rapide : Les ingĂ©nieurs de donnĂ©es et les analystes peuvent rapidement prototyper et tester des requĂȘtes sur de grands ensembles de donnĂ©es sans dĂ©ployer un cluster ClickHouse complet.
  5. Environnements sans serveur : chDB peut ĂȘtre utilisĂ© dans des fonctions serverless pour fournir des capacitĂ©s analytiques sans avoir Ă  gĂ©rer une infrastructure de base de donnĂ©es.
  6. Traitement des pipelines de donnĂ©es : chDB peut ĂȘtre intĂ©grĂ© dans des pipelines de donnĂ©es pour des transformations et agrĂ©gations rapides des donnĂ©es.

Demo - google collab