Kafka Spark ClickHouse Superset Kubernetes Helm ArgoCD

Real-time Data Dashboard

Kafka-to-ClickHouse streaming analytics handling 100K events/second with sub-300ms query response.

Overview

What is this project?

A cloud-native streaming analytics stack built for a FinTech client that needed sub-second visibility into transaction events. Apache Kafka ingests 100K+ events per second across 12 topics. Spark Structured Streaming consumers apply windowed aggregations (1-minute, 5-minute, and 1-hour windows), detect anomalies with a streaming z-score algorithm, and write the results to ClickHouse. ClickHouse materialized views back the Superset dashboards, keeping query response times under 300ms even on tables with 100M+ rows.
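To illustrate the streaming z-score idea, here is a minimal sketch in plain Python. It assumes the running mean and variance are maintained with Welford's online algorithm and that a value is flagged when its absolute z-score exceeds a cutoff. The class name `StreamingZScore`, the `threshold` of 3.0, and the sample values are all hypothetical; the project's actual Spark implementation is not shown here and may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingZScore:
    """Running mean/variance via Welford's online algorithm (illustrative only)."""

    threshold: float = 3.0  # hypothetical cutoff, not taken from the project
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def score(self, value: float) -> float:
        """Return the z-score of value against the values seen so far."""
        if self.count < 2:
            return 0.0  # sample variance is undefined with fewer than two points
        std = math.sqrt(self.m2 / (self.count - 1))
        return 0.0 if std == 0.0 else (value - self.mean) / std

    def update(self, value: float) -> None:
        """Fold value into the running statistics in O(1) time and memory."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def observe(self, value: float) -> bool:
        """Score first, then update, so an outlier does not mask itself."""
        z = self.score(value)
        self.update(value)
        return abs(z) > self.threshold


if __name__ == "__main__":
    detector = StreamingZScore()
    for amount in [10, 11, 9, 10, 11, 9, 10, 10, 50]:
        if detector.observe(amount):
            print(f"anomaly: {amount}")
```

Because the detector keeps only three numbers per series, state like this could in principle be held per key inside a stateful streaming operator; how the real pipeline partitions or windows that state is not specified in this overview.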

The entire stack runs on Kubernetes and is deployed with Helm charts. Kafka and ClickHouse scale horizontally via custom HPA (Horizontal Pod Autoscaler) policies triggered by consumer lag and query throughput, respectively. A GitOps workflow built on ArgoCD handles deployments, making environment promotion from staging to production a one-click operation.
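The scaling decision behind a metric-driven HPA can be sketched as follows. The Kubernetes Horizontal Pod Autoscaler computes the desired replica count as the ceiling of `currentReplicas * currentMetric / targetMetric`, skips scaling when the ratio is within a tolerance (0.1 by default), and clamps the result to the configured minimum and maximum. The function name, the lag figures, and the replica bounds below are hypothetical and are not the project's actual policy values.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical numbers: 4 pods, observed lag metric 30,000 against a target of 10,000.
    print(desired_replicas(4, 30_000, 10_000, max_replicas=20))
```

This sketch omits stabilization windows and scale-up/scale-down rate limits, which a real HPA `behavior` section would also apply.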