Portfolio

Platform • SRE • Cloud-Native

I build systems that stay calm
when traffic gets loud.

I’m Shivam Kumar — a Platform & SRE engineer. I turn chaotic distributed systems into predictable, observable, and cost-aware platforms.

Kubernetes-native platforms • Streaming + event-driven systems • SLOs + incident response • FinOps-minded observability

Shivam Kumar

Platform & SRE @ Flexera

Bengaluru, India

Distributed systems

Built for scale + clarity

50%

Operational toil reduced

AWS + Azure Associate

Certified

Featured Work

Cloud Platforms

Case

Flink on Kubernetes Platform

A self-serve streaming platform with guardrails: sane defaults, paved paths, and operational clarity for developers.

Lag-aware autoscaling + self-healing job orchestration
Multi-tenant isolation and upgrade-safe deployments
Opinionated observability: SLOs, golden signals, runbooks
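As a rough illustration of what "lag-aware autoscaling" can mean, here is a minimal sketch. It is not the platform's actual logic: the function name, the parameters, and the five-minute drain window are all assumptions made for the example. The idea is to size parallelism so the job keeps up with producers and also drains the existing backlog within a target window.

```python
import math


def desired_parallelism(current, lag, consume_rate_per_task, produce_rate,
                        drain_seconds=300, min_p=1, max_p=64):
    """Hypothetical sizing rule: keep up with producers and drain the backlog.

    current               -- parallelism the job runs with now
    lag                   -- records waiting to be consumed
    consume_rate_per_task -- records/second one task can process
    produce_rate          -- records/second arriving from producers
    drain_seconds         -- assumed window for clearing the backlog
    """
    if consume_rate_per_task <= 0:
        # No usable throughput signal, so hold steady instead of guessing.
        return current
    # Throughput needed: incoming traffic plus the share of backlog per second.
    required_rate = produce_rate + lag / drain_seconds
    target = math.ceil(required_rate / consume_rate_per_task)
    # Clamp so one bad reading cannot scale the job to an extreme.
    return max(min_p, min(max_p, target))
```

A real controller would likely add a cooldown between scaling decisions so that short lag spikes do not trigger repeated restarts.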

Kubernetes • Flink • GitOps • SRE

Case study (soon)

Observability

Case

Cost-First Metrics Migration

Re-architected metrics storage and query patterns to reduce spend without sacrificing incident-time fidelity.

Prometheus → VictoriaMetrics migration with rollout playbook
Dropped cardinality hotspots via re-labeling + guidelines
Cut infra costs by ~60% while improving query latency
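Finding "cardinality hotspots" usually starts with counting how many distinct values each label takes per metric. The sketch below shows that counting step only, under stated assumptions: the helper names are hypothetical, and the input is taken to be a list of label dictionaries with the metric name stored under `__name__`, as Prometheus does. It is not the tooling used in the migration.

```python
from collections import defaultdict


def label_cardinality(series):
    """Count distinct values seen for each (metric, label) pair."""
    values = defaultdict(set)
    for labels in series:
        metric = labels.get("__name__", "")
        for key, value in labels.items():
            if key != "__name__":
                values[(metric, key)].add(value)
    return {pair: len(seen) for pair, seen in values.items()}


def hotspots(series, threshold):
    """Return (metric, label) pairs with more than `threshold` distinct values."""
    counts = label_cardinality(series)
    return sorted(pair for pair, n in counts.items() if n > threshold)


# Illustrative data only: a per-user label is a typical hotspot.
sample = [
    {"__name__": "http_requests_total", "method": "GET", "user_id": "u1"},
    {"__name__": "http_requests_total", "method": "GET", "user_id": "u2"},
    {"__name__": "http_requests_total", "method": "GET", "user_id": "u3"},
]
flagged = hotspots(sample, threshold=2)
```

Labels flagged this way are candidates for dropping or aggregating through relabeling rules.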

Prometheus • VictoriaMetrics • Grafana • FinOps

Developer Experience

Case

GitOps Delivery System

A delivery pipeline that makes shipping boring: previews, policy checks, and safe progressive rollouts.

ArgoCD-based sync strategy + standardized app templates
Guardrails: policy-as-code, secrets hygiene, drift detection
Reduced operational toil by ~50% through automation

ArgoCD • Terraform • CI/CD • Security

Toolbox

Platforms

Kubernetes • AWS • Terraform • Kafka

Languages

Go • Python • Bash • HCL

Systems

Distributed Systems • Event-Driven • Performance • eBPF (curious)

Reliability

SLOs • Incident Response • Observability • Runbooks

Experience

I like roles where the job isn’t “keep it up”, it’s “make it resilient.”

Flexera

Member of Technical Staff — SRE

Apr 2024 — Present

I own the reliability of SaaS products, using platform engineering to turn the noise of distributed systems into signals the business can act on.

Cloud-native • Platform • Observability

MoEngage

Site Reliability Engineer (I, II, III)

Dec 2020 — Mar 2024

The Scale Up

Built an in-house Flink on Kubernetes platform: self-healing, lag-aware scaling, and clear on-call ergonomics.

The Cost Killers

Migrated Prometheus → VictoriaMetrics, reducing infra spend by ~60%.

Automation Wins

Built platforms + GitOps pipelines, cutting operational toil by half.

Ingram Micro

Software Engineer — SRE

Jan 2019 — Nov 2020

Foundation years: multi-stage CI/CD on AWS & Azure, lift-and-shift migrations, and making “secure by default” non-negotiable.

Writing

I write to turn tribal knowledge into repeatable playbooks. Expect posts about Kubernetes, Terraform, platform patterns, and reliability thinking.

Let’s build something that survives reality.

If you’re hiring for Platform/SRE, or you want to talk cloud-native systems, observability, or streaming workloads — reach out.

Email LinkedIn GitHub
