BalticFusion · r-mac-data-scenarios
Role 1 / 5 · time horizon: continuous

Data Integrator

"Plug in the next sensor. Land the data. Curate the schema."

Persona

Insinööri (Engineer) Juho Halme

Data Platform Engineer · Helsinki · 11 years in data engineering, last 4 on Fabric

Juho is the one nobody talks to when things work — and the one everybody calls when a coastal MAC sensor disappears off the dashboard at 02:00. He owns the bronze → silver Lakehouse layers, the Data Factory pipelines that pull new sensor feeds, and the OneLake shortcuts that expose partner data into the same workspace.

A new coastal MAC sensor in Inkoo? That's a half-day onboarding: provision an Eventstream, validate CSV headers against the canonical schema, register the sensor in the catalog, route the stream through the dedup/parse processor, and land into the silver Delta table. He measures success in schema-drift incidents per quarter (target: zero).
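The "validate CSV headers against the canonical schema" step of onboarding can be sketched in plain Python. This is a minimal sketch, not the production check. The canonical column list `CANONICAL_MAC_HEADER`, the sensor ID `INK-01` and the sample payload are all hypothetical. The sketch only compares the first CSV row against that list and reports missing, unexpected and reordered columns.

```python
import csv
import io

# Hypothetical canonical header for the coastal MAC sensor family; the real
# canonical schema lives in the catalog and is not reproduced here.
CANONICAL_MAC_HEADER = ["sensor_id", "timestamp_utc", "lat", "lon", "value"]


def header_drift(csv_text: str, canonical: list[str]) -> dict[str, list[str]]:
    """Compare the first row of a CSV payload with the canonical header.

    Returns the columns that are missing, unexpected, or present but out of
    canonical order. All three lists empty means no header drift.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [col.strip() for col in next(reader, [])]
    missing = [c for c in canonical if c not in header]
    unexpected = [c for c in header if c not in canonical]
    # Order is only compared over the columns both sides share.
    shared_actual = [c for c in header if c in canonical]
    shared_expected = [c for c in canonical if c in header]
    reordered = [a for a, e in zip(shared_actual, shared_expected) if a != e]
    return {"missing": missing, "unexpected": unexpected, "reordered": reordered}


# Hypothetical payload: lat/lon swapped and an extra battery column.
sample = (
    "sensor_id,timestamp_utc,lon,lat,value,battery_v\n"
    "INK-01,2024-05-01T00:00:00Z,24.0,60.0,1.2,3.7\n"
)
print(header_drift(sample, CANONICAL_MAC_HEADER))
# {'missing': [], 'unexpected': ['battery_v'], 'reordered': ['lon', 'lat']}
```

Reordered columns are reported separately because a swap like lat/lon passes a simple "same set of names" check. Such a swap only breaks downstream parsing that is positional rather than by header name.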

⚠ synthetic persona

Daily workflow

Key data products

Sensor onboarding pipeline
Eventstream → bronze Delta → silver Delta with parse/dedup, per sensor family
Source scenario(s): all (infra) · Fabric tool: Fabric Data Factory + Eventstream · Refresh cadence: continuous

Schema validation report
CSV header drift checks, type-cast failures, null-rate anomalies per source
Source scenario(s): all sources · Fabric tool: Notebook + Lakehouse validation table · Refresh cadence: daily

OneLake shortcut registry
Index of partner-hosted data exposed read-only into this workspace via shortcuts
Source scenario(s): partner feeds · Fabric tool: OneLake shortcuts + Lakehouse catalog · Refresh cadence: on change

Eventstream routing config
Topic → derived stream → destination wiring per sensor
Source scenario(s): S1–S6 realtime · Fabric tool: Eventstream · Refresh cadence: on change

Sensor & infra catalog
Canonical sensors.geojson + infrastructure.geojson as source of truth
Source scenario(s): all · Fabric tool: Git + Lakehouse external table · Refresh cadence: on change

Pipeline SLO dashboard
Run success %, latency p95, schema-drift count per quarter
Source scenario(s): all pipelines · Fabric tool: Power BI · Refresh cadence: hourly
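The type-cast and null-rate checks behind the schema validation report could look roughly like the following plain-Python sketch. It works over rows already parsed into dicts, for example by `csv.DictReader`. Several parts are assumptions for illustration: the column-to-type map `CASTS`, the fixed 5% null-rate threshold and the sample rows. A real validation notebook would more likely flag null rates that deviate from a per-source baseline than use one constant.

```python
from dataclasses import dataclass, field

# Hypothetical column -> cast map; in practice this would come from the
# canonical schema in the catalog rather than being hard-coded.
CASTS = {"sensor_id": str, "lat": float, "lon": float, "value": float}


@dataclass
class ColumnStats:
    rows: int = 0
    nulls: int = 0
    cast_failures: int = 0
    examples: list = field(default_factory=list)  # a few offending raw values

    @property
    def null_rate(self) -> float:
        return self.nulls / self.rows if self.rows else 0.0


def validate_rows(rows, casts):
    """Count nulls (missing or blank) and failed casts per column."""
    stats = {col: ColumnStats() for col in casts}
    for row in rows:
        for col, cast in casts.items():
            s = stats[col]
            s.rows += 1
            raw = row.get(col)
            if raw is None or raw.strip() == "":
                s.nulls += 1
                continue
            try:
                cast(raw)
            except ValueError:
                s.cast_failures += 1
                if len(s.examples) < 3:
                    s.examples.append(raw)
    return stats


def anomalies(stats, max_null_rate=0.05):
    """Columns over the (assumed) null-rate threshold or with any cast failure."""
    return sorted(
        col for col, s in stats.items()
        if s.null_rate > max_null_rate or s.cast_failures > 0
    )


# Hypothetical sample rows from one source.
rows = [
    {"sensor_id": "INK-01", "lat": "60.0", "lon": "24.0", "value": "1.2"},
    {"sensor_id": "INK-01", "lat": "", "lon": "24.0", "value": "n/a"},
    {"sensor_id": "INK-01", "lat": "60.0", "lon": "24.0", "value": "1.3"},
]
stats = validate_rows(rows, CASTS)
print(anomalies(stats))  # ['lat', 'value']
```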

Linked scenarios

S6 · Multi-Stage Combo · The schema stress-test
What this role sees: every sensor family contributes, so if any one family's schema drifts, the whole flagship visualisation breaks. S6 is the canary.

S5 · Drone Launch From Ship · Onboarding the airborne MAC sensor
What this role sees: the only scenario that exercises the drone-borne MAC sensor pipeline, which is why it serves as the integration test fixture.

S3 · Loitering Over Critical Infrastructure · Infra-polygon catalog dependency
What this role sees: the loitering detector relies on the infrastructure.geojson polygons being current, and Juho is the one who updates them when EnergiNet publishes a new alignment.
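S3 depends on infrastructure.geojson being valid as well as current, so a sanity check on the polygons pairs naturally with each catalog update. The sketch below makes two assumptions: the file is a GeoJSON FeatureCollection of Polygon or MultiPolygon features, and each feature carries an optional `name` property. It checks the RFC 7946 linear-ring rules (at least four positions, first position equal to last) and longitude/latitude ranges. It deliberately does not enforce ring winding order, because RFC 7946 says parsers should not reject polygons over it. The feature names in the usage example are hypothetical.

```python
import json


def ring_problems(ring, where):
    """Check one linear ring against RFC 7946 basics plus coordinate ranges."""
    problems = []
    if len(ring) < 4:
        problems.append(f"{where}: ring has {len(ring)} positions, needs >= 4")
    elif ring[0] != ring[-1]:
        problems.append(f"{where}: ring is not closed (first != last position)")
    for lon, lat, *_ in ring:  # positions are [lon, lat] or [lon, lat, alt]
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"{where}: position ({lon}, {lat}) out of range")
    return problems


def check_infrastructure(geojson_text):
    """Return human-readable problems; an empty list means the file passed."""
    doc = json.loads(geojson_text)
    problems = []
    for i, feature in enumerate(doc.get("features", [])):
        geom = feature.get("geometry") or {}
        name = (feature.get("properties") or {}).get("name", f"feature[{i}]")
        if geom.get("type") == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom.get("type") == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            problems.append(f"{name}: unexpected geometry type {geom.get('type')!r}")
            continue
        for p, polygon in enumerate(polygons):
            for r, ring in enumerate(polygon):
                problems.extend(ring_problems(ring, f"{name} polygon {p} ring {r}"))
    return problems


# Hypothetical two-feature catalog: the second polygon's ring is not closed.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "cable-A"},
         "geometry": {"type": "Polygon", "coordinates": [
             [[24.0, 60.0], [24.1, 60.0], [24.1, 60.1], [24.0, 60.0]]]}},
        {"type": "Feature", "properties": {"name": "pipe-B"},
         "geometry": {"type": "Polygon", "coordinates": [
             [[24.0, 60.0], [24.1, 60.0], [24.1, 60.1], [24.0, 60.1]]]}},
    ],
})
print(check_infrastructure(sample))
```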

Fabric tools used

Fabric Data Factory · Eventstream · Lakehouse (bronze / silver / gold) · OneLake shortcuts · Notebooks (validation) · Power BI (SLO dashboard) · Git (catalog source of truth)

Example Data Agent prompts

Dashboard mockup

Data Integrator · Pipeline health · last 24 h
Pipeline success (24 h): 99.96% · target ≥ 99.5%
Latency p95 (bronze → silver): 12.4 s · target ≤ 30 s
Schema drifts (this quarter): 1 · target 0 · partner-side
Active sensors: 14 / 14 · all green · last heartbeat < 30 s

Pipeline lineage (sensor → bronze → silver → gold → consumer)
Sensors: AIS (Digitraffic), MAC (coastal ×6), PLN-RAD (radar), DRN-RAD (drone)
Bronze: raw Delta
Silver: parsed Delta
Gold: incident_bundle, rt_dashboard_feed, forensic_evidence, commander_kpi
Consumers: Intel notebooks, Real-Time Dash, PBI evidence, Commander PBI
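The headline tiles on the mockup reduce to a few aggregates over recent pipeline runs and sensor heartbeats. The sketch below uses the mockup's targets: success ≥ 99.5%, p95 latency ≤ 30 s and heartbeat < 30 s. Everything else is assumed, including the `Run` record shape, the sensor IDs, computing p95 over successful runs only, and the nearest-rank percentile definition. Percentile functions in Power BI or KQL may interpolate or approximate and so give slightly different numbers.

```python
from dataclasses import dataclass


@dataclass
class Run:
    succeeded: bool
    latency_s: float  # assumed: bronze -> silver latency of this run, in seconds


def p95(values):
    """Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest value."""
    if not values:
        raise ValueError("p95 of an empty list")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(95n / 100), 1-based
    return ordered[rank - 1]


def slo_summary(runs, heartbeat_age_s, success_target=0.995,
                p95_target_s=30.0, heartbeat_max_s=30.0):
    """Aggregate runs and heartbeats into the mockup's headline tiles."""
    if not runs:
        raise ValueError("no runs in window")
    success = sum(r.succeeded for r in runs) / len(runs)
    latency = p95([r.latency_s for r in runs if r.succeeded])
    active = sum(age < heartbeat_max_s for age in heartbeat_age_s.values())
    return {
        "success_pct": round(100 * success, 2),
        "success_ok": success >= success_target,
        "latency_p95_s": latency,
        "latency_ok": latency <= p95_target_s,
        "active_sensors": f"{active} / {len(heartbeat_age_s)}",
    }


# Hypothetical window: 20 successful runs (1..20 s) plus one failure,
# and two sensors, one of which has gone quiet.
runs = [Run(True, float(s)) for s in range(1, 21)] + [Run(False, 0.0)]
heartbeats = {"INK-01": 4.0, "INK-02": 45.0}
print(slo_summary(runs, heartbeats))
```

Integer arithmetic for the rank avoids floating-point edge cases such as `0.95 * n` landing just above a whole number.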