# Scenario 03 — Loitering Over Critical Infrastructure

> **Disclaimer:** Synthetic demo data inspired by real Baltic geography, MMSI / OUI
> conventions, and infrastructure. Not real observations. All vessel names, MMSIs,
> MAC addresses, sensor IDs and cable / pipeline alignments are synthetic and have
> been harmonized against the canonical catalogs under `catalogs/`.

## Story

The 189 × 32 m Finnish bulk carrier **MV AALLOTAR** (MMSI `230999401`, callsign
`OJZZ1`, declared AIS type **70 — bulk carrier**) is on a routine Helsinki West
Harbour → Tallinn ballast leg on 2025-03-18 when, mid-transit, she departs the
eastbound Gulf of Finland traffic-separation lane south of Porkkala and
decelerates to ~0.6 kn directly over the catalog **cable-and-pipeline JUNCTION
polygon** (`featureId = cable-pipeline-junction`, centroid ≈ 59.884 °N,
24.570 °E) where the synthetic BalticConnector pipeline alignment overlaps the
Estlink 1 / 2 HVDC cable corridors within ~1.5 NM.

For **4 h 15 min** (11:30 → 15:45 UTC) she executes a 30-minute slow drift
followed by **four overlapping ~600 m loops** centred just off the polygon
centroid — a parametric cloverleaf synthesised in `generate.py` with pure polar
math, no `shapely` dependency.
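
As a rough illustration of what a polar-math cloverleaf can look like, the sketch below traces a four-petal rose and converts the local east/north offsets to latitude/longitude with a flat-earth approximation. It is not taken from `generate.py`: the function name `cloverleaf_track`, the rose equation, the 600 m petal length and the choice of the polygon centroid as the loop centre are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; good enough for sub-km offsets


def cloverleaf_track(lat0, lon0, petal_m=600.0, n_points=240):
    """Return (lat, lon) fixes tracing the rose r = petal_m * |cos(2*theta)|.

    Hypothetical helper: the real generator may use a different curve.
    The four petals all pass through the centre point (lat0, lon0).
    """
    track = []
    for i in range(n_points + 1):
        theta = 2.0 * math.pi * i / n_points
        r = petal_m * abs(math.cos(2.0 * theta))  # radial distance in metres
        east_m = r * math.cos(theta)
        north_m = r * math.sin(theta)
        # Local tangent-plane conversion from metres to degrees.
        dlat = math.degrees(north_m / EARTH_RADIUS_M)
        dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
        track.append((lat0 + dlat, lon0 + dlon))
    return track


# Centre on the catalog centroid quoted above (an assumption for the example).
track = cloverleaf_track(59.884, 24.570)
```

Every fix stays within `petal_m` of the centre, so the whole pattern remains well inside a polygon that is several kilometres across.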

During that loiter window the primary coastal MAC sensor **MAC-PRK-COAST-01**
(Porkkala lighthouse, ~15 km from the polygon) records a sudden cluster of **38
unique MACs** drawn entirely from real industrial-IoT OUI prefixes:

| OUI prefix | Vendor               | Count | Role             |
|------------|----------------------|------:|------------------|
| `24:0A:C4` | Espressif            | 17    | 1 anchor         |
| `F4:5E:AB` | Texas Instruments    | 13    | 1 anchor         |
| `A4:3C:5A` | u-blox               |  8    | 1 anchor         |

The RSSI band tightens to roughly −98 .. −78 dBm (σ ≈ 2.3 dB), consistent with
co-located surface and just-sub-surface devices ~15 km from the sensor. Faint
secondary hits in the −105 .. −87 dBm band appear at MAC-INK-COAST-01 (~32 km
west), MAC-HEL-COAST-01 (~36 km NE) and Helsinki port-cluster sensors that lie
just inside the over-water radio horizon.

After AALLOTAR resumes ~11 kn south toward Tallinn at 15:45 UTC, **35 of the 38
cluster MACs disappear** within minutes — but **three "anchor" MACs (one per
OUI)** continue pinging from the very same coordinates at a weaker
~−93 dBm for the remainder of the 6-day replay window (~56 hours of post-loiter
persistence, well above the spec's 36-hour requirement). This is the deposited
hardware signature.

A coastal radar **RAD-COAST-PRK-01** at Porkkala (range 20 NM) corroborates the
slow surface track end-to-end through the loiter; a **Dornier 228-class plane
radar RAD-PLN-01** flies a background pass that logs AALLOTAR with `track_quality
0.92` while the loiter is forming.

The same afternoon **M/V VENLA RESEARCH** (MMSI `230888011`, callsign `OJWW5`,
declared AIS type **52 — research vessel**) conducts a legitimate seabed survey
in a six-line lawn-mower pattern ~3.7 NM **SE** of the JUNCTION polygon, well
outside every 500 m infrastructure buffer. Her four invented crew MACs come from
real **consumer** OUIs (Apple `A4:83:E7`, Samsung `38:F9:D3`, Xiaomi `04:CF:8C`,
Apple-BLE `B0:7D:64`) and produce a baseline-like manufacturer mix. **The
detector must NOT alert on her.**

A clean **28-day historical baseline** at every coastal MAC sensor contains
**zero** Espressif / Texas Instruments / u-blox observations. The assertion is
explicit in `generate.py:generate_historical()`, and the run fails if any such
observation leaks into the baseline.
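
A minimal sketch of such a leak check is shown below. It is not the code in `generate_historical()`; the helper names and the record shape (a dict with a `macAddress` key, as in the MAC sensor header) are assumptions, and the OUI set is copied from the cluster table above.

```python
# OUI prefixes of the industrial-IoT cluster (from the table in the Story section).
FORBIDDEN_OUIS = {"24:0A:C4", "F4:5E:AB", "A4:3C:5A"}


def oui_of(mac: str) -> str:
    """Return the first three octets of a colon-separated MAC, upper-cased."""
    return mac.upper()[:8]


def find_baseline_leaks(records):
    """Return the baseline records whose MAC carries a forbidden OUI."""
    return [r for r in records if oui_of(r["macAddress"]) in FORBIDDEN_OUIS]


def assert_clean_baseline(records):
    """Raise AssertionError if any industrial-IoT MAC appears in the baseline."""
    leaks = find_baseline_leaks(records)
    if leaks:
        raise AssertionError(
            f"{len(leaks)} industrial-IoT MAC(s) leaked into the baseline, "
            f"e.g. {leaks[0]['macAddress']}"
        )


# Illustrative baseline rows (consumer OUIs only), so the check passes.
assert_clean_baseline([
    {"macAddress": "a4:83:e7:5c:9b:71"},
    {"macAddress": "38:F9:D3:11:22:71"},
])
```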

## Timeline (UTC)

| t_rel | wall clock | actor | event | signals |
|---|---|---|---|---|
| T−03:30 | 2025-03-18T08:00Z | AALLOTAR | Departs Helsinki West Harbour; SOG 11.2 kn, COG 245° | ais |
| T−02:00 | 2025-03-18T09:30Z | AALLOTAR | Enters eastbound GoF TSS, SOG 11.6 kn | ais, plane_radar |
| T−00:45 | 2025-03-18T10:45Z | AALLOTAR | Porkkala approach (within RAD-COAST-PRK-01 range) | ais |
| T−00:28 | 2025-03-18T11:02Z | AALLOTAR | Course deviation S off TSS, COG 198°, SOG 8.1 kn | ais, coastal_radar |
| T−00:12 | 2025-03-18T11:18Z | AALLOTAR | Crosses N edge of JUNCTION polygon, SOG → 2.4 kn | ais, coastal_radar |
| T−00:05 | 2025-03-18T11:25Z | RAD-COAST-PRK-01 | Slow contact track_quality 0.86, SOG 1.1 kn | coastal_radar |
| **T+00:00** | **2025-03-18T11:30Z** | **AALLOTAR** | **Loiter t0 — drift phase over JUNCTION centroid** | **ais** |
| T+00:02 | 2025-03-18T11:32Z | MAC-PRK-COAST-01 | First cluster MAC `24:0a:c4:11:00:01` (Espressif), RSSI ~−86 dBm | mac |
| T+00:15 | 2025-03-18T11:45Z | MAC-PRK-COAST-01 | Unique-MAC count in 15-min window ≈ 14 (z ≈ 3.6 vs baseline) | mac |
| T+00:30 | 2025-03-18T12:00Z | MAC-PRK-COAST-01 | Cluster mix is ~100 % industrial-IoT; JSD vs baseline ≈ 0.91 | mac |
| T+00:45 | 2025-03-18T12:15Z | AALLOTAR | Begins overlapping ~600 m loop pattern (loop 1 / 4), SOG 0.4–1.1 kn | ais, coastal_radar |
| T+01:10 | 2025-03-18T12:40Z | VENLA RESEARCH | Arrives at survey box (DECOY) ~3.7 NM SE of JUNCTION; SOG 3.2 kn | ais |
| T+01:15 | 2025-03-18T12:45Z | MAC-INK-COAST-01 | Faint cluster hits ~−100 dBm (long-range over-water propagation) | mac |
| T+02:00 | 2025-03-18T13:30Z | MAC-PRK-COAST-01 | Peak unique-MAC count ≈ 23 in 15-min window; RSSI band tightens | mac |
| T+02:15 | 2025-03-18T13:45Z | AALLOTAR | Loop 3 / 4; cumulative dwell inside JUNCTION = 147 min | ais, coastal_radar |
| T+03:00 | 2025-03-18T14:30Z | MAC-PRV-COAST-01 | (Optional) rare ducted propagation hit at ~−103 dBm | mac |
| T+04:00 | 2025-03-18T15:30Z | AALLOTAR | Loop 4 completes; SOG climbs to 4.2 kn — exit begins | ais, coastal_radar |
| T+04:15 | 2025-03-18T15:45Z | AALLOTAR | Exits JUNCTION polygon S edge, SOG 8.8 kn, COG 178° | ais |
| T+04:20 | 2025-03-18T15:50Z | MAC-PRK-COAST-01 | 35 / 38 cluster MACs disappear; 3 anchors (one each Espressif / TI / u-blox) remain at ~−93 dBm | mac |
| T+04:45 | 2025-03-18T16:15Z | RAD-PLN-01 | Dornier 228 patrol pass — context only, no alert | plane_radar |
| T+06:00 | 2025-03-18T17:30Z | AALLOTAR | Tallinn approach, SOG 11.0 kn | ais |
| T+06:30 | 2025-03-18T18:00Z | VENLA RESEARCH | Concludes survey, departs NE; consumer-OUI mix unchanged → decoy clear | ais |
| T+10:30 | 2025-03-18T22:00Z | MAC-PRK-COAST-01 | Anchor MACs still pinging every 60–90 s, RSSI ~−93 dBm | mac |
| T+20:30 | 2025-03-19T08:00Z | MAC-PRK-COAST-01 | Anchors persistent, hour 16 | mac |
| T+40:30 | 2025-03-20T04:00Z | MAC-PRK-COAST-01 | Anchors persistent, hour 36 — **deposited hardware confirmed** | mac |
| T+43:00 | 2025-03-20T06:30Z | Fusion engine | Emits incident_score = 0.93 ≥ 0.70; opens `INC-S3-2025-03-18-001` | composite |

Full machine-readable timeline (≥ 25 events) lives in `timeline.json`.

## Signals & weights (canonical)

Composite score uses the five canonical signals defined in
`generators/scoring.py` and `catalogs/ontology.md`:

| Signal                              | Weight |
|-------------------------------------|------:|
| `temporal_dwell_score`              | 0.25 |
| `spatial_proximity_infra_score`     | 0.25 |
| `mac_count_zscore`                  | 0.20 |
| `mac_manufacturer_jsd_score`        | 0.20 |
| `ais_type_behavior_mismatch_score`  | 0.10 |
| **Σ**                               | **1.00** |
| **alert_threshold**                 | **0.70** |

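Computed peak (15:00Z bin):
`0.25·0.98 + 0.25·1.00 + 0.20·1.00 + 0.20·0.91 + 0.10·1.00 ≈ 0.98` → ALERT.
This is slightly above the 0.93 that the timeline quotes for the emitted
`incident_score`; either value clears the 0.70 threshold.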

Computed VENLA peak (decoy):
`0.25·0.62 + 0.25·0.00 + 0.20·0.18 + 0.20·0.18 + 0.10·0.00 ≈ 0.23` → no alert.

See `weights.json` for the canonical JSON.
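
The weighted sum can be checked with a few lines of Python. This is a sketch rather than the code in `generators/scoring.py`: the function name `incident_score` and the dict layout are assumptions, while the weights, threshold and per-signal values are the ones quoted in this section.

```python
WEIGHTS = {
    "temporal_dwell_score": 0.25,
    "spatial_proximity_infra_score": 0.25,
    "mac_count_zscore": 0.20,
    "mac_manufacturer_jsd_score": 0.20,
    "ais_type_behavior_mismatch_score": 0.10,
}
ALERT_THRESHOLD = 0.70


def incident_score(signals):
    """Weighted sum of the five canonical signals (each expected in [0, 1])."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1.0"
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


# Per-signal values as quoted for the two peaks above.
aallotar_peak = dict(zip(WEIGHTS, (0.98, 1.00, 1.00, 0.91, 1.00)))
venla_peak = dict(zip(WEIGHTS, (0.62, 0.00, 0.18, 0.18, 0.00)))

for name, signals in (("AALLOTAR", aallotar_peak), ("VENLA", venla_peak)):
    score = incident_score(signals)
    print(f"{name}: {score:.3f} alert={score >= ALERT_THRESHOLD}")
```

With these inputs the function returns about 0.977 for AALLOTAR and 0.227 for VENLA, so only the former crosses the threshold.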

## KQL sketches

All five use the named confidence signals and the catalog `cable-pipeline-junction`
polygon. Tables assumed: `mac_observations`, `ais_positions`, `infra_polygons`,
`baseline_mac_prk_coast_01`, plus the pre-joined view `incident_features` used in
sketch 5.

### 1) `temporal_dwell_score` — vessel dwell inside JUNCTION

```kusto
let polygon = toscalar(infra_polygons | where id == "cable-pipeline-junction" | project geom);
ais_positions
| where mmsi == 230999401 and ts_utc between (datetime(2025-03-18) .. datetime(2025-03-19))
| extend inside = geo_point_in_polygon(lon, lat, polygon)
| where inside
| summarize dwell_min = (max(ts_utc) - min(ts_utc)) / 1m by mmsi
| extend temporal_dwell_score = 1.0 / (1.0 + exp(-(dwell_min - 30.0) / 30.0))
```
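
The same logistic mapping can be reproduced offline. The sketch below mirrors the `max - min` dwell and the 30-minute midpoint and scale from the query; the function names are assumptions. Note that a first-to-last span overstates dwell if a vessel leaves the polygon and re-enters.

```python
import math
from datetime import datetime, timezone


def dwell_minutes(inside_fix_times):
    """Minutes between the first and last in-polygon fix (0 if there are none)."""
    if not inside_fix_times:
        return 0.0
    return (max(inside_fix_times) - min(inside_fix_times)).total_seconds() / 60.0


def temporal_dwell_score(dwell_min, midpoint_min=30.0, scale_min=30.0):
    """Logistic squashing of dwell time; 0.5 at the midpoint, approaching 1.0 for long dwell."""
    return 1.0 / (1.0 + math.exp(-(dwell_min - midpoint_min) / scale_min))


# Timeline values: polygon entry at 11:18Z, loop 3 / 4 checkpoint at 13:45Z.
t_entry = datetime(2025, 3, 18, 11, 18, tzinfo=timezone.utc)
t_check = datetime(2025, 3, 18, 13, 45, tzinfo=timezone.utc)
dwell = dwell_minutes([t_entry, t_check])
score = temporal_dwell_score(dwell)
print(f"dwell={dwell:.0f} min score={score:.2f}")
```

For the 147-minute dwell in the timeline this gives a score of about 0.98, the value used in the computed peak.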

### 2) `spatial_proximity_infra_score` — distance to JUNCTION polygon

```kusto
let threshold_m = 1000.0;
ais_positions
| where mmsi == 230999401
| extend d_m = geo_distance_point_to_polygon(
    lon, lat,
    toscalar(infra_polygons | where id == "cable-pipeline-junction" | project geom))
| extend spatial_proximity_infra_score = max_of(0.0, min_of(1.0, 1.0 - d_m / threshold_m))
| summarize avg_score = avg(spatial_proximity_infra_score) by bin(ts_utc, 15m)
```

### 3) `mac_count_zscore` — anomalous unique-MAC count

```kusto
let mu = 6.0; let sigma = 2.2;
mac_observations
| where deviceId == "MAC-PRK-COAST-01"
| summarize unique_macs = dcount(macAddress) by bin(todatetime(processingTimestamp), 15m)
| extend z = (todouble(unique_macs) - mu) / sigma
| extend mac_count_zscore = 1.0 / (1.0 + exp(-(z - 3.0) / 2.0))
| where mac_count_zscore > 0.5
```
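
For readers without a Kusto cluster, the binning and scoring can be approximated in Python. This is a sketch under stated assumptions: observations arrive as `(timestamp, mac)` tuples, the bin width divides 60 minutes evenly, and `mu` / `sigma` are the constants from the query above. The MAC addresses in the example are made up.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def unique_macs_per_bin(observations, bin_minutes=15):
    """Count distinct MACs per time bin. `bin_minutes` must divide 60."""
    bins = defaultdict(set)
    for ts, mac in observations:
        floored = ts.replace(minute=ts.minute - ts.minute % bin_minutes,
                             second=0, microsecond=0)
        bins[floored].add(mac.lower())  # MAC case differs between streams
    return {start: len(macs) for start, macs in bins.items()}


def mac_count_zscore(unique_macs, mu=6.0, sigma=2.2):
    """Return (z, score): raw z-score and its logistic squashing centred on z = 3."""
    z = (unique_macs - mu) / sigma
    return z, 1.0 / (1.0 + math.exp(-(z - 3.0) / 2.0))


# Illustrative observations: two distinct MACs in the 11:30 bin, one in 11:45.
obs = [
    (datetime(2025, 3, 18, 11, 32, tzinfo=timezone.utc), "24:0a:c4:00:00:01"),
    (datetime(2025, 3, 18, 11, 40, tzinfo=timezone.utc), "24:0A:C4:00:00:01"),
    (datetime(2025, 3, 18, 11, 41, tzinfo=timezone.utc), "f4:5e:ab:00:00:02"),
    (datetime(2025, 3, 18, 11, 46, tzinfo=timezone.utc), "a4:3c:5a:00:00:03"),
]
counts = unique_macs_per_bin(obs)
z, score = mac_count_zscore(14)  # the ~14 unique MACs reported at T+00:15
```

With 14 unique MACs the raw z-score is (14 − 6) / 2.2 ≈ 3.6, matching the timeline, and the squashed score lands just above the 0.5 cut-off used in the query.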

### 4) `mac_manufacturer_jsd_score` — manufacturer mix divergence

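```kusto
// Baseline manufacturer shares for MAC-PRK-COAST-01 (sum to 1.0; "IoT" is absent).
let baseline = bag_pack(
    "Apple",0.38,"Samsung",0.22,"Xiaomi",0.09,"Huawei",0.07,
    "Intel",0.06,"other",0.18,"IoT",0.00);
mac_observations
| where deviceId == "MAC-PRK-COAST-01"
| extend mfr = case(
    deviceManufacturer in ("Espressif","Texas Instruments","u-blox"), "IoT",
    // Empty values and the literal string "None" both count as "other".
    isempty(deviceManufacturer) or deviceManufacturer == "None", "other",
    deviceManufacturer)
// Name the bin column once so the second summarize can group by it.
| summarize cnt = count() by bin_ts = bin(todatetime(processingTimestamp), 15m), mfr
| summarize mix = make_bag(bag_pack(mfr, todouble(cnt))) by bin_ts
// series_jensen_shannon_divergence is assumed to be a user-defined helper (not a
// Kusto built-in) that normalises `mix` to shares and returns the JSD vs `baseline`.
| extend mac_manufacturer_jsd_score = series_jensen_shannon_divergence(mix, baseline)
```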
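
Because the divergence itself is delegated to a helper in the query, a small reference implementation may be useful. The sketch below computes the base-2 Jensen-Shannon divergence, which is bounded in [0, 1]; whether the scenario's 0.91 figure uses this exact variant is an assumption, and the window counts are illustrative rather than taken from the generated data.

```python
import math

# Baseline shares as listed in the query above.
BASELINE = {"Apple": 0.38, "Samsung": 0.22, "Xiaomi": 0.09, "Huawei": 0.07,
            "Intel": 0.06, "other": 0.18, "IoT": 0.00}


def normalise(counts):
    """Convert raw per-manufacturer counts into shares that sum to 1."""
    total = float(sum(counts.values()))
    return {k: v / total for k, v in counts.items()}


def jsd(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0.0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical 15-minute window dominated by industrial-IoT devices.
window_counts = {"IoT": 14, "Apple": 1}
score = jsd(normalise(window_counts), BASELINE)
```

Identical mixes score 0 and fully disjoint mixes score 1, so a window with no overlap with the consumer-only baseline sits at the top of the range.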

### 5) Composite incident score + `ais_type_behavior_mismatch_score`

```kusto
let w_dwell = 0.25; let w_prox = 0.25; let w_zcount = 0.20;
let w_jsd = 0.20; let w_aismis = 0.10;
let aismis = ais_positions
    | where mmsi == 230999401
        and ts_utc between (datetime(2025-03-18T11:00Z) .. datetime(2025-03-18T16:00Z))
    | summarize min_sog = min(sog_kn), declared_type = take_any(ais_type)
    | extend ais_type_behavior_mismatch_score =
        iff(declared_type == 70 and min_sog < 1.0, 1.0, 0.0);
incident_features  // pre-joined view of the four other signals per 15-min bin
| extend ais_type_behavior_mismatch_score =
    toscalar(aismis | project ais_type_behavior_mismatch_score)
| extend incident_score =
      w_dwell  * temporal_dwell_score
    + w_prox   * spatial_proximity_infra_score
    + w_zcount * mac_count_zscore
    + w_jsd    * mac_manufacturer_jsd_score
    + w_aismis * ais_type_behavior_mismatch_score
| where incident_score >= 0.70
| project bin_ts, incident_score,
          temporal_dwell_score, spatial_proximity_infra_score,
          mac_count_zscore, mac_manufacturer_jsd_score,
          ais_type_behavior_mismatch_score
```

## MAC fusion narrative

Fusion edges (per `catalogs/ontology.md`):

1. `vessel_position(AALLOTAR, ts)` from `ais.ndjson` and `coastal_radar.ndjson`
   are joined by spatial-temporal proximity (`associated_mmsi = 230999401` is
   populated on the coastal-radar fixes for convenience).
2. `radar_detects(RAD-COAST-PRK-01, AALLOTAR, ts)` confirms AALLOTAR's slow track
   independently of AIS throughout the loiter, so the alert does not rest on AIS alone.
3. `observed_at(mac, MAC-PRK-COAST-01, ts)` for every cluster MAC during the
   loiter window, plus faint mirror observations at MAC-INK-COAST-01 and
   MAC-HEL-COAST-01.
4. `near_to(AALLOTAR, MAC-PRK-COAST-01, ts)` and
   `near_to(AALLOTAR, cable-pipeline-junction, ts)` — the latter is the
   `spatial_proximity_infra_score` driver.
5. `part_of(INC-S3-2025-03-18-001, observation_set)` — the fusion engine binds
   all five signals into a single incident and persists the three anchor MACs
   as long-tail evidence.

The **anchor MACs are the key**: a transient cluster could be explained as a
crowded passing vessel or a one-off propagation event, but **three previously
unseen industrial-IoT MACs continuing to ping from a fixed coordinate over the
JUNCTION for 36+ hours after the carrier has left** is hard to explain with
anything other than deposited hardware. The historical baseline contains zero
appearances of these OUIs, so the JSD-based manufacturer signal stays at
0.91 throughout the persistence window.
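
The persistence test described above can be sketched as a simple first-seen / last-seen comparison. The function name, the `(timestamp, mac)` tuple shape and the MAC addresses are assumptions for this example; the 36-hour requirement and the 15:45Z departure come from the scenario.

```python
from datetime import datetime, timedelta, timezone


def persistent_anchors(observations, departure, min_persist=timedelta(hours=36)):
    """Return MACs first seen by `departure` and still seen >= min_persist after it."""
    first_seen, last_seen = {}, {}
    for ts, mac in observations:
        mac = mac.lower()
        first_seen[mac] = min(ts, first_seen.get(mac, ts))
        last_seen[mac] = max(ts, last_seen.get(mac, ts))
    return sorted(
        mac for mac in last_seen
        if first_seen[mac] <= departure and last_seen[mac] - departure >= min_persist
    )


def utc(day, hour, minute):
    return datetime(2025, 3, day, hour, minute, tzinfo=timezone.utc)


departure = utc(18, 15, 45)  # AALLOTAR exits the JUNCTION polygon
obs = [
    (utc(18, 11, 32), "24:0a:c4:00:00:aa"),  # made-up anchor, appears during loiter
    (utc(20, 4, 0),   "24:0a:c4:00:00:aa"),  # ...and is still pinging 36 h 15 min later
    (utc(18, 11, 40), "f4:5e:ab:00:00:bb"),  # made-up transient cluster MAC
    (utc(18, 15, 48), "f4:5e:ab:00:00:bb"),  # ...gone minutes after departure
]
anchors = persistent_anchors(obs, departure)
```

Only the first MAC qualifies here: the transient device disappears three minutes after departure and never reaches the 36-hour mark.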

## Decoy — VENLA RESEARCH (false-positive suppression)

| Property | Value |
|---|---|
| MMSI | 230888011 |
| Declared AIS type | 52 (research vessel) |
| Behaviour | 6-line lawn-mower seabed survey, line spacing ~75 m, SOG 3.0–3.5 kn |
| Survey box centroid | 59.825 °N, 24.658 °E (≈ 3.7 NM SE of JUNCTION) |
| Crew MACs (invented, scenario-local) | `A4:83:E7:5C:9B:71` Apple · `38:F9:D3:11:22:71` Samsung · `04:CF:8C:55:66:75` Xiaomi · `B0:7D:64:A1:5B:77` Apple-BLE |
| Distance to nearest 500 m infra buffer | > 4 km → `spatial_proximity_infra_score = 0` |
| Manufacturer JSD vs baseline | ~0.18 (consumer mix, baseline-like) |
| `ais_type_behavior_mismatch_score` | 0 (research vessel doing research) |
| Composite peak | ≈ 0.23 → suppressed |

She tests three different parts of the suppressor at once: the spatial gate
(far from infra), the JSD gate (consumer-only MACs), and the AIS-type gate (52
is allowed to loiter). A naïve "vessel is slow near sensor" rule would alert on
her; the composite must not.

## Ingestion notes

- **NDJSON streams** (`ais.ndjson`, `plane_radar.ndjson`, `coastal_radar.ndjson`,
  `mac.ndjson`): first line is the `__meta__` disclaimer JSON object with
  `dataset = "s3-loitering-critical-infra/<stream>"`. All subsequent lines are
  one record per line.
- **MAC sensor CSV** (`mac.csv`): line 1 is a `#`-prefixed comment containing the
  disclaimer JSON; line 2 is the **verbatim** real-device header (12 fields:
  `sessionStart,messageCount,onlineDurationSeconds,sessionEnd,processingTimestamp,deviceId,version,macAddress,averageSignalStrength,deviceManufacturer,ingestion_ts,status`).
  Optional fields (`sessionStart`, `sessionEnd`, `onlineDurationSeconds`,
  `deviceManufacturer`) may be the literal string `None` to mirror the real
  device behaviour.
- **GeoJSON assets**: top-level `FeatureCollection` carries a sibling `"_meta"`
  key with the disclaimer object; downstream tools that drop non-standard
  top-level keys can instead read the disclaimer from `properties.__meta__` on
  the first feature.
- **Kusto ingestion**: recommend attaching the disclaimer as table-level extent
  tag `synthetic=true` and propagating a `__meta__ = "synthetic"` row property.
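
A reader for the two text formats might look like the sketch below. It uses only the standard library and follows the layout described above. The function names are assumptions, and the sample payloads are illustrative; in particular, the exact shape of the disclaimer object is not specified here, so the parsed first line is returned as-is.

```python
import csv
import io
import json

# Optional CSV fields that may hold the literal string "None".
OPTIONAL_FIELDS = ("sessionStart", "sessionEnd", "onlineDurationSeconds",
                   "deviceManufacturer")


def read_ndjson(text):
    """Split an NDJSON stream into (first-line disclaimer object, list of records)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return json.loads(lines[0]), [json.loads(ln) for ln in lines[1:]]


def read_mac_csv(text):
    """Parse the MAC CSV: '#'-prefixed disclaimer on line 1, header on line 2."""
    lines = text.splitlines()
    meta = json.loads(lines[0].lstrip("#").strip())
    rows = []
    for row in csv.DictReader(io.StringIO("\n".join(lines[1:]))):
        for field in OPTIONAL_FIELDS:
            if row.get(field) == "None":  # mirror of the real-device behaviour
                row[field] = None
        rows.append(row)
    return meta, rows


# Illustrative samples (abbreviated CSV header, made-up disclaimer shape).
ndjson_sample = (
    '{"__meta__": {"dataset": "s3-loitering-critical-infra/mac"}}\n'
    '{"deviceId": "MAC-PRK-COAST-01", "macAddress": "24:0a:c4:11:00:01"}\n'
)
csv_sample = (
    '# {"dataset": "s3-loitering-critical-infra/mac"}\n'
    "sessionStart,deviceId,macAddress,deviceManufacturer\n"
    "None,MAC-PRK-COAST-01,24:0a:c4:11:00:01,Espressif\n"
)
nd_meta, nd_records = read_ndjson(ndjson_sample)
csv_meta, csv_rows = read_mac_csv(csv_sample)
```

Splitting off the comment line before handing the rest to `csv.DictReader` avoids the commas inside the disclaimer JSON being treated as field separators.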

### Output layout

```
scenarios/03-loitering-critical-infra/
├── README.md                       <- this file
├── weights.json                    <- canonical Σ=1.0 weights, alert_threshold=0.70
├── timeline.json                   <- ≥25 timeline events (machine-readable)
├── generate.py                     <- one-shot generator
└── data/
    ├── realtime/
    │   ├── ais.ndjson              <- AALLOTAR + VENLA + ~120 ambient ships
    │   ├── ais_snapshot.geojson    <- last-seen per MMSI
    │   ├── plane_radar.ndjson      <- RAD-PLN-01 Dornier pass
    │   ├── coastal_radar.ndjson    <- RAD-COAST-PRK-01 corroboration
    │   ├── mac.ndjson              <- ndjson form of MAC sensor sessions
    │   └── mac.csv                 <- CSV form, real-device header verbatim
    ├── historical/                 <- 28-day baseline, ZERO industrial-IoT OUIs
    │   ├── ais_baseline.ndjson     <- normal AAL transits, every 4 days
    │   ├── mac_baseline.ndjson
    │   └── mac_baseline.csv
    └── static/
        ├── area_of_interest.geojson
        ├── sensors_used.geojson
        ├── infrastructure_used.geojson  <- includes cable-pipeline-junction
        ├── decoy_survey_box.geojson
        └── loiter_track.geojson    <- synthesised loop pattern
```

## How to run

```powershell
# from repo root
python scenarios/03-loitering-critical-infra/generate.py
```

A machine-readable run summary is also written to
`scenarios/03-loitering-critical-infra/data/_generation_summary.json`.

## Deviations from the harmonized spec (v1.0)

The spec hand-drew a separate JUNCTION polygon at ~59.94 °N / 24.02 °E with
MAC-INK-COAST-01 as the primary sensor at ~1.6 km. The repository's canonical
`catalogs/infrastructure.geojson` places the `cable-pipeline-junction` polygon
~31 km east of that position, at **24.498..24.642 °E / 59.847..59.921 °N**
(centroid ≈ 59.884 °N, 24.570 °E). Per project instructions, the **catalog is
authoritative**, so:

- The loiter centroid is the **catalog** polygon centroid.
- **MAC-PRK-COAST-01** (~15 km) becomes the primary cluster sensor; MAC-INK-COAST-01
  (~32 km) and MAC-HEL-COAST-01 (~36 km) provide faint secondary corroboration.
- MAC-PRV-COAST-01 lies > 60 km from the centroid (beyond the over-water radio
  horizon of low-mounted antennas) so the "tertiary propagation anomaly" hit
  from the spec is not generated. The narrative still references it for context.
- RSSI bands at the primary sensor are ~−98 .. −78 dBm (slightly weaker than the
  spec's −82 .. −74 dBm because the sensor is 15 km away rather than the spec's
  ~1.6 km), still tight enough for the JSD and z-score signals to fire.
- Anchor persistence is implemented as four daily slices (`ANCHOR_SLICE_A..D`,
  totalling ~56 h) instead of one continuous 36-h window, per instructions to
  keep file sizes manageable.

## Disclaimer

All data in this scenario — vessel names, MMSIs, IMOs, callsigns, MAC addresses,
sensor IDs, cable / pipeline alignments, AIS positions, radar fixes, the
JUNCTION polygon and the entire incident — is **synthetic**. The geography,
MMSI / OUI conventions and infrastructure types are inspired by real Baltic
operations to make the demo legible, but nothing in this directory represents a
real observation, a real vessel movement, or a real incident.
