# Scenario 04 — Spoofed AIS Identity

> **Disclaimer:** Synthetic demo data inspired by real Baltic geography, MMSI / OUI
> conventions, and infrastructure. Not real observations. All vessel names, MMSIs,
> MAC addresses, sensor IDs and coordinates are synthetic and have been harmonized
> against the canonical catalogs under `catalogs/`.

## Story

The 12 × 4 m Finnish fishing vessel **F/V TÄHTI** (MMSI `230199540`, IMO `0`,
callsign `OF7654`, AIS type `30` "fishing") is on her routine ~5 h coastal loop
out of Hanko on 2025-05-14, broadcasting Class-B AIS at 5–7 kn and exhibiting
her stable, three-month-old MAC fingerprint at the Hanko-area sensors
(`MAC-HKO-PORT-01..03`, `MAC-HKO-COAST-01`).

At **05:55 UTC**, an unrelated and unnamed platform — referred to here as
**spoof platform "X"** — begins transmitting AIS with **the same MMSI
`230199540`** roughly **78 NM east** of the real TÄHTI, in the central Gulf of
Finland near `59.780 °N 24.100 °E`. The spoof AIS broadcasts a slow `~6 kn`
fishing-vessel transit toward Helsinki and keeps claiming TÄHTI's dimensions
(`12 m × 4 m`, AIS type `30`).

The fact that **two AIS positions exist simultaneously for one MMSI, separated
by ~78 NM, is by itself the single highest-confidence spoof signal**
(`duplicate_mmsi_score = 1.0`) — but a naive duplicate-MMSI detector could be
fooled by replayed messages, a roaming receiver, or a stuck cache. So the
scenario layers two additional independent corroborations:

1. **AIS-claimed vs radar-truth divergence.** The maritime patrol radar
   **RAD-PLN-01** (Dornier 228-class, 4 s sweep, 80 km swath along its
   `59.70 / 23.80 → 60.05 / 25.20` track) holds the spoof hull continuously as
   track `T-7741`. The radar truth starts at the same origin as the AIS-claimed
   track but the hull is moving at **16.6 – 17.1 kn on a straight 064 – 071°
   heading**, with length-from-track-spread estimates of **28 – 34 m**. The two
   tracks fan out visibly: at `T+01:30` the radar position is ~5 NM ahead of
   the AIS-claimed position along the same bearing (`ais_radar_delta_score`
   reaches ~0.50 there, then climbs above 0.85 as the discrepancy compounds),
   and the radar-derived length and speed are 2.6× and 2.8× the claimed values
   (`dimension_speed_plausibility_score = 1.0`).

2. **Port-MAC fingerprint divergence.** When platform "X" passes within range
   of the dense Helsinki port-area MAC sensor **MAC-HEL-PORT-04** (Katajanokka)
   between `07:18 UTC` and `08:00 UTC`, it leaves a footprint of **14 unknown
   MACs** drawn from real Huawei (`00:E0:FC`) and ZTE (`34:DE:1A`) OUIs plus a
   handful of locally-administered prefixes. The Jaccard overlap with the
   TÄHTI baseline MAC set is **0.00**, and the manufacturer histogram cosine
   similarity is **0.06** (TÄHTI baseline = Apple-dominated, ~46 % Apple, 29 %
   Samsung, 20 % u-blox; spoof window = Huawei + ZTE + Unknown, ~94 %).
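
The physical-plausibility check in item 1 reduces to two ratios. A minimal Python sketch, assuming the score is the larger of the length and speed ratios minus one, clamped to `[0, 1]`. The function name and the lower clamp are illustrative assumptions, not taken from `generators/scoring.py`:

```python
def plausibility_score(radar_len_m, radar_sog_kn, claimed_len_m, claimed_sog_kn):
    """Return 0.0 when radar agrees with the AIS claim, 1.0 at >= 2x the claim."""
    len_ratio = radar_len_m / claimed_len_m
    sog_ratio = radar_sog_kn / claimed_sog_kn
    excess = max(len_ratio, sog_ratio) - 1.0
    # Clamp: a hull smaller or slower than claimed is not scored as negative.
    return min(max(excess, 0.0), 1.0)


# Values from the scenario: radar 31 m / 16.8 kn against a claimed 12 m / 6 kn.
spoof = plausibility_score(31.0, 16.8, 12.0, 6.0)
# A vessel whose radar picture matches its claim (14 m / 6 kn).
consistent = plausibility_score(14.0, 6.0, 14.0, 6.0)
print(spoof, consistent)
```

Because the score saturates at a ratio of 2, the spoof hull's ratios of roughly 2.6 and 2.8 both sit well past the cap.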

A coincidentally co-transiting small fishing vessel **F/V SILAKKA**
(MMSI `230888022`, 14 × 4 m, AIS type `30`) does **not** trigger an alert.
Her AIS is internally consistent (no duplicate), the coastal radar pickup
matches the claimed dimensions and speed, and her own MAC fingerprint at
`MAC-HEL-PORT-05` (South Harbour) matches her own historical baseline (a
distinct three-MAC Apple/u-blox set anchored at Porkkala marina). She proves
the detector is selective.

The fusion of `(duplicate MMSI) ∧ (AIS↔radar positional/physical delta) ∧
(MAC fingerprint anomaly)` yields a high-confidence spoof verdict
(`spoof_score ≈ 0.96`) on MMSI 230199540 that **no single sensor could
justify alone**, while SILAKKA stays at `spoof_score ≈ 0.01`.

### Why "AIS-claimed vs radar-truth" is implemented as two separate tracks

This scenario uses the existing `AisTrack` and `RadarTrack` generator
primitives directly. The `AisTrack` for the spoof copy has its own slow
(`~6 kn`) waypoint set; the `RadarTrack` for the spoof platform has a
separate fast (`~17 kn`) waypoint set with the **same origin** but a longer
reach over the same time window. Both originate at `59.780 °N 24.100 °E`, the
AIS track at `05:55 UTC` and the radar track at `06:10 UTC`. The generator does
**not** use `AisTrack.spoof_position_offset`; the divergence is built into the
two independent waypoint lists. This makes the "AIS-claimed vs radar-truth"
divergence an emergent property of the data rather than a post-hoc offset,
exactly the way it would appear to a downstream fusion engine.
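
The idea of two independent waypoint lists sharing one origin can be sketched in plain Python. This is not the `AisTrack` / `RadarTrack` API: the helper names, the common 068° bearing, the constant speeds and the 15-minute step are all assumptions made for illustration, and the real generator's waypoints will differ.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius expressed in nautical miles


def dead_reckon(lat_deg, lon_deg, bearing_deg, distance_nm):
    """Great-circle destination point for a start position, bearing and distance."""
    d = distance_nm / EARTH_RADIUS_NM  # angular distance in radians
    lat1, lon1, brg = (math.radians(v) for v in (lat_deg, lon_deg, bearing_deg))
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(brg))
    lon2 = lon1 + math.atan2(math.sin(brg) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


def haversine_nm(p, q):
    """Great-circle distance in nautical miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = (math.radians(v) for v in (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def waypoints(origin, bearing_deg, speed_kn, start_min, end_min, step_min=15):
    """Independent waypoint list keyed by minutes after 05:55 UTC."""
    return {t: dead_reckon(*origin, bearing_deg, speed_kn * (t - start_min) / 60.0)
            for t in range(start_min, end_min + 1, step_min)}


ORIGIN = (59.780, 24.100)
claimed = waypoints(ORIGIN, 68.0, 6.0, start_min=0, end_min=90)     # AIS copy, from 05:55
observed = waypoints(ORIGIN, 68.0, 17.0, start_min=15, end_min=90)  # radar hull, from 06:10

# The separation is never stored as an offset; it falls out of comparing the lists.
separation_nm = {t: haversine_nm(claimed[t], observed[t])
                 for t in sorted(claimed.keys() & observed.keys())}
for t, sep in separation_nm.items():
    print(f"T+{t:02d} min: {sep:.2f} NM apart")
```

Neither list knows about the other, so a fusion engine fed both would have to discover the divergence itself, which is the property the paragraph above describes.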

## Geographic Layout

| Element | Approximate position | Role |
|---|---|---|
| Hanko fishing harbour | `59.823 °N 22.972 °E` | Real TÄHTI home port |
| Spoof origin | `59.780 °N 24.100 °E` | Central GoF, ~78 NM E of Hanko |
| Helsinki Katajanokka | `60.148 °N 24.960 °E` | `MAC-HEL-PORT-04` — primary spoof-window sensor |
| Helsinki South Harbour | `60.150 °N 24.955 °E` | `MAC-HEL-PORT-05` — SILAKKA berth |
| Porkkala lighthouse | `59.980 °N 24.380 °E` | `MAC-PRK-COAST-01` — catches SILAKKA departure |
| RAD-PLN-01 patrol track | `59.70 / 23.80` → `60.05 / 25.20` | Sole continuous truth-source for spoof |
| RAD-COAST-HEL-01 | `60.152 °N 24.952 °E`, r = 22 NM | Picks up spoof after RAD-PLN-01 departs |

### Catalog assets used

- **Vessels** (`catalogs/personas.json`): `230199540 F/V TAHTI` (spoof victim,
  the *claimed* identity for both AIS copies), `230888022 F/V SILAKKA` (decoy).
  `SPOOF PLATFORM X` is also declared in the catalog with `mmsi = null` and
  `broadcasts_mmsi = 230199540`.
- **Persons** (`catalogs/personas.json`): `P-TAH-OWNER` (Apple iPhone
  `A4:83:E7:5C:9B:51`). Two additional guest MACs are invented per the spec
  (one Apple, one Samsung).
- **MAC sensors** (`catalogs/sensors.geojson`): `MAC-HKO-COAST-01`,
  `MAC-HKO-PORT-01`, `MAC-HKO-PORT-02`, `MAC-HKO-PORT-03`, `MAC-HEL-PORT-04`,
  `MAC-HEL-PORT-05`, `MAC-PRK-COAST-01`.
- **Radar sensors** (`catalogs/sensors.geojson`): `RAD-PLN-01` (airborne),
  `RAD-COAST-HEL-01` (coastal).

## Timeline (UTC, 22 events)

| t_rel | wall clock | actor | event | signals |
|---|---|---|---|---|
| T−02:15 | 2025-05-14T03:40Z | F/V TÄHTI (real) | Departs Hanko, AIS Class B, SOG 5.6 kn | ais |
| T−02:10 | 2025-05-14T03:45Z | MAC-HKO-PORT-02 | First scan of TÄHTI shows expected fingerprint | mac |
| T−01:45 | 2025-05-14T04:10Z | MAC-HKO-COAST-01 | Hanko outer-mole hand-off; RSSI matches 10-visit mean | mac, ais |
| T−00:25 | 2025-05-14T05:30Z | F/V SILAKKA (decoy) | Departs Porkkala marina toward Helsinki, AIS clean | ais |
| **T+00:00** | **2025-05-14T05:55Z** | **Spoof platform X** | **Begins broadcasting MMSI 230199540 at 59.780N 24.100E, claimed 6 kn** | **ais** |
| T+00:05 | 2025-05-14T06:00Z | AIS aggregator | Two AIS messages with same MMSI, ~78 NM apart → `duplicate_mmsi_score = 1.0` | composite |
| T+00:15 | 2025-05-14T06:10Z | RAD-PLN-01 | Patrol enters operating area, begins 4 s sweeps | plane_radar |
| T+00:25 | 2025-05-14T06:20Z | RAD-PLN-01 | Track T-7741 acquired; length est 31 m, SOG 16.8 kn, COG 071° | plane_radar |
| T+00:27 | 2025-05-14T06:22Z | Fusion engine | Radar 31 m / 16.8 kn ⇏ AIS 12 m / 6 kn → `dimension_speed_plausibility_score = 1.0` | composite |
| T+00:40 | 2025-05-14T06:35Z | MAC-HKO-PORT-02 | Real TÄHTI re-acquired at Hanko — disambiguates spoof from relocation | mac, ais |
| T+01:10 | 2025-05-14T07:05Z | RAD-PLN-01 | T-7741 sustained 16.9 kn toward Helsinki; AIS still claims fishing-vessel | plane_radar, ais |
| T+01:17 | 2025-05-14T07:12Z | F/V SILAKKA (decoy) | Enters Helsinki South Harbour roads at 6.0 kn; MAC at MAC-HEL-PORT-05 | ais, mac |
| T+01:23 | 2025-05-14T07:18Z | MAC-HEL-PORT-04 | 14 unknown MACs during 9-min dwell, Huawei/ZTE-dominated | mac |
| T+01:27 | 2025-05-14T07:22Z | Fusion engine | Jaccard(spoof, TÄHTI baseline) = 0.00 → `mac_fingerprint_anomaly_score = 0.95` | composite |
| **T+01:30** | **2025-05-14T07:25Z** | **Fusion engine** | **`spoof_score(230199540) ≈ 0.96 ≥ 0.70` → alert** | **composite** |
| T+01:31 | 2025-05-14T07:26Z | Fusion engine | `spoof_score(230888022) ≈ 0.01` → SILAKKA correctly NOT flagged | composite |
| T+01:35 | 2025-05-14T07:30Z | Operator console | "AIS identity spoof — MMSI 230199540 (claimed F/V TÄHTI)" | composite |
| T+01:50 | 2025-05-14T07:45Z | RAD-PLN-01 | Aircraft turns south, last T-7741 fix logged | plane_radar |
| T+01:55 | 2025-05-14T07:50Z | RAD-COAST-HEL-01 | Coastal radar picks up spoof hull entering 6 NM zone | coastal_radar |
| T+02:10 | 2025-05-14T08:05Z | Spoof platform X | AIS broadcasts cease abruptly | ais |
| T+02:12 | 2025-05-14T08:07Z | Fusion engine | Hull radar-only → "ghost / went dark" | composite |
| T+02:35 | 2025-05-14T08:30Z | Analyst | Case closed as confirmed spoof; TÄHTI baseline unchanged | composite |

A machine-readable copy is in `timeline.json`.

## Signal Sources

| Stream | Sensor / origin | File | Notes |
|---|---|---|---|
| AIS dynamic | TÄHTI real + spoof X (both as MMSI 230199540) + SILAKKA + ~80 ambient vessels | `data/realtime/ais.ndjson` | Cadence 10 s for the three primary tracks, 60 s for ambient. Disclaimer record on line 1. |
| AIS snapshot (last seen / MMSI) | derived | `data/realtime/ais_snapshot.geojson` | Note: only one feature is rendered per MMSI; downstream consumers must rely on `ais.ndjson` to see both spoof + real positions. |
| Plane radar | RAD-PLN-01 (track `T-7741`) | `data/realtime/plane_radar.ndjson` | 4 s sweep over spoof hull. |
| Coastal radar | RAD-COAST-HEL-01 | `data/realtime/coastal_radar.ndjson` | 2 s sweep, picks up spoof after the patrol turns south. |
| MAC sensors | 7 catalog sensors (Hanko, Helsinki, Porkkala) | `data/realtime/mac.ndjson` and `.csv` | CSV header is **verbatim** the canonical 12-column header. |

## MAC Fingerprint Narrative

### TÄHTI baseline (Hanko, ≥10 prior visits)

The persistent crew/vessel MACs for F/V TÄHTI are designed so that any
single visit reveals 3–4 of them and the union across the 11 historical
visits is the *baseline set*:

| Role | macAddress | OUI vendor | Notes |
|---|---|---|---|
| `P-TAH-OWNER` (owner phone) | `a4:83:e7:5c:9b:51` | Apple | iPhone, present in 11 / 11 visits |
| guest-1 phone (invented) | `a4:83:e7:5c:9b:52` | Apple | iPhone, present in ~10 / 11 visits |
| guest-2 phone (invented) | `38:f9:d3:11:22:53` | Samsung | Galaxy, present in ~9 / 11 visits |
| onboard router (invented) | `a4:3c:5a:7a:00:01` | u-blox | LTE router, present in 11 / 11 visits |

The owner phone and onboard router are present on **every visit**; the two
guest phones are present on most visits with occasional drop-outs (a guest sat
out a trip). Every baseline MAC therefore appears in at least ⌈N/2⌉ of the
N = 11 visits, which is the realistic persistence shape the scenario aims for.
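
One possible reading of this persistence rule, as a hedged Python sketch: keep a MAC in the baseline only if it was seen in at least ⌈N/2⌉ of the N visits. The function name, the visit indices that drop a guest, and the `PASSERBY` MAC are hypothetical; the generator may simply take the union of all visits instead.

```python
from collections import Counter
from math import ceil


def persistent_macs(visits):
    """Return MACs seen in at least ceil(N/2) of the N visits."""
    counts = Counter(mac for visit in visits for mac in set(visit))
    threshold = ceil(len(visits) / 2)
    return {mac for mac, seen in counts.items() if seen >= threshold}


OWNER, GUEST_1 = "a4:83:e7:5c:9b:51", "a4:83:e7:5c:9b:52"
GUEST_2, ROUTER = "38:f9:d3:11:22:53", "a4:3c:5a:7a:00:01"
PASSERBY = "02:00:00:00:00:01"  # hypothetical one-off MAC, not in the catalog

# 11 synthetic visits: owner and router always present, guests drop out sometimes.
visits = [{OWNER, ROUTER, GUEST_1, GUEST_2} for _ in range(11)]
visits[3].discard(GUEST_1)   # guest 1 sat out one trip  -> 10 / 11
visits[5].discard(GUEST_2)   # guest 2 sat out two trips ->  9 / 11
visits[8].discard(GUEST_2)
visits[6].add(PASSERBY)      # seen once, below the ceil(11/2) = 6 threshold

baseline = persistent_macs(visits)
print(sorted(baseline))
```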

The baseline manufacturer histogram is **Apple-dominated**, with a secondary
Samsung peak and a small u-blox tail. The cosine similarity between the
baseline histogram and each single-visit histogram is ≥ 0.92 for all 11 visits.

### Spoof window fingerprint (Helsinki Katajanokka, 07:18 – 08:00 Z)

The spoof platform leaves behind a **14-MAC set** dominated by Huawei
(`00:E0:FC`) and ZTE (`34:DE:1A`) OUIs plus a handful of locally-administered
prefixes. **None of these MACs ever appears in the 11-visit TÄHTI
baseline.** The five Huawei + four ZTE entries make the manufacturer
histogram look like a Chinese consumer-device cluster — exactly the opposite
of the Apple-heavy baseline.

Jaccard(`spoof_window`, `tahti_baseline`) = `0.00`. Cosine
(`mfr_hist_spoof_window`, `mfr_hist_tahti_baseline`) = `0.06`.
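
Both similarity measures are short enough to sketch directly. The Python below is illustrative only: the two spoof MACs are made up around the stated OUIs and the histograms are toy counts, so the printed cosine will not reproduce the scenario's 0.06 figure.

```python
import math


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of MAC addresses."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(hist_a, hist_b):
    """Cosine similarity of two {vendor: count} histograms."""
    vendors = sorted(set(hist_a) | set(hist_b))  # align both vectors on one key order
    va = [hist_a.get(v, 0) for v in vendors]
    vb = [hist_b.get(v, 0) for v in vendors]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


# Toy inputs: two baseline MACs from the table above and two made-up spoof MACs.
baseline_macs = {"a4:83:e7:5c:9b:51", "a4:3c:5a:7a:00:01"}
spoof_macs = {"00:e0:fc:00:00:01", "34:de:1a:00:00:02"}
baseline_hist = {"Apple": 2, "Samsung": 1, "u-blox": 1}
spoof_hist = {"Huawei": 5, "ZTE": 4, "Unknown": 5}

print(jaccard(spoof_macs, baseline_macs))  # disjoint MAC sets
print(cosine(spoof_hist, baseline_hist))   # no shared vendors in the toy data
```

Aligning the two histograms on the union of vendor names matters: comparing raw count lists of different vendors position by position would produce a meaningless similarity.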

### SILAKKA decoy fingerprint (Helsinki South Harbour, 07:10 – 07:45 Z)

SILAKKA's own three-MAC fingerprint:

| Role | macAddress | OUI vendor |
|---|---|---|
| skipper's iPhone | `a4:83:e7:6b:11:22` | Apple |
| crew Apple device | `b0:7d:64:6b:33:44` | Apple-BLE |
| onboard router | `a4:3c:5a:6b:55:66` | u-blox |

Jaccard vs the TÄHTI baseline is 0.00, which would look suspicious in
isolation. The fingerprint, however, is evaluated against the **claimed
identity's own baseline**, and SILAKKA's observed set matches her own baseline
from her Porkkala home port (similarity ≈ 1.0). So
`mac_fingerprint_anomaly_score(SILAKKA) ≈ 0.05`. This is the discriminating
logic the demo wants downstream consumers to learn.
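
The baseline-selection rule can be made concrete with a small Python sketch. It assumes Jaccard alone as a stand-in for `fingerprint_similarity` (the scenario also blends in the manufacturer histogram, so its scores differ slightly), and the `BASELINES` dictionary and function names are hypothetical.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of MAC addresses."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mac_anomaly(claimed_mmsi, observed_macs, baselines):
    """1 - similarity against the baseline of the identity the vessel claims."""
    return 1.0 - jaccard(observed_macs, baselines[claimed_mmsi])


BASELINES = {
    230199540: {"a4:83:e7:5c:9b:51", "a4:83:e7:5c:9b:52",
                "38:f9:d3:11:22:53", "a4:3c:5a:7a:00:01"},  # F/V TÄHTI
    230888022: {"a4:83:e7:6b:11:22", "b0:7d:64:6b:33:44",
                "a4:3c:5a:6b:55:66"},                       # F/V SILAKKA
}
silakka_observed = {"a4:83:e7:6b:11:22", "b0:7d:64:6b:33:44", "a4:3c:5a:6b:55:66"}

# Correct: SILAKKA claims MMSI 230888022, so compare against her own baseline.
own = mac_anomaly(230888022, silakka_observed, BASELINES)
# Naive: comparing every port sighting against TÄHTI's baseline would flag her.
naive = mac_anomaly(230199540, silakka_observed, BASELINES)
print(own, naive)
```

Keying the lookup on the claimed MMSI is the whole point: the same observed MAC set is unremarkable for one identity and anomalous for another.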

## Decoy explanation: why F/V SILAKKA does *not* alert

SILAKKA has every individual surface property that, if you squinted, could
look spoof-adjacent (small fishing vessel, transiting toward Helsinki,
unknown port-MAC mix vs TÄHTI's baseline). She is the deliberate false
positive that a naïve detector would flag and a correct fusion would not:

| Signal | SILAKKA value | Why it's near zero |
|---|---|---|
| `duplicate_mmsi_score` | 0.00 | Her MMSI `230888022` is unique in the AIS stream. |
| `ais_radar_delta_score` | 0.00 | Coastal radar picks her up only briefly at expected position; delta ≤ 1 NM. |
| `dimension_speed_plausibility_score` | 0.00 | Declared 14 m / 6 kn matches the radar dimension estimate (~13–15 m) and observed SOG (~6 kn). |
| `mac_fingerprint_anomaly_score` | 0.05 | Anomaly is computed against **her own** baseline, which her observed fingerprint matches. |
| **Composite** | **~0.013** | Below the 0.40 review threshold. |

## Confidence Model

Four canonical signals (see `catalogs/ontology.md` and
`generators/scoring.py`), weights summing to **1.0**, alert threshold **0.70**:

| Signal | Weight | What it measures |
|---|---:|---|
| `duplicate_mmsi_score`             | **0.30** | 1.0 if same MMSI seen at ≥ 2 positions > 5 NM apart within 30 s |
| `ais_radar_delta_score`            | **0.25** | `min(max_delta_nm / 10, 1.0)` between AIS pos and best-matching radar track |
| `dimension_speed_plausibility_score` | **0.20** | `min(max(len_ratio, sog_ratio) − 1, 1.0)`, where each ratio is radar-observed ÷ AIS-claimed (length, SOG) |
| `mac_fingerprint_anomaly_score`    | **0.25** | `1 − fingerprint_similarity(F_obs, F_base_claimed_identity)` |

`spoof_score = 0.30·s1 + 0.25·s2 + 0.20·s3 + 0.25·s4`

| Subject | s1 | s2 | s3 | s4 | `spoof_score` | Verdict |
|---|---:|---:|---:|---:|---:|---|
| Spoof platform "X" claiming MMSI 230199540 | 1.00 | 0.90 | 1.00 | 0.95 | **0.9625** | Alert ✅ |
| Real F/V TÄHTI (Hanko) | 1.00 | 0.00 | 0.00 | 0.05 | 0.313 | Review (analyst clears via baseline match) |
| F/V SILAKKA (MMSI 230888022) | 0.00 | 0.00 | 0.00 | 0.05 | 0.0125 | No action ✅ |
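
The weighted sum is easy to check by hand or in a few lines of Python. This sketch uses the weights and sub-scores from the two tables above; the names `WEIGHTS`, `spoof_score` and `make_signals` are illustrative and are not claimed to match `generators/scoring.py`.

```python
WEIGHTS = {
    "duplicate_mmsi_score": 0.30,
    "ais_radar_delta_score": 0.25,
    "dimension_speed_plausibility_score": 0.20,
    "mac_fingerprint_anomaly_score": 0.25,
}
ALERT_THRESHOLD = 0.70


def spoof_score(signals):
    """Weighted sum of the four sub-scores; each is expected to lie in [0, 1]."""
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


def make_signals(s1, s2, s3, s4):
    """Pair positional sub-scores with the signal names, in table order."""
    return dict(zip(WEIGHTS, (s1, s2, s3, s4)))


subjects = {
    "spoof platform X": make_signals(1.00, 0.90, 1.00, 0.95),
    "real F/V TÄHTI": make_signals(1.00, 0.00, 0.00, 0.05),
    "F/V SILAKKA": make_signals(0.00, 0.00, 0.00, 0.05),
}
scores = {name: spoof_score(sig) for name, sig in subjects.items()}
for name, score in scores.items():
    flag = "ALERT" if score >= ALERT_THRESHOLD else "below alert threshold"
    print(f"{name}: {score:.4f} ({flag})")
```

Note that the real TÄHTI carries the full duplicate-MMSI weight on her own, yet stays far below the alert threshold because the other three signals are quiet.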

## KQL Sketches

Synthetic table names: `ais_dynamic`, `ais_static`, `radar_tracks`,
`mac_sessions`, `vessel_fingerprint_baseline`. All times UTC.

### 1. `duplicate_mmsi_score`

```kql
ais_dynamic
| where ts between (datetime(2025-05-14T03:00:00Z) .. datetime(2025-05-14T09:00:00Z))
// Bounding box of all positions reported under one MMSI in each 30 s window.
// make_set() order is not guaranteed, so its first/last elements are not reliable.
| summarize n = count(),
            min_lat = min(lat), max_lat = max(lat),
            min_lon = min(lon), max_lon = max(lon)
            by mmsi, ts_window = bin(ts, 30s)
| where n >= 2
// Box diagonal in NM: approximates (and upper-bounds) the largest pairwise separation.
| extend max_sep_nm = geo_distance_2points(min_lon, min_lat, max_lon, max_lat) / 1852.0
| extend duplicate_mmsi_score = iff(max_sep_nm > 5.0, 1.0, 0.0)
| project ts_window, mmsi, max_sep_nm, duplicate_mmsi_score
```

### 2. `ais_radar_delta_score`

```kql
let window = 10s;
ais_dynamic
| where mmsi == 230199540
| extend t_bin = bin(ts, window)
| join kind=inner (
    radar_tracks
    | where source_sensorId in ('RAD-PLN-01', 'RAD-COAST-HEL-01')
    | extend t_bin = bin(ts, window)
  ) on t_bin
| extend delta_nm = geo_distance_2points(lon, lat, lon1, lat1) / 1852.0
| summarize max_delta_nm = max(delta_nm) by bin(ts, 5m), mmsi
| extend ais_radar_delta_score = min_of(max_delta_nm / 10.0, 1.0)
```

### 3. `dimension_speed_plausibility_score`

```kql
ais_static
| where mmsi == 230199540
| extend claimed_length = dim_to_bow + dim_to_stern
| join kind=inner (ais_dynamic
    | where mmsi == 230199540
    | summarize claimed_sog = avg(sog_kn) by mmsi) on mmsi
| join kind=inner (
    radar_tracks
    | where source_sensorId == 'RAD-PLN-01' and trackId == 'T-7741'
    | summarize radar_len = avg(length_estimate_m), radar_sog = avg(speed_kn)
    // Radar tracks carry no MMSI; tag the summary row with the claimed MMSI
    // so it can serve as the join key (assumes mmsi is stored as long).
    | extend mmsi = 230199540
  ) on mmsi
| extend len_ratio = radar_len / todouble(claimed_length),
         sog_ratio = radar_sog / todouble(claimed_sog)
// Clamp to [0, 1] so a hull smaller or slower than claimed never scores negative.
| extend dimension_speed_plausibility_score =
        min_of(max_of(max_of(len_ratio, sog_ratio) - 1.0, 0.0), 1.0)
```

### 4. `mac_fingerprint_anomaly_score`

```kql
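let claimed_mmsi = 230199540;
// Assumed baseline schema: mac_set is a dynamic array of MACs and mfr_hist is a
// dynamic property bag of {vendor: count}.
let base_macs = toscalar(vessel_fingerprint_baseline
    | where mmsi == claimed_mmsi | project mac_set);
let base_hist = toscalar(vessel_fingerprint_baseline
    | where mmsi == claimed_mmsi | project mfr_hist);
let observed =
    mac_sessions
    | where processingTimestamp between (datetime(2025-05-14T07:00:00Z) .. datetime(2025-05-14T08:30:00Z))
        and deviceId in ('MAC-HEL-PORT-04', 'MAC-HEL-PORT-05');
let obs_macs = toscalar(observed | summarize make_set(macAddress));
// Distinct MACs per vendor, packed into a {vendor: count} bag.
let obs_hist = toscalar(observed
    | where isnotempty(deviceManufacturer)
    | summarize n = dcount(macAddress) by deviceManufacturer
    | summarize make_bag(bag_pack(deviceManufacturer, n)));
// Align both histograms on the union of vendor keys before taking the cosine.
print vendor = set_union(bag_keys(base_hist), bag_keys(obs_hist))
| mv-expand vendor to typeof(string)
| extend o = coalesce(toreal(obs_hist[vendor]), 0.0),
         b = coalesce(toreal(base_hist[vendor]), 0.0)
| summarize dot = sum(o * b), norm_o = sqrt(sum(o * o)), norm_b = sqrt(sum(b * b))
| extend cosine_mfr = iff(norm_o * norm_b > 0, dot / (norm_o * norm_b), 0.0)
| extend jaccard = toreal(array_length(set_intersect(obs_macs, base_macs)))
                 / toreal(array_length(set_union(obs_macs, base_macs)))
| extend similarity = 0.5*jaccard + 0.3*cosine_mfr
| extend mac_fingerprint_anomaly_score = 1.0 - min_of(similarity / 0.45, 1.0)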
```

### 5. Joined `spoof_score` (weights sum to 1.0)

```kql
let claimed_mmsi = 230199540;
let s1 = toscalar( /* duplicate_mmsi_score max for claimed_mmsi */ );
let s2 = toscalar( /* ais_radar_delta_score max */ );
let s3 = toscalar( /* dimension_speed_plausibility_score */ );
let s4 = toscalar( /* mac_fingerprint_anomaly_score */ );
print
    duplicate_mmsi_score               = s1,
    ais_radar_delta_score              = s2,
    dimension_speed_plausibility_score = s3,
    mac_fingerprint_anomaly_score      = s4,
    spoof_score = 0.30*s1 + 0.25*s2 + 0.20*s3 + 0.25*s4
// For MMSI 230199540 on 2025-05-14: ~ 0.9625 -> Alert.
// For MMSI 230888022 (SILAKKA):      ~ 0.0125 -> No action.
```

## Ingestion notes

All output files are written by `generate.py` under `data/`:

```
data/
  realtime/
    ais.ndjson                 # all AIS dynamic msgs incl. spoof + real TÄHTI + SILAKKA + ambient
    ais_snapshot.geojson       # last-seen-per-MMSI (note: snapshot deduplicates the spoof case)
    plane_radar.ndjson         # RAD-PLN-01 track T-7741 (spoof hull truth)
    coastal_radar.ndjson       # RAD-COAST-HEL-01 hand-off (spoof hull, no AIS)
    mac.ndjson                 # NDJSON form of MAC sessions
    mac.csv                    # canonical 12-column header verbatim
  static/
    area_of_interest.geojson
    sensors_used.geojson
    infrastructure_used.geojson
    tahti_baseline_route.geojson
    silakka_decoy_route.geojson
    spoof_claimed_track.geojson
    spoof_observed_track.geojson
  historical/
    ais_baseline.ndjson        # 11 prior TÄHTI Hanko visits (Class B AIS)
    mac_baseline.ndjson        # 11 prior visits MAC observations (NDJSON)
    mac_baseline.csv           # same in canonical CSV (verbatim header)
```

* Disclaimer record on **line 1** of every NDJSON file, **first non-header
  row** of every CSV (commented with `#`).
* MAC CSV header is the canonical 12-column header from
  `generators/mac_generator.py:MAC_CSV_HEADER`.
* The two AIS streams sharing MMSI `230199540` are interleaved by timestamp in
  `ais.ndjson`. Downstream consumers see two simultaneous positions for the
  same MMSI exactly as they would in a real receiver feed.
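
A consumer of `ais.ndjson` therefore has to skip the disclaimer record and then cope with interleaved positions under one MMSI. The Python sketch below shows one way to do both. The field names `mmsi`, `ts`, `lat` and `lon` are assumptions borrowed from the KQL sketches, the shape of the disclaimer record is hypothetical, and the sample positions are illustrative rather than copied from the generated file.

```python
import json
import math
from datetime import datetime


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles."""
    p1, l1, p2, l2 = (math.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    return 2 * 3440.065 * math.asin(math.sqrt(a))


def read_ais(lines):
    """Parse NDJSON lines, skipping the disclaimer record on line 1."""
    it = iter(lines)
    next(it, None)
    return [json.loads(line) for line in it if line.strip()]


def duplicate_mmsis(records, min_sep_nm=5.0, window_s=30.0):
    """MMSIs reported at two positions > min_sep_nm apart within window_s."""
    by_mmsi = {}
    for rec in records:
        ts = datetime.fromisoformat(rec["ts"].replace("Z", "+00:00"))
        by_mmsi.setdefault(rec["mmsi"], []).append((ts, rec["lat"], rec["lon"]))
    flagged = set()
    for mmsi, fixes in by_mmsi.items():
        fixes.sort()  # chronological order, so the inner loop can stop early
        for i, (t1, lat1, lon1) in enumerate(fixes):
            for t2, lat2, lon2 in fixes[i + 1:]:
                if (t2 - t1).total_seconds() > window_s:
                    break
                if haversine_nm(lat1, lon1, lat2, lon2) > min_sep_nm:
                    flagged.add(mmsi)
    return flagged


# In practice, pass an open file handle for data/realtime/ais.ndjson instead.
sample = [
    '{"disclaimer": "synthetic demo data"}',
    '{"mmsi": 230199540, "ts": "2025-05-14T06:00:00Z", "lat": 59.823, "lon": 22.972}',
    '{"mmsi": 230199540, "ts": "2025-05-14T06:00:05Z", "lat": 59.784, "lon": 24.115}',
    '{"mmsi": 230888022, "ts": "2025-05-14T06:00:00Z", "lat": 59.990, "lon": 24.400}',
    '{"mmsi": 230888022, "ts": "2025-05-14T06:00:10Z", "lat": 59.990, "lon": 24.401}',
]
print(duplicate_mmsis(read_ais(sample)))
```

Comparing every pair inside the time window, rather than only consecutive fixes, keeps the check independent of how the two transmitters happen to interleave.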

Run with:

```bash
python scenarios/04-spoofed-ais-identity/generate.py
```

A `data/_generation_summary.json` is written at the end with per-file
record / feature counts and total bytes.

## Disclaimer

This scenario is **non-operational synthetic demo data**. No real vessel,
person, device, MMSI, IMO, MAC address, callsign or location is intended.
All identifiers are fabricated for the `r-mac-data-scenarios` project. Real
OUI prefixes are used for realism only and do **not** imply any real-world
vendor involvement.
