# Ingestion Strategy

The ingestion pipeline is designed to preserve historical judgment instead of hiding it in code.

As of the SQLite curation migration, the preferred path for larger data work is:

```text
source document -> evidence claim -> interpretation -> disposition export
```

The local DB import/export bridge must validate and compare cleanly before any enriched output replaces public JSON.
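
One way to read "compare cleanly" is as a round-trip check: a plain import followed by an export, with no enrichment applied, should reproduce the public JSON record for record. The sketch below illustrates that idea only. `diff_records`, `compares_cleanly`, and the `id` key are hypothetical names, not the bridge's actual interface.

```python
import json


def diff_records(public, exported, key="id"):
    """Compare two lists of record dicts and report how they differ.

    Assumption: every record carries a unique `key` field (hypothetical name).
    """
    old = {rec[key]: rec for rec in public}
    new = {rec[key]: rec for rec in exported}

    def canon(rec):
        # Serialize with sorted keys so key order never counts as a change.
        return json.dumps(rec, sort_keys=True, ensure_ascii=False)

    shared = old.keys() & new.keys()
    return {
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in shared if canon(old[k]) != canon(new[k])),
    }


def compares_cleanly(report):
    """True only when the round trip dropped, added, and altered nothing."""
    return not (report["missing"] or report["added"] or report["changed"])
```

Comparing by ID rather than by list position keeps a harmless reordering from being reported as a difference.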

## Phase 1: Geography

1. Query DARE by ID or Pleiades ID.
2. Cache raw GeoJSON into `data/import_cache/`.
3. Normalize selected records into `data/places.json`.
4. Preserve DARE IDs, Pleiades IDs, geometry precision, and fetch date.
5. For schematic auxiliary placement, import DARE fort, fortress, and station records as anchor candidates. These anchors are geography only, not unit evidence.

Helpers:

```powershell
node scripts/import-dare-place.mjs 20687 20684
powershell -ExecutionPolicy Bypass -File scripts/import-dare-anchor-places.ps1
powershell -ExecutionPolicy Bypass -File scripts/repair-place-encoding.ps1
```

The DARE bbox endpoint is useful for discovery, but its results are capped. Do not build empire-wide place coverage from a single bbox call.
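
Steps 3 and 4 of this phase can be sketched as a small normalizer that keeps both identifiers, the precision, and the fetch date. This is an illustration under assumptions: the GeoJSON property names (`id`, `pleiades`, `name`, `precision`) and the output field names are hypothetical and may not match the real DARE payload or `data/places.json`.

```python
from datetime import date


def normalize_place(feature, fetched_on=None):
    """Reduce one cached GeoJSON feature to a compact place record.

    Assumption: property and output field names are illustrative only.
    """
    props = feature.get("properties", {})
    geometry = feature.get("geometry") or {}
    if geometry.get("type") != "Point":
        raise ValueError("this sketch only handles Point geometries")
    lon, lat = geometry["coordinates"][:2]  # GeoJSON order is longitude, latitude
    pleiades = props.get("pleiades")
    return {
        "id": f"dare:{props['id']}",
        "name": props.get("name"),
        "dare_id": str(props["id"]),
        "pleiades_id": str(pleiades) if pleiades else None,
        "lat": lat,
        "lon": lon,
        # Keep precision explicit instead of implying an exact point.
        "geometry_precision": props.get("precision", "unknown"),
        "fetched": (fetched_on or date.today()).isoformat(),
    }
```

Recording `unknown` precision rather than omitting the field keeps later exact-site decisions honest about what the gazetteer actually said.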

## Phase 2: Unit Registry

1. Add canonical units to `data/units.json`.
2. Add aliases, abbreviations, numbering variants, and honorary titles to `data/unit_aliases.json`.
3. Do not resolve units by ad hoc string replacement in dispositions.
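
The difference between registry lookup and ad hoc string replacement can be shown with a small alias index. This is a sketch, not the project's resolver: it assumes `units` rows have `id` and `name` and alias rows have `unit_id` and `alias`, which may not match the real JSON files, and the unit ID in the usage example is invented for illustration.

```python
import unicodedata


def alias_key(text):
    """Fold case, accents, punctuation, and spacing so variants share one key."""
    chars = []
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # drop accent marks
        chars.append(ch if ch.isalnum() else " ")
    return " ".join("".join(chars).casefold().split())


def build_alias_index(units, aliases):
    """Map every canonical name and alias to exactly one unit ID."""
    pairs = [(u["name"], u["id"]) for u in units]
    pairs += [(a["alias"], a["unit_id"]) for a in aliases]
    index = {}
    for label, unit_id in pairs:
        # setdefault keeps the first owner; a different owner is a registry bug.
        if index.setdefault(alias_key(label), unit_id) != unit_id:
            raise ValueError(f"alias {label!r} is claimed by two units")
    return index


def resolve_unit(label, index):
    """Return the unit ID, or None so the caller can queue a review item."""
    return index.get(alias_key(label))


# Hypothetical registry rows for illustration.
index = build_alias_index(
    [{"id": "legio-ii-augusta", "name": "Legio II Augusta"}],
    [{"unit_id": "legio-ii-augusta", "alias": "Leg. II Aug."}],
)
print(resolve_unit("leg ii aug", index))
```

An unknown label returns `None` instead of a best guess, which leaves the decision to a curator rather than to the string matcher.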

## Phase 3: Evidence Items

1. Add a source record in `data/sources.json`.
2. Classify the source with `source_family`, `source_quality`, `source_scope`, and `source_caution`.
3. Add an evidence record in `data/evidence_items.json`.
4. Preserve document references such as RIB, CIL, AE, RMD, papyrus IDs, or Notitia chapter/line.
5. Summarize evidence briefly. Do not copy long copyrighted source text.

Current internal source-family defaults:

- `primary_epigraphy`: inscriptions and military diplomas, e.g. RIB/CIL/AE/RMD records.
- `primary_documentary`: papyri or administrative documents, e.g. papyri.info records.
- `primary_literary`: ancient narrative or administrative text witnesses, where not modeled as Notitia.
- `archaeology`: fort/site/fieldwork records.
- `notitia`: Notitia Dignitatum witnesses, indices, and local Notitia workbook data; always late-antique.
- `scholarly_synthesis`: specialist books/articles and controlled scholarly summaries such as Livius or Spaul/Holder-style works.
- `tertiary_web`: discovery/bootstrap pages such as Wikipedia, Britannica, or broad encyclopedia articles.
- `gazetteer`: DARE, Pleiades, Wikidata, Trismegistos, Vici, ToposText-style place identity sources.
- `geographic_dataset`: basemaps, province/display geometry, vector tiles.
- `curation_work`: local draft/manual records and display scaffolding.

Do not confuse source family with evidence form. A source can be a corpus, while the extracted evidence item is a diploma; a gazetteer can normalize a place without proving a deployment.

The public atlas does not expose this full taxonomy as its main filter. It derives a reader-facing `source basis` instead:

- `Primary Sources`: inscriptions, diplomas, papyri, ancient literary/administrative texts, and archaeology/site records.
- `Secondary Sources`: scholarly synthesis and provisional tertiary/bootstrap sources, with tertiary cautions still visible in details.
- `Notitia Dignitatum`: Notitia-derived evidence, kept separate because it is late-antique and methodologically special.
- `Other / Curation`: local curation, display scaffolding, gazetteers, or geographic datasets if they ever directly back a displayed record.
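
The derivation from `source_family` to the reader-facing source basis, as described in the two lists above, amounts to a lookup table. The sketch below encodes that mapping; the function name `source_basis` and the choice to raise on unknown families are assumptions, not the atlas's actual code.

```python
SOURCE_BASIS_BY_FAMILY = {
    "primary_epigraphy": "Primary Sources",
    "primary_documentary": "Primary Sources",
    "primary_literary": "Primary Sources",
    "archaeology": "Primary Sources",
    "scholarly_synthesis": "Secondary Sources",
    "tertiary_web": "Secondary Sources",  # tertiary caution stays in details
    "notitia": "Notitia Dignitatum",
    "gazetteer": "Other / Curation",
    "geographic_dataset": "Other / Curation",
    "curation_work": "Other / Curation",
}


def source_basis(source_family):
    """Return the reader-facing basis for an internal source family."""
    try:
        return SOURCE_BASIS_BY_FAMILY[source_family]
    except KeyError:
        # Fail loudly so a new family cannot slip into the UI unclassified.
        raise ValueError(f"unknown source_family: {source_family!r}") from None
```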

## Phase 4: Dispositions

1. Add a disposition only after the evidence item exists.
2. Choose `location_mode` before choosing geometry.
3. Use `exact_site` only when the evidence or cited synthesis supports the site.
4. During initial ingestion, use `province` for diplomas or rosters that list units only at province level.
5. Use `uncertain` for probable, contested, or generated schematic marker assignments.
6. Use `late_roman_only` for Notitia records.

For the Notitia cleanup passes, prefer this escalation order:

1. try a direct DARE/local place match;
2. if that fails, try a curated place override with a named seeded place;
3. only fall back to a raw `geometry_override` when a real site is defensible but no stable place record exists yet;
4. leave the row on a schematic province marker only when neither a place link nor an honest uncertain site anchor can be defended.
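
The escalation order above can be read as a chain of lookups that stops at the first defensible answer. The sketch below assumes three plain dictionaries keyed by station name; the function name, the return shape, and those dictionaries are hypothetical and stand in for whatever the importer actually uses.

```python
def resolve_station(name, local_places, place_overrides, geometry_overrides):
    """Walk the four-step escalation and report which step matched.

    Assumption: all three lookups are dicts keyed by station name.
    """
    if name in local_places:
        # Step 1: direct DARE/local place match.
        return {"step": 1, "place_id": local_places[name]}
    if name in place_overrides:
        # Step 2: curated override pointing at a named seeded place.
        return {"step": 2, "place_id": place_overrides[name]}
    if name in geometry_overrides:
        # Step 3: defensible site, but no stable place record exists yet.
        return {
            "step": 3,
            "place_id": None,
            "geometry_override": geometry_overrides[name],
        }
    # Step 4: nothing defensible, so the row stays on the province marker.
    return {"step": 4, "place_id": None}


overrides = {"Menoida": "manual:menois-western-negev"}
print(resolve_station("Menoida", {}, overrides, {}))
```

Returning the step number alongside the result makes it easy to audit how many rows were settled at each level of the escalation.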

Recent examples of step 2 and step 3 reconciliation from the escalation order above:

- `Bononia` in Pannonia -> `dare:10864` (Banoštor), while the Dacian homonym stays `dare:10896` (Vidin).
- `Alta Ripa` -> `dare:10909` (Tolna).
- `Ad Herculem` -> `dare:10904` (Pilismarót–Kis-hegy).
- `Terenouthis` -> `dare:28532` (Kom Abu Billo).
- `Menoida` -> `manual:menois-western-negev` (Horvat Maon near Nirim), still `uncertain`.
- `Moahile` -> `manual:moahila-shaar-ramon`, still `uncertain`.
- `Narmouthis` -> `manual:narmuthis-medinet-madi`, still `uncertain`.
- `Diospolis parva` -> `manual:diospolis-parva-hu`, still `uncertain`.
- `Chenoboscium` -> `manual:chenoboscium-qasr-el-saiyad`, still `uncertain`.
- `Pityus`, `Rhizaion`, and `Sebastopolis` are now seeded as named uncertain places instead of anonymous map points.
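
The `Bononia` entry above shows why a station name alone is not a safe key. A minimal sketch of context-sensitive resolution follows; the two place IDs come from the list above, while the context labels (`pannonia`, `dacia`), the table layout, and the function name are assumptions made for illustration.

```python
HOMONYMS = {
    # (station name, command context) -> place ID
    ("Bononia", "pannonia"): "dare:10864",  # Banoštor
    ("Bononia", "dacia"): "dare:10896",     # Vidin
}


def resolve_homonym(name, context, table=HOMONYMS):
    """Return a place ID only when the command context disambiguates the name."""
    candidates = {ctx: pid for (n, ctx), pid in table.items() if n == name}
    if not candidates:
        return None  # not a known homonym; fall back to the normal lookup
    if context not in candidates:
        # Refuse to pick one of several same-named sites arbitrarily.
        raise LookupError(
            f"{name!r} needs a known context; candidates: {sorted(candidates)}"
        )
    return candidates[context]
```

Raising on a missing or unknown context forces the caller to consult the command list before a marker is placed in the wrong theater.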

## Phase 5: Validation

Run:

```powershell
powershell -ExecutionPolicy Bypass -File scripts/build-unit-histories.ps1
powershell -ExecutionPolicy Bypass -File scripts/build-legion-metadata.ps1
powershell -ExecutionPolicy Bypass -File scripts/validate-data.ps1
powershell -ExecutionPolicy Bypass -File tests/run-tests.ps1
```

The validator currently blocks:

- Missing source or evidence links.
- Enum drift against `data/schema.json` for unit classes, source types, evidence forms (`evidence_kind`), location modes, presence types, confidence labels, and review queue states.
- Places without a schema-declared stable identifier field such as DARE, Pleiades, Trismegistos/TM, manual, Wikidata, ToposText, or Vici.
- Diplomas rendered as exact-site markers.
- Diploma or roster estimates that carry a place ID instead of schematic geometry.
- Province records with exact place IDs.
- Notitia records without late-layer flags.
- Legion units without profile metadata.
- Strong UTF-8 mojibake markers in place names.
- Inverted date ranges.
- Broken unit, place, province, evidence, or source IDs.
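
A few of these rules are simple enough to sketch as per-record checks. The example below is illustrative and does not reproduce `scripts/validate-data.ps1`: it assumes a flattened disposition record, and the field names `start_year`, `end_year`, and the `diploma` value are hypothetical. `data/schema.json` remains the authority for real names and enums.

```python
def check_disposition(record):
    """Return a list of blocking problems for one flattened disposition record."""
    problems = []
    mode = record.get("location_mode")

    if record.get("evidence_kind") == "diploma" and mode == "exact_site":
        problems.append("diploma rendered as exact-site marker")

    if mode == "province" and record.get("place_id") is not None:
        problems.append("province record carries an exact place ID")

    if record.get("source_family") == "notitia" and not record.get("late_roman_only"):
        problems.append("Notitia record without late-layer flag")

    start, end = record.get("start_year"), record.get("end_year")
    if start is not None and end is not None and start > end:
        problems.append("inverted date range")

    return problems
```

Collecting every problem for a record, rather than stopping at the first, lets a curator fix a row in one pass.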

Node equivalents are also available as `scripts/validate-data.mjs` and `tests/domain.test.mjs` for environments where Node is executable. Python equivalents are available as `scripts/validate_data.py` and `tests/test_atlas_data.py`.

## Manual Templates

CSV templates live in `data/import_templates/`:

- `sources.csv`
- `evidence_items.csv`
- `places_manual_crosswalk.csv`
- `unit_aliases.csv`
- `dispositions.csv`

These are intended for human curation and later conversion into normalized JSON.
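
The later conversion step could look roughly like the sketch below, which turns CSV text into a list of JSON-ready records. The column names in the usage example are hypothetical, since the actual template headers are not listed here, and the function names are illustrative.

```python
import csv
import io
import json


def csv_to_records(text):
    """Parse CSV text into dicts, trimming cells and turning blanks into None."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        cleaned = {
            key.strip(): ((value or "").strip() or None)
            for key, value in row.items()
            if key is not None  # ignore cells beyond the header width
        }
        if any(v is not None for v in cleaned.values()):  # skip blank rows
            records.append(cleaned)
    return records


def records_to_json(records):
    """Serialize records without escaping non-ASCII place or unit names."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical columns for illustration.
sample = "id,title,source_family\nsrc-1, Example corpus ,primary_epigraphy\n"
print(records_to_json(csv_to_records(sample)))
```

Keeping blanks as `None` rather than empty strings makes it clear in the JSON which fields a curator deliberately left unfilled.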

## Expansion Priorities

1. Replace broad synthesis layers with primary attestation bundles where possible.
2. Review the generated Hadrianic auxiliary roster against Holder, Spaul, diplomas, inscriptions, papyri, and regional corpora.
3. Add more provincial military diplomas as province-level auxiliary evidence, then optionally generate low-confidence schematic markers.
4. Expand Notitia records command by command, accepting exact markers only when station normalization is safe.
5. Add contested auxiliary reconstructions after source review.

## Current Bootstrap Expansion

`scripts/expand-curated-data.ps1` regenerates the current broad auxiliary and late-antique bootstrap:

- Adds source and evidence records for the public Hadrianic auxiliary roster and selected Notitia command lists.
- Generates province-level auxiliary dispositions for the Hadrianic roster.
- Adds selected Notitia records as late-only exact markers only when DARE place normalization is safe.
- Leaves ambiguous Notitia stations as province-level records.
- Adds schematic display hulls for province-level visualization.

`scripts/build-unit-histories.ps1` regenerates `data/unit_histories.json`:

- Builds one expanded history entry for every normalized unit.
- Uses disposition/evidence data for service history, exact-vs-estimated counts, and links.
- Adds conservative class profiles for role and armament.
- Adds curated snippets for selected well-known units without moving or overriding dispositions.
- Leaves unknown named events explicit rather than inventing them.
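
The exact-vs-estimated counts mentioned above can be pictured as a simple tally over dispositions. This sketch assumes each disposition has `unit_id` and `location_mode` fields and that only `exact_site` counts as exact; the real script may classify records differently.

```python
from collections import Counter, defaultdict


def count_marker_kinds(dispositions):
    """Count exact versus estimated dispositions for each unit."""
    counts = defaultdict(Counter)
    for disposition in dispositions:
        exact = disposition["location_mode"] == "exact_site"
        counts[disposition["unit_id"]]["exact" if exact else "estimated"] += 1
    # Emit both keys for every unit so a zero is explicit rather than missing.
    return {
        unit: {"exact": tally["exact"], "estimated": tally["estimated"]}
        for unit, tally in counts.items()
    }
```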

`scripts/expand-legion-redeployments.ps1` regenerates the curated legion movement bootstrap:

- Adds missing first-century and disbanded legions such as I Germanica, IV Macedonica, V Alaudae, XV Primigenia, XVI Gallica, XVII, XVIII, XIX, XXI Rapax, and XXII Deiotariana.
- Adds finer movement phases for already-modeled legions where the cited synthesis gives named bases or clearly province-level movement.
- Runs the Wikipedia/Livius legion audit that added I Macriana Liberatrix, IV Italica, VI Hispana, XX Siciliana, aliases for II Sabina/II Gallica/X Equestris, and the Augustan/Julio-Claudian backfill needed to surface the expected Principate legion count.
- Writes `data/legion_wikipedia_audit.json`, including the unresolved late-Roman backlog that should be handled through Notitia/specialist ingestion rather than guessed stations.
- Adds audited movement/profile support for III Augusta in Africa and II Traiana Fortis in the Trajanic and early Hadrianic window, keeping inferred markers low confidence.
- Keeps conjectural placements as `uncertain` or `province`; examples include the Varian legions, the Nijmegen phase of IX Hispana, and the Bar Kokhba disappearance theory for XXII Deiotariana.
- Adds DARE-normalized places only after checking that the match is regionally plausible.

`scripts/ingest-late-legions.mjs` adds the cautious late-legion audit layer:

- Adds missing late Roman legion units from the public late-legion checklist only after comparing against Livius where available.
- Adds exact-site late markers only where the source names a site and the local DARE place exists.
- Uses `uncertain` manual-coordinate markers for plausible stations that still need DARE reconciliation, such as Noviodunum ad Istrum, Adiuvense/Ybbs, Trapezus, and Cusas/Qusiyah.
- Keeps very sparse names, such as I Flavia Theodosiana and VI Gemella, as searchable unit shells with no displayed marker.
- Adds review-queue items for disputed or sparse cases, including I Martia, VI Gallicana, the Scythian I Iovia / II Herculia station problem, and II Felix Valentis Thebaeorum.
- Updates profiles, aliases, unit histories, evidence, sources, and dispositions together so the admin/editor and public atlas stay in sync.

`scripts/ingest-notitia-corpus.py` adds the full Maier-based Notitia corpus without flattening atlas logic:

- Reads `data/notitia_db/notitia_dignitatum_units_corpus.xlsx` directly from the local workbook.
- Applies cautious manual Notitia place seeds and occurrence-specific overrides from `data/notitia_db/place_overrides.json`.
- Treats that override file as authoritative for repeated `place_id` seeds, so later coordinate/order corrections replace earlier generated place records instead of being ignored.
- Preserves all 995 grouped B04 unit identities as searchable atlas units, reusing an existing late/notitia unit only when there is a safe one-to-one match.
- Imports all 1192 occurrence rows as `notitia` evidence items.
- Creates map dispositions only for station-like occurrence rows, using:
  - `exact_site` when the place normalizes directly to the local DARE corpus,
  - `uncertain` when the place match is heuristic or the workbook itself marks the relation uncertain,
  - `province` when the place string is explicit but the current pass can only normalize safely to a coarse province.
- Uses the override layer for cases where the workbook preserves a meaningful station string but the local corpus still needs a curated crosswalk, for example Acimincum/Stari Slankamen, Diospolis Parva/Hu, Chenoboscium/Qasr el-Saiyad, Thannuris/Tell Tnenir, Anatha/Hanaser, Morbium/Piercebridge, Tacasarta/Kassassin, Ziza/Al-Jizah, Danaba/Al-Hafar, Saltatha/Sadad, Thelseae/Al-Dumayr, Gerra/Tel Mahmudiyeh, Bononia/Vidin, Sucidava/Celei or Izvoarele by command context, and Transdrobeta/Pontes.
- Arabia-specific examples now include:
  - direct identifications such as `Speluncae -> Deir el-Kahf`, `Dia-Fenis -> Qasr el-Azraq`, `Adittha/Adtitha -> Khirbet es-Samra`, and `Avatha -> al-Bakhra'`;
  - frontier-sector anchors such as `Naarsafari -> Wadi Afaris / Qasr Bshir sector`, `Castra Arnonensia -> Wadi Afaris frontier sector`, and `Arnona -> Via Nova / Wadi Mujib crossing sector`;
  - explicitly weak sector anchors such as `Libona` and `Asabaia`, which are kept `uncertain` with large precision buffers instead of pretending to be exact forts.
- Phoenice-specific examples now include:
  - named uncertain anchors such as `Calamona -> Maaloula`, `Euhari/Euhara -> Hawarin/Euroea`, `Mons Iovis -> Khan el-Qattar`, `Arefa -> ar-Rafi'ah / Bir Qassab`, and `Thama -> Khan Abu Shamat`;
  - corridor-sector anchors such as `Lataui/Latavi -> Mahin frontier sector` and weaker lesser-register names like `Neia`, `Rene/Neve`, `Verofabula`, and `Veranoca`, which are moved off the province grid but kept explicitly low-confidence;
  - replacement of wrong-theater homonyms when the corpus had matched a formally similar name outside Phoenice, as with the previous Anatolian `Adatha` match now replaced by a broad northeast-Qalamoun sector anchor.
- Palaestina-specific examples now include:
  - literature-backed uncertain anchors such as `Sabure sive Veterocaria -> Wadi Sabra near Petra`, `Idiota -> Oboda/Avdat`, `Tarba -> Mezad Tamar / Qasr el-Juheiniyeh`, `Iehibo -> Ein Yahav`, and `iuxta Iordanem fluvium -> Lower Jordan / Qasr al-Yahud sector`;
  - deliberately broad land-based sector anchors for still-unresolved lesser-register names such as `Sabaia`, `Hasta`, `Cartha`, `Afro`, and the Palaestina `Calamona`, which are shown as `uncertain` rather than left on a province grid or falsely tied to an exact fort;
  - explicit refusal to collapse the Palaestina `Calamona` into the unrelated Phoenice `Calamona/Maaloula` identification.
- Comes Aegypti / Aegyptus cleanup now moves the remaining province-grid stations to named but uncertain anchors:
  - `Andro -> Andropolis western Delta sector`, `Nee -> Nea Polis lower-Egypt sector`, `Thaubasteos -> Thaubasion Wadi Tumilat sector`, `Tohu -> Thoy Wadi Tumilat sector`, `Castra Iudaeorum -> Tell Yehud/Gheyta sector`, `Muson -> Mousai/al-Sarirya sector`, and `Cefro -> Kephro Middle Egypt sector`;
  - override-level `evidence_place_signal` is used here to keep unit-origin words such as `Vandali`, `Pannonia`, `Galatia`, and `Thracia` out of the displayed station string.
- Dux Thebaidos cleanup now moves the remaining province-grid stations to named uncertain anchors:
  - `Praesentia -> Nag el-Hagar`, `Pampane -> Pampanis/Tentyrite sector`, `Precteos -> Prektis/northern Hermopolite sector`, `Silili -> Selino/Panopolis sector`, `Peamu -> opposite Abydos sector`, and `Castra Lapidariorum -> Basanites/Baram quarry sector east of Syene`;
  - `Nitnu` and `Burgo Severi` remain deliberately broad command-list sector markers because Trismegistos confirms the toponyms but current public evidence does not fix their coordinates.
- Dux Armeniae / Cappadocia cleanup now moves the remaining province-grid stations to named uncertain anchors:
  - direct or gazetteer-backed anchors include `Chiaca -> Kiakis`, `Caene Parembole -> Kaine Parembole`, `Silvanis -> Solonenica/Sedissa/Pirahmet sector`, `Suissa -> Erzincan sector`, and `Sisila -> Sisilisson/Ziziola/Ogutlu-Iskilor sector`;
  - `Auaxa/Avaxa`, `Aeliana`, `Castellum Tablariense`, `Valentia`, and `Mochora` remain broad, explicitly uncertain sector markers because the public evidence is disputed, textually difficult, or does not fix an exact fort.
- Dux Scythiae / Dux Moesiae secundae cleanup now moves the remaining Moesia Inferior province-grid rows to named uncertain anchors:
  - stronger anchors include `Mediolana -> Pirgovo`, `Teglicium/Tegulicium -> Vetren`, `Ansamum/Ansamus -> Asamus/Cherkovitsa`, and `Altinum -> Oltina`;
  - debated or textually difficult anchors include `Cimbrianae -> Gura Canliei/Canlia sector`, `Talamonio/Thalamonium -> Nufaru-Murighiol lower-Danube sector`, `Gratiana -> Salsovia-Halmyris corridor`, and `in plateypegiis -> Danube Delta naval sector`.
- Dux Daciae ripensis cleanup now moves the remaining Dacia province-grid rows to named uncertain anchors:
  - stronger anchors include `Cuneus equitum Dalmatarum Divitensium -> Drobeta`, `Translucum -> Hajducka Vodenica`, `Crispitia -> Koshava`, and `Burgo Zono -> Kozloduy sector`;
  - debated anchors include `Transalba -> Mali Golubinje / Danube Gorge sector`, `Burgus Novus -> Radujevac / Vidin-Dunavci sector`, `Zernis -> Dierna-Orsova / Tekija sector`, `Siosta -> Insula Banului (?)`, and `Sostica -> Kladovo sector`.
- Dux Moesiae primae / Moesia Superior cleanup now moves the remaining Moesia Superior province-grid rows to named uncertain anchors:
  - stronger anchors include `Ad Novas -> Novae/Cezava`, `Tricornio -> Tricornium/Ritopek`, `Zmirnae -> Boljetin`, and the `Auxilium Aureomontanum` correction where the station is `Tricornio` even though the unit name preserves Aureus Mons;
  - debated anchors include `Flaviana -> Kuvin/Kovin sector`, `Contra Reginam -> north-Danube bridgehead sector`, and `Gratiana -> Dobra`, all kept as `uncertain` because the published localizations remain contested or broad.
- Dux Pannoniae secundae / Dux Valeriae / Dux Pannoniae primae cleanup now moves most remaining Pannonian province-grid rows to named uncertain anchors:
  - stronger DARE-backed anchors include `Vindomarae/Vidomana -> Vindobona`, `Constantia -> Szentendre`, `Ad Statuas -> Várdomb`, `Odiabo -> Odiavum/Azaum`, `Ad Novas -> Zmajevac`, `Siscia -> Sisak`, `Mursa -> Osijek`, `Iovia -> Heténypuszta`, `Asturis -> Zwentendorf`, and `Cannabiaca -> Zeiselmauer`;
  - broader or debated anchors include `Leonata -> Pannonia Savia sector`, `Graium -> Rača-Brčko/lower Sava sector`, `Vincentia -> Környe`, `Quadriburgium -> Ságvár` for the Valeria cohort row, and `Burgus Centenarium -> Sopianae/Pécs sector`;
  - `Milites Acincenses` is mapped to `Antunnacum/Andernach` under the Dux Mogontiacensis; the Aquincum/Acincenses wording is treated as a unit-origin/name signal, not a Pannonian station;
  - `Italiciani/Secundarum`, the damaged `Cohors III Alpinorum Dardanorum` row, and `Cohors Caratensis` remain province-level because the Notitia text does not give a defensible station point.
- Dux Belgicae secundae / Dux tractus Armoricani et Nervicani / western-Gaul praepositura cleanup now removes the remaining `province:gallia-belgica` grid rows:
  - DARE-backed anchors include `Constantia -> Coutances`, `Benetis/Venetia -> Vannes`, `Tabernis -> Rheinzabern`, `Ebruduni Sapaudiae -> Yverdon-les-Bains`, and `Viennae sive Arelati -> Vienne`;
  - debated sector anchors include `Marcis -> Marck near Calais`, `Portu Epatiaci -> Boekhoute sector`, `Classis Sambrica -> Etaples / Canche-Somme coast sector`, `Blabia -> Blaye`, `Grannono -> Le Havre sector`, `Grannona in litore Saxonico -> Guernsey sector`, and `Calaronae -> Chalaronne/Saone corridor`;
  - western-Gaul rows outside the current province polygon set can now explicitly carry no modeled `province_id` when keeping `province:gallia-belgica` would be historically misleading.
- Comes Tingitaniae / Comes Africae / Dux Mauretaniae / Dux Tripolitanae cleanup now removes the remaining `province:africa-mauretania-numidia` grid rows:
  - stronger anchors include `Aulucos -> Lixus`, `Tabernas -> Tabernae/Lalla Djilaliya`, `Duga -> Souiyar/Ad Novas sector`, `Pacatiana -> el Benian`, `Audiensis -> Auzia`, `Bazensis/Badensis -> Badias`, `Tubuniensis -> Thubunae/Tobna`, `Zabensis -> Zabi`, `Tubusubditanus -> Tubusuctu/Tiklat`, `Columnatensis -> Columnata/Sidi Hosni`, and `Thamallensis -> Turris Tamalleni/Telmine`;
  - debated or broad limes anchors include `Castrabarensis/Bariensis -> Loukkos military sector`, `Montensis in castris Leptitanis -> Neptitanis/Nefta sector`, `Caputcellensis -> Caput Cillani / Caesarea-Auzia route sector`, `Tablatensis -> Tablat sector`, `Bidensis/Vidensis -> Bida/Djemaa Saharidj sector`, and the unlocalized `Balaretanus`, `Taugensis`, `Augustensis`, `Fortensis`, `Inferior`, and `Muticitanus` rows;
  - Tripolitanian `Tillibarensis` / `Tillibanensibus` rows are anchored at `Tillibari/Remada` with `province_id = null` until a real Tripolitania polygon exists.
- Dux Syriae cleanup now keeps the late Syrian frontier out of the synthetic `syria-palestina-arabia` bucket:
  - direct or cautious anchors include `Occariba -> Occaraba/Uqayribat` and `Marmantharum -> Khirbet Matran between Androna and Seriane`;
  - unresolved names such as `Matthana`, `Acauatha`, `Ammuda`, `Iuthungi`, and `Claudiana` remain province-level where necessary, but are now assigned to the cleaner `province:syria-euphratensis` context instead of the broader mixed bucket.
- Leaves command-only, field-army, and origin-marker occurrences off the map while keeping them searchable in the unit corpus.
- Writes `data/notitia_db/notitia_atlas_reconciliation.json` so unresolved locations, reused-unit matches, and import status stay auditable.
- Adds review-queue summary items instead of exploding the review queue into hundreds of near-duplicate place tickets.
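
The rule for which occurrence rows receive a map disposition, and in which mode, can be summarized as a small decision function. The sketch below follows the three modes listed above; the input fields (`station_like`, `dare_place_id`, `heuristic_match`, `workbook_uncertain`) are hypothetical stand-ins for whatever the importer derives from the workbook and the override file.

```python
def occurrence_location_mode(occurrence):
    """Pick a disposition mode for one occurrence row.

    Returns None when the row should stay off the map but remain searchable.
    """
    if not occurrence.get("station_like"):
        return None  # command-only, field-army, or origin-marker row

    if occurrence.get("heuristic_match") or occurrence.get("workbook_uncertain"):
        return "uncertain"  # uncertainty wins even if a DARE place was found

    if occurrence.get("dare_place_id"):
        return "exact_site"  # direct normalization to the local DARE corpus

    # Explicit place string that only normalizes safely to a coarse province.
    return "province"
```

Checking the uncertainty flags before the DARE match means a heuristic hit can never be promoted to `exact_site` by accident.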

Practical rule for the current Notitia cleanup:

- First try a direct local DARE/place normalization.
- If that fails but the occurrence preserves a meaningful station string, add an occurrence-specific override with `location_mode = uncertain`.
- Prefer context-sensitive homonym resolution when the name exists in multiple provinces, for example Sucidava in Dacia vs. Moesia or Bononia in Dacia vs. Pannonia/Gallia.
- If the best evidence is only a route, oasis, river, or broad facing-site relation, prefer an uncertain anchor over a fake exact fort.
- For Dux Osrhoenae / Dux Mesopotamiae, direct DARE/Pleiades anchors are acceptable for Constantia, Kepha, Oraba/Horaba, Thebeta, and Sapha, but `Thilla-` names, Caini, Ripaltha, Bethallaha, Banasa, Sina Iudaeorum, Rasin, Mediana, Maratha, and Duodecimum stay broad sector markers unless a stronger gazetteer or specialist argument is added.
- For Dux Syriae / Euphratensis, Neocaesarea is anchored cautiously to Athis/Neocaesarea-Dibsi Faraj, while `Maithana/Matthana`, `Acavatha/Acauatha`, `Ammuda`, `Salutaria`, and `Claudiana` remain broad uncertain sector markers. The importer supports an override-level `evidence_place_signal` for cases like `Ala I Iuthungorum`, where the workbook place signal catches the unit ethnonym but the Notitia station is `Salutaria`.
- For Comes Aegypti, route names and weak TM IDs may become broad uncertain anchors when Notitia/Trismegistos/Itinerary evidence clearly preserves a station, but names such as `Cefro/Kephro` and `Nee/Nea Polis` must remain large-buffer sector markers until a stronger specialist localization is added.
- For Dux Thebaidos, prefer exact DARE anchors where already available, but keep `Praesentia?`, `Pampanis`, `Prektis`, `Silili/Selino`, `Peamou`, `Nitnu`, `Burgus Severi`, and `Castra Lapidariorum` as uncertain because the surviving public evidence ranges from debated identification to broad quarry/route/nome context.
- For Dux Armeniae, do not treat the late command title as a clean province. Accept direct DARE route stations where available, but keep Pontic/Lazic/Cappadocian frontier names such as `Silvanis`, `Auaxa`, `Aeliana`, `Valentia`, and `Mochora` uncertain unless a later specialist source fixes them more securely.
- For Dux Scythiae / Dux Moesiae secundae, accept DARE or specialist-gazetteer anchors for known Danube-limes forts, but keep `Cimbrianae`, `Thalamonium`, `Gratiana`, and `in plateypegiis` broad/uncertain because the identifications are debated or the term may not be a normal toponym.
- For Dux Daciae ripensis, check Or. XLII first: several workbook strings preserve unit epithets or province/origin hints rather than the station itself. Use broad uncertain anchors for `Burgus Novus`, `Zernis`, `Transalba`, `Siosta`, and `Sostica`; do not upgrade them to exact sites without stronger specialist evidence.
- For Dux Moesiae primae, check Or. XLI first: `Auxilium Aureomontanum` is stationed at `Tricornio`, not at Aureus Mons, and `Ad Novas` must resolve to the Moesia Prima Novae/Cezava fort, not to the Moesia Secunda Novae/Svishtov legionary base. Keep `Flaviana`, `Contra Reginam`, and `Gratiana` uncertain unless stronger site-specific archaeology is attached.
- For the western Pannonian chapters, check Occ. XXXII-XXXIV first. Do not map unit-origin words such as `Dalmatia`, `Alpes`, or `Dardania`; prefer DARE homonyms only after command context resolves them. Keep blank or title-like rows such as `Caratensis` province-level unless a later source proves a station.
- For western-Gaul Notitia rows, check Occ. XXXVII, XXXVIII, XLI, and XLII first. Do not let the old `province:gallia-belgica` fallback become a false label for Armorican, Sapaudian, Rhone, or Mogontiacensis stations; use explicit `province_id = null` for uncertain markers when the atlas has no appropriate western-Gaul province polygon yet.
- For Africa and Mauretania, distinguish named places from limes districts. Use DARE/TM anchors for known places such as Lixus, Badias, Auzia, Tubusuctu, Zabi, Thubunae, Turris Tamalleni, and Tillibari; use very broad sector markers for district titles that scholarship leaves unlocalized, and say so in `basis`.
- Leave the record at province level only when no defensible anchor survives review.

Run the Notitia importer with:

```powershell
python scripts/ingest-notitia-corpus.py
```

`scripts/repair-place-encoding.ps1` repairs UTF-8 mojibake in imported DARE place/cache strings when a local PowerShell import decodes names incorrectly. The script is safe to rerun, and the validator blocks strong mojibake markers in place names.
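
For readers unfamiliar with this kind of damage, the sketch below shows one common pattern: UTF-8 bytes that were decoded as Windows-1252. It is an illustration in Python, not a port of the PowerShell script. The marker list and the assumption that cp1252 was the wrong codec are both guesses about the failure mode.

```python
# Typical residue of UTF-8 text that was decoded as cp1252 (assumed markers).
MOJIBAKE_MARKERS = ("Ã", "Â", "â€")


def looks_mojibaked(text):
    """True when the text contains a strong mojibake marker."""
    return any(marker in text for marker in MOJIBAKE_MARKERS)


def repair_mojibake(text):
    """Undo one round of UTF-8-read-as-cp1252 damage; leave clean text alone."""
    if not looks_mojibaked(text):
        return text  # already clean, so rerunning is a no-op
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not the damage pattern assumed here; do not guess


damaged = "Pilismarót".encode("utf-8").decode("cp1252")
print(damaged, "->", repair_mojibake(damaged))
```

Because clean strings pass through unchanged, applying the repair twice gives the same result as applying it once, which is the property that makes a rerun safe.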

`scripts/build-legion-metadata.ps1` regenerates `data/legion_metadata.json`:

- Builds one profile for every normalized legion unit.
- Stores symbol, founding date, founder, disbandment or last-attested notes, profile confidence, caveats, and profile source links.
- Uses Livius chronology/catalogue and a tertiary public table as bootstrap sources.
- Leaves unknown or disputed fields explicit rather than forcing certainty.

`scripts/estimate-province-markers.ps1` converts broad province records into clickable estimated markers:

- Snaps broad records near DARE fort, fortress, station, or manually selected legionary-base anchors where possible.
- Uses rough frontier spines by province only when no DARE anchor candidate is available.
- Keeps `place_id = null`, sets `location_mode = uncertain`, lowers confidence, and marks the record contested.
- Stores `estimated_anchor_place_id` and `estimated_anchor_note` when a DARE anchor was used, while keeping the unit disposition non-exact.
- Stores the point in `geometry_override` so the UI can render it while still saying no exact place is modeled.
- Should be rerun after the curated auxiliary and legion expansion scripts.
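
The shape of an estimated marker, as described in the list above, can be sketched as a transformation of a province-level record. Field names `place_id`, `location_mode`, `estimated_anchor_place_id`, `estimated_anchor_note`, and `geometry_override` come from the description; `confidence`, `contested`, the `"low"` label, and the GeoJSON-style point are assumptions for illustration.

```python
def to_estimated_marker(record, point, anchor_place_id=None, anchor_note=None):
    """Return a copy of a province-level record as a non-exact estimated marker.

    `point` is a (lon, lat) pair. The input record is not modified.
    """
    lon, lat = point
    marker = dict(record)
    marker.update(
        place_id=None,              # never claim an exact modeled place
        location_mode="uncertain",
        confidence="low",           # assumed label; see the schema for real enums
        contested=True,
        geometry_override={"type": "Point", "coordinates": [lon, lat]},
    )
    if anchor_place_id is not None:
        # Recorded only when a DARE anchor was actually used for snapping.
        marker["estimated_anchor_place_id"] = anchor_place_id
        marker["estimated_anchor_note"] = anchor_note
    return marker
```

Keeping the anchor in its own fields, separate from `place_id`, lets the UI show where the point came from without asserting that the unit was stationed there.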

Run the normal validator and tests after regeneration.
