# Data Model

The atlas is organized around normalized entities. The authoritative machine-readable contract is `data/schema.json`; validators and admin enum lists read shared enumerations and validation constants from that file.

The project also has a local SQLite curation layer described in `DATABASE.md` and `database/schema.sql`. The DB is preservation-first: it imports the current JSON/GeoJSON, keeps stable public IDs, and exports the same public file shapes for static deployment. Historical enrichment should happen through source documents, evidence claims, interpretations, and then dispositions, not by directly rewriting map records from raw research hits.

## Places

File: `data/places.json`

Fields:

- `place_id`: local stable ID, usually `dare:<id>`.
- `external_ids`: DARE, Pleiades, Trismegistos, Wikidata, and other crosswalk IDs.
- `ancient_name`, `modern_name`.
- `geometry`, `geometry_type`, `geometry_precision_m`.
- `geometry_source`.
- `province_id`.
- `notes`.

DARE IDs are preferred for place normalization. Pleiades IDs are crosswalks, not replacement IDs.
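A minimal sketch of the DARE-first rule, assuming `external_ids` is a dict keyed by lowercase scheme names such as `"dare"` and `"pleiades"` (the real key names are defined in `data/schema.json`). The helpers `preferred_place_id` and `find_by_crosswalk` are hypothetical; they show a Pleiades ID being used to *find* a local place without ever becoming its `place_id`.

```python
from typing import Optional


def preferred_place_id(external_ids: dict[str, str]) -> Optional[str]:
    # Assumption: crosswalk keys are lowercase scheme names ("dare", "pleiades", ...).
    dare = external_ids.get("dare")
    if dare:
        return f"dare:{dare}"
    # No DARE ID: leave the choice of a local ID (e.g. a manual ID) to a curator.
    # Pleiades is deliberately NOT promoted to the local ID; it stays a crosswalk.
    return None


def find_by_crosswalk(places: list[dict], scheme: str, value: str) -> Optional[dict]:
    # Resolve a crosswalk ID to the local place record; the record's place_id is untouched.
    for place in places:
        if place.get("external_ids", {}).get(scheme) == value:
            return place
    return None
```

The IDs in any usage of this sketch (e.g. `dare:12345`) would be placeholders, not real gazetteer entries.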

## Provinces

File: `data/provinces.geojson`

Province features include:

- `province_id`
- `name`
- `date_from`, `date_to`
- `centroid`
- `geometry_quality`
- `source_id`
- `notes`

The seed Britannia polygon is explicitly schematic. It is for displaying province-level evidence and is not valid for territorial analysis.

## Units

File: `data/units.json`

Fields:

- `unit_id`
- `canonical_name`
- `unit_class`: legion, ala, cohort, numerus, vexillation, or other.
- `subtype`
- `origin_notes`
- `date_from`, `date_to`
- `notes`

The unit table does not encode every title variant; variants are recorded as unit aliases instead.

## Unit Aliases

File: `data/unit_aliases.json`

Aliases model numbering variants, abbreviated forms, honorary titles, and source-specific spellings. Search uses this table to resolve forms like `Legio VIIII Hispana` to `Legio IX Hispana`.
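Alias resolution can be sketched as a lookup index over normalized names. This assumes alias rows shaped like `{"unit_id": ..., "alias": ...}` and a simple case-fold plus whitespace normalization; the helper names and the `leg_ix_hispana` ID are hypothetical, and the atlas's actual search may normalize differently.

```python
import re
from typing import Optional


def normalize(name: str) -> str:
    # Case-fold and collapse whitespace so "legio  viiii Hispana" matches "Legio VIIII Hispana".
    return re.sub(r"\s+", " ", name).strip().casefold()


def build_alias_index(units: list[dict], aliases: list[dict]) -> dict[str, str]:
    # Canonical names are indexed first, so they win if an alias collides with one.
    index = {normalize(u["canonical_name"]): u["unit_id"] for u in units}
    for row in aliases:
        # Assumed alias row shape: {"unit_id": ..., "alias": ...}.
        index.setdefault(normalize(row["alias"]), row["unit_id"])
    return index


def resolve_unit(query: str, index: dict[str, str]) -> Optional[str]:
    # Return the unit_id for a search string, or None when no name or alias matches.
    return index.get(normalize(query))
```

With a unit `Legio IX Hispana` and an alias row for `Legio VIIII Hispana`, both spellings resolve to the same `unit_id`. Using `setdefault` means an alias can never silently redirect a canonical name to a different unit.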

## Sources

File: `data/sources.json`

Fields:

- `source_id`
- `source_type`
- `source_family`: internal curation classification, such as primary epigraphy, primary documentary, Notitia, scholarly synthesis, tertiary web, gazetteer, geographic dataset, or curation work.
- `source_quality`: practical authority level: primary, specialist, scholarly, tertiary, gazetteer, public dataset, or curation.
- `source_scope`: what the source is used for: military evidence, place normalization, basemap, display geometry, contextual reference, or source discovery.
- `title`
- `author_editor`
- `year`
- `citation_short`
- `url`
- `identifier`
- `source_caution`: short warning about what the source can and cannot prove.
- `notes`

Sources describe corpora, gazetteers, datasets, text witnesses, specialist works, tertiary discovery pages, or local curation records. Exact document references live in evidence items.

The public atlas derives two reader-facing filters from the linked evidence and source:

- `source basis`: Primary Sources, Secondary Sources, Notitia Dignitatum, or Other / Curation.
- `source domain`: Epigraphy / Diplomas, Papyrology / Documents, Archaeology / Site Records, Ancient Texts, Notitia, Scholarly Synthesis, Tertiary / Discovery, Gazetteers / Place Data, Geographic / Display Data, or Curation / Manual.

That public vocabulary is intentionally broader than the internal `source_family`, `source_quality`, `source_scope`, and `source_caution` fields used by admin, validation, SQLite curation, and future research automation.

Source classification is distinct from the evidence form (`evidence_kind`). RIB can contain both inscription and diploma evidence, DARE and Pleiades are place-normalization sources, and Notitia is both a special late-antique source family and a specific evidence form. The evidence form is still needed for guardrails such as keeping diplomas at province level and Notitia records late-only.

## Source Catalog

The SQLite layer adds a unified `source_catalog` that supplements, rather than replaces, `data/sources.json`. It bridges evidence-backed source works and future lookup targets:

- `source_catalog`: one canonical resource entry with display name, source domain, default basis, authority tier, status, access/documentation URLs, query/public flags, cautions, notes, and preserved payload JSON.
- `source_catalog_source_links`: links catalog resources to current `source_works` rows.
- `source_catalog_target_links`: links catalog resources to future `source_research_targets` rows.

Target-only catalog entries such as CIL/ACE, EDCS, EDH, WallGIS, or general discovery resources do not affect public map filtering until accepted evidence links them to displayed records.
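The filtering rule above can be sketched over the public JSON files. This assumes each catalog entry carries a hypothetical `source_ids` list flattened from `source_catalog_source_links`, and that linked `source_works` rows keep the same IDs as `data/sources.json`; `catalog_id` and both helper functions are illustrative names, not the project's code.

```python
def sources_in_use(dispositions: list[dict], evidence_items: list[dict]) -> set[str]:
    # source_ids reachable from displayed dispositions through their evidence_id.
    source_of = {e["evidence_id"]: e["source_id"] for e in evidence_items}
    return {
        source_of[d["evidence_id"]]
        for d in dispositions
        if d.get("evidence_id") in source_of
    }


def filterable_catalog_entries(catalog: list[dict], in_use: set[str]) -> list[str]:
    # Assumption: "source_ids" lists the sources linked to this catalog entry.
    # Target-only entries have no linked sources, so they never qualify.
    return [c["catalog_id"] for c in catalog if in_use.intersection(c.get("source_ids", []))]
```

A target-only entry such as EDCS starts to affect public filters only once accepted evidence from it is linked to a displayed disposition.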

## Evidence Items

File: `data/evidence_items.json`

Fields:

- `evidence_id`
- `source_id`
- `evidence_kind`: inscription, diploma, papyrus, literary, archaeology, Notitia, synthesis, or manual review.
- `document_ref`
- `document_url`
- `transcription_or_summary`
- `attestation_date_from`, `attestation_date_to`
- `date_note`
- `provenance_notes`

Evidence date is the date of the source signal. It is not automatically the same as a unit occupation range.

## Dispositions

File: `data/dispositions.json`

Fields:

- `disposition_id`
- `unit_id`
- `place_id` nullable
- `province_id` nullable
- `geometry_override` nullable
- `location_mode`
- `presence_type`
- `display_date_from`, `display_date_to`
- `evidence_id`
- `confidence_score`, `confidence_label`
- `reasoning_note`
- `is_contested`
- `late_roman_only`
- `notes`
- `estimated_anchor_place_id` optional, for low-confidence generated markers snapped near DARE geography anchors.
- `estimated_anchor_note` optional, explains that the DARE anchor is schematic and not an attested station.

Location rules enforced by validation:

- `exact_site` requires a valid `place_id`.
- `province` requires `province_id` and must not set `place_id`.
- Diplomas cannot be converted into exact-site dispositions.
- Notitia evidence must set `late_roman_only` and cannot be displayed before the late empire.
- Stable place identity can come from the schema-declared external ID fields, including DARE, Pleiades, Trismegistos/TM, manual IDs, Wikidata, Topostext, and Vici.
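The location rules can be sketched as a per-disposition check. This is a minimal illustration, not the project's validator: it compares `evidence_kind` case-insensitively because the exact enum spelling lives in `data/schema.json`, and it takes the start of the late empire as a parameter (`late_start_year`) because this page does not state the cutoff. Years are assumed to be plain integers.

```python
def check_disposition(d: dict, places: dict[str, dict], evidence: dict[str, dict],
                      late_start_year: int) -> list[str]:
    """Return rule violations for one disposition; an empty list means it passes."""
    errors: list[str] = []
    mode = d.get("location_mode")

    if mode == "exact_site" and d.get("place_id") not in places:
        errors.append("exact_site requires a valid place_id")
    if mode == "province":
        if not d.get("province_id"):
            errors.append("province requires province_id")
        if d.get("place_id") is not None:
            errors.append("province must not set place_id")

    # Guardrails keyed on the evidence form, not the source classification.
    kind = (evidence.get(d.get("evidence_id"), {}).get("evidence_kind") or "").lower()
    if kind == "diploma" and mode == "exact_site":
        errors.append("diploma evidence cannot support an exact-site disposition")
    if kind == "notitia":
        if not d.get("late_roman_only"):
            errors.append("Notitia evidence must set late_roman_only")
        start = d.get("display_date_from")
        # Hypothetical cutoff parameter; the real constant would come from the schema.
        if start is None or start < late_start_year:
            errors.append("Notitia disposition displayed before the late empire")
    return errors
```

Returning a list of messages rather than raising on the first failure lets an admin view report every problem with a record at once.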

## Unit Histories

File: `data/unit_histories.json`

Generated fields:

- `unit_id`
- `overview`
- `type_and_role`
- `armament`
- `service_history`
- `service_and_events`
- `events`
- `evidence_summary`
- `caveats`
- `links`

The history layer is an interpretive display aid built from normalized evidence, conservative unit-class profiles, and limited curated summaries. It must not override dispositions or upgrade uncertain locations.

## Legion Metadata

File: `data/legion_metadata.json`

Generated fields:

- `unit_id`
- `symbol`
- `symbol_note`
- `founded.label`, `founded.date_from`, `founded.date_to`
- `founder`
- `founder_note`
- `disbanded.label`, `disbanded.date_from`, `disbanded.date_to`
- `disbandment_note`
- `confidence_label`
- `caveats`
- `sources`

This profile layer records symbols, founding dates, founders, and disbandment/last-attested notes for legions where the loaded sources support them. Unknowns remain explicit. These fields do not create dispositions and do not replace evidence-linked location records.

## Review Queue

File: `data/review_queue.json`

Open issues are tracked as data, not hidden in comments. This keeps uncertain bounds, schematic geometry, and missing corpora visible during expansion.

Supported fields:

- `review_id`
- `severity` (`high`, `medium`, `low`, `info`)
- `topic`
- `status` (`open`, `in_progress`, `resolved`, `deferred`, `rejected`)
- `unit_id` nullable
- `place_id` nullable
- `disposition_id` nullable
- `evidence_id` nullable
- `source_id` nullable
- `source_hint` nullable
- `note`
- `recommended_action` nullable

The admin validator checks duplicate review IDs, allowed status/severity values, and optional links to known units, places, dispositions, evidence items, and sources.
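A sketch of those review-queue checks follows. The allowed values are copied from the field list above and hardcoded only to keep the example self-contained; the project reads shared enumerations from `data/schema.json`. The `known` mapping of table names to ID sets is a hypothetical input shape, and free-text fields such as `source_hint` are not checked.

```python
from collections import Counter

SEVERITIES = {"high", "medium", "low", "info"}
STATUSES = {"open", "in_progress", "resolved", "deferred", "rejected"}
# Nullable link fields mapped to the (hypothetical) name of the ID set they must match.
LINK_FIELDS = {
    "unit_id": "units",
    "place_id": "places",
    "disposition_id": "dispositions",
    "evidence_id": "evidence",
    "source_id": "sources",
}


def validate_review_queue(items: list[dict], known: dict[str, set[str]]) -> list[str]:
    errors: list[str] = []
    counts = Counter(item.get("review_id") for item in items)
    for rid, n in counts.items():
        if n > 1:
            errors.append(f"duplicate review_id {rid!r}")
    for item in items:
        rid = item.get("review_id")
        if item.get("severity") not in SEVERITIES:
            errors.append(f"{rid}: bad severity {item.get('severity')!r}")
        if item.get("status") not in STATUSES:
            errors.append(f"{rid}: bad status {item.get('status')!r}")
        for field, table in LINK_FIELDS.items():
            value = item.get(field)
            # Links are optional: only non-null values must point at known records.
            if value is not None and value not in known.get(table, set()):
                errors.append(f"{rid}: unknown {field} {value!r}")
    return errors
```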
