FOR RESEARCHERS

Technical reference for academics, data scientists, and journalists who want to audit, use, or cite Hermes analyses.

Corpus composition

The Hermes corpus is the union of two sources, merged into a single JSON schema and a single in-memory index:

Active reports: observations submitted directly through Hermes since the platform launched. Fully structured: geometry, equipment, weather/satellite/aircraft/celestial cross-references, investigator notes. Tagged source: HERMES, is_archive: false.
Archived reports: a public compilation of NUFORC reports (via the planetsig/ufo-reports open-source scrape), converted to the Hermes schema for use in cohort and cluster analysis. Tagged source: NUFORC, is_archive: true, with an archive_provenance field identifying the scrape.

Cohort queries can include or exclude archive data via the exclude_archive flag. By default they include archive so that cluster and volume analyses have statistical power; for studies specific to Hermes-native submissions, set exclude_archive: true.
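
As a rough illustration of what the flag means in terms of the record fields, the sketch below filters a list of records locally on is_archive. The helper name apply_archive_flag and the two sample records are hypothetical; the server-side implementation is not documented here and may differ.

```python
def apply_archive_flag(records, exclude_archive=False):
    """Return the records a cohort would consider under exclude_archive.

    Assumption: excluding the archive is equivalent to dropping every
    record whose is_archive field is true.
    """
    if exclude_archive:
        return [r for r in records if not r["is_archive"]]
    return list(records)


# Hypothetical sample records, trimmed to the fields the flag depends on.
records = [
    {"case_id": "HERMES-20240101-0001", "source": "HERMES", "is_archive": False},
    {"case_id": "NUFORC-20100101-ABC123", "source": "NUFORC", "is_archive": True},
]

everything = apply_archive_flag(records)                         # default: archive included
native_only = apply_archive_flag(records, exclude_archive=True)  # Hermes-native only
```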

Data schema

Every case record, active or archive, is a JSON object with the following top-level fields. Optional fields (shown as "or null" below) may be null on older archive records.

{
  "case_id":    "HERMES-YYYYMMDD-NNNN" or "NUFORC-YYYYMMDD-XXXXXX",
  "source":     "HERMES" | "NUFORC",
  "is_archive": false | true,
  "submitted":  ISO-8601 timestamp,
  "location": {
    "lat":  float,     # WGS84 decimal degrees
    "lon":  float,
    "name": string     # reverse-geocoded label
  },
  "date":       "YYYY-MM-DD",
  "time":       "HH:MM",           # local to timezone field
  "timezone":   string,             # IANA zone or offset
  "facing":     0-359 or null,      # compass bearing, degrees true
  "elevation_angle": 0-90 or null,  # degrees above horizon
  "elevation_ft":    float or null, # observer altitude
  "duration":        string,        # free text, e.g. "30 minutes"
  "duration_seconds":float or null, # normalized (archive only, when available)
  "shape":    enum,     # see /docs/glossary
  "color":    enum or null,
  "light_char": enum or null,
  "intensity":  enum or null,
  "behavior":   enum or null,
  "camera":     string or null,
  "ir":         "Yes"|"No",
  "naked_eye":  "Yes"|"No",
  "live_stream": url or null,
  "description": string,            # witness narrative
  "witnesses":   string or null,

  # Automated cross-reference (Hermes-native submissions only):
  "weather":    { "conditions":str, "temp_f":float, "wind_mph":float,
                  "wind_dir":float, "humidity":float, "cloud_cover":float,
                  "visibility_mi":float, "source":str },
  "satellites": { "count":int, "notable":[str, ...] },
  "aircraft":   { "count":int, "aircraft":[...], "note":str },
  "celestial":  { "moon":{"phase_name":str,"phase_pct":float},
                  "planets":[{"name":str,"altitude_deg":float,"magnitude":float}] },
  "geometry":   [{"distance_km":float, ...}],   # derived line-of-sight

  # Verdict:
  "status":        "OPEN: ..." or "RESOLVED: ...",
  "confidence":    "LOW"|"MEDIUM"|"MEDIUM-HIGH"|"HIGH",
  "eliminations":  [str, ...],
  "flags":         [str, ...],
  "hermes_notes":  str
}
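
For anyone auditing a downloaded record, a few of the constraints above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name check_record is hypothetical, the NUFORC suffix is assumed to be six alphanumeric characters, and a null confidence is tolerated because the schema does not say whether archive records always carry a verdict.

```python
import re

CASE_ID_PATTERNS = {
    "HERMES": re.compile(r"^HERMES-\d{8}-\d{4}$"),
    # Assumption: the XXXXXX suffix is six alphanumeric characters.
    "NUFORC": re.compile(r"^NUFORC-\d{8}-[0-9A-Za-z]{6}$"),
}
CONFIDENCE_GRADES = {"LOW", "MEDIUM", "MEDIUM-HIGH", "HIGH"}


def check_record(rec):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    source = rec.get("source")
    pattern = CASE_ID_PATTERNS.get(source)
    if pattern is None:
        problems.append(f"unknown source: {source!r}")
    elif not pattern.match(rec.get("case_id") or ""):
        problems.append("case_id does not match the pattern for its source")
    # Archive records are tagged NUFORC; active records are tagged HERMES.
    if rec.get("is_archive") != (source == "NUFORC"):
        problems.append("is_archive is inconsistent with source")
    facing = rec.get("facing")
    if facing is not None and not 0 <= facing <= 359:
        problems.append("facing outside 0-359")
    elevation = rec.get("elevation_angle")
    if elevation is not None and not 0 <= elevation <= 90:
        problems.append("elevation_angle outside 0-90")
    grade = rec.get("confidence")
    if grade is not None and grade not in CONFIDENCE_GRADES:
        problems.append(f"unknown confidence grade: {grade!r}")
    return problems


good = {"case_id": "HERMES-20240101-0001", "source": "HERMES",
        "is_archive": False, "facing": 270, "elevation_angle": 45,
        "confidence": "MEDIUM"}
bad = dict(good, is_archive=True, facing=400)
```

Here check_record(good) returns an empty list, while check_record(bad) reports the inconsistent is_archive value and the out-of-range bearing.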

API endpoints

Endpoint	Method	Purpose
`/api/cases`	GET	List recent case IDs (active only)
`/api/case/<id>`	GET	Retrieve a single case record (JSON)
`/api/index/stats`	GET	Corpus size and source breakdown
`/api/cohort/v2`	POST	Filter the corpus; returns match count, aggregates, sample, and reproducibility hash
`/api/cluster`	POST	DBSCAN spatial cluster detection on a cohort
`/api/forecast/volume/v2`	GET	Regional report-volume baseline and z-score
`/api/forecast/conditions`	GET	Current misidentification advisories at a location
`/api/export-filing/<id>`	GET	Pre-formatted MUFON, NUFORC, Enigma filing text
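
A cohort request can be prepared with the Python standard library alone. The sketch below only builds the request and does not send it. Two things are assumptions: that the API is served from the same host as the citation URL (https://projecthermes.tech), and that "shape" is an accepted filter key; only exclude_archive is named in this reference.

```python
import json
import urllib.request

BASE_URL = "https://projecthermes.tech"  # assumption: API shares the site host


def build_cohort_request(query_params, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/cohort/v2."""
    body = json.dumps(query_params).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/cohort/v2",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "shape" is a hypothetical filter key used for illustration only.
req = build_cohort_request({"shape": "triangle", "exclude_archive": True})
# To send it: urllib.request.urlopen(req), then json.load() the response.
```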

Reproducibility hash

Every cohort and cluster query returns a reproducibility_hash in the form COHORT-XXXXXXXXXXXXXXXX (16 hex chars). The hash is computed as:

SHA256(json.dumps(query_params, sort_keys=True, separators=(',',':')))[:16].upper()

Properties:

Deterministic. Same query parameters always produce the same hash.
Sensitive to parameter values but not to request order. Swapping the order of fields in the JSON body does not change the hash; changing any value does.
Cite-able. A paper can reference a hash as proof that a given analysis was performed; a reviewer POSTs the same query body and verifies the returned hash matches.
Not a cryptographic commitment. The hash is a truncated digest of a small, enumerable query, so anyone who can see or guess the query can recompute it. Don't treat it as a secret.
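
The formula above can be reproduced locally, which lets a reviewer check a published hash before contacting the API. This is a sketch under stated assumptions: the canonical JSON string is encoded as UTF-8 before hashing, the COHORT- prefix is prepended to the truncated digest, and the server hashes exactly the parameters supplied (if it fills in defaults first, local and server hashes will differ). The function name and the example filter keys are hypothetical.

```python
import hashlib
import json


def reproducibility_hash(query_params):
    """Recompute a COHORT-... hash from a query body, per the formula above."""
    # Canonical form: keys sorted, no whitespace between tokens.
    canonical = json.dumps(query_params, sort_keys=True, separators=(",", ":"))
    # Assumption: the canonical string is UTF-8 encoded before hashing.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "COHORT-" + digest[:16].upper()


# Hypothetical query bodies: same fields in a different order, then one changed value.
a = reproducibility_hash({"shape": "triangle", "exclude_archive": True})
b = reproducibility_hash({"exclude_archive": True, "shape": "triangle"})
c = reproducibility_hash({"shape": "triangle", "exclude_archive": False})
```

In this sketch a and b are equal because sort_keys removes field-order differences, while c differs because one value changed.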

Statistical assumptions per module

Volume forecast (`/api/forecast/volume/v2`)

Baseline: monthly report counts over the most recent 36 months of data available in the region.
Anomaly criterion: z-score of the latest month against the 36-month mean and standard deviation.
Threshold: |z| > 2.0 raises the anomaly flag (roughly the two-sided 95% level if counts are normally distributed).
Key assumption: monthly counts are approximately normally distributed within a region. This is often false for sparsely reported regions; treat anomaly flags from regions with reports_in_region < 100 with caution.
Known confounds: Starlink launches, meteor shower peaks, news coverage, seasonal observer behavior, reporting platform changes.
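
The anomaly criterion can be sketched in a few lines. The source does not say whether the latest month is part of its own baseline or whether the standard deviation is the sample or population form; this sketch assumes the latest month is the last of the 36 baseline months and uses the sample standard deviation. The function name and the example counts are hypothetical.

```python
import statistics


def volume_anomaly(monthly_counts, threshold=2.0):
    """Return (z, flagged) for the latest month against a 36-month baseline.

    Assumptions: the baseline is the last 36 entries, the latest month is
    included in it, and the sample standard deviation is used.
    """
    baseline = monthly_counts[-36:]
    if len(baseline) < 2:
        raise ValueError("need at least two months of counts")
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    if std == 0:
        return 0.0, False  # a flat series has no anomaly by this criterion
    z = (baseline[-1] - mean) / std
    return z, abs(z) > threshold


# Hypothetical counts: a steady region, then the same region with a spike.
steady = [10, 12] * 18
spike = [10, 12] * 17 + [10, 40]

z_steady, flag_steady = volume_anomaly(steady)
z_spike, flag_spike = volume_anomaly(spike)
```

With these inputs the steady series is not flagged and the spike is. Note that a single large month also inflates the baseline standard deviation when it is included, which damps its own z-score.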

Spatial cluster detection (`/api/cluster`)

Algorithm: DBSCAN with haversine metric.
Parameters: eps_mi (default 25 mi), min_samples (default 5), max_points (default 8000 for latency).
Output: clusters sorted by size, with centroid, count, date range, top shape, sample places.
Key assumption: cluster density reflects observation density, which includes both phenomena and observer population. Always interpret against population density maps.
Known confounds: urban areas dominate cluster output because cities have more observers. Sparse-population clusters with high counts are more noteworthy than dense-population clusters with the same counts.
When the query exceeds max_points: Hermes subsamples deterministically by case_id (alphabetical order) to keep the analysis stable across runs. The hash reflects the parameters, not the subsample.
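
Two pieces of this module are easy to reproduce for checking purposes: the great-circle distance that eps_mi is measured in, and a deterministic subsample. The sketch below is pure Python and makes two assumptions: an Earth radius of 3958.8 mi, and that the subsample keeps the first max_points records in case_id order (the source says only that subsampling is deterministic by case_id). If you cluster with scikit-learn's DBSCAN and metric="haversine", coordinates and eps must be in radians, hence the eps_radians helper. All function names are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumption: mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two WGS84 points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(min(1.0, math.sqrt(a)))


def eps_radians(eps_mi=25.0):
    """Convert a neighborhood radius in miles to radians on the unit sphere."""
    return eps_mi / EARTH_RADIUS_MI


def subsample(records, max_points=8000):
    """Deterministic subsample: sort by case_id, keep the first max_points.

    Assumption: "keep the first N" is one possible reading of the rule above.
    """
    return sorted(records, key=lambda r: r["case_id"])[:max_points]


one_degree = haversine_mi(0.0, 0.0, 0.0, 1.0)  # one degree of longitude at the equator
picked = subsample(
    [{"case_id": "NUFORC-20100101-BBBBBB"}, {"case_id": "HERMES-20240101-0001"},
     {"case_id": "NUFORC-20100101-AAAAAA"}],
    max_points=2,
)
```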

Misidentification conditions forecast (`/api/forecast/conditions`)

Aggregates real-time Starlink pass predictions, weather, celestial positions, and aircraft activity at a point.
Output is qualitative (advisory-level), not probabilistic.
Use case: "should I expect elevated misidentification reports here tonight?" not "is this UAP likely?"

Known limitations

Archive data has no geometry fields. Bearing, elevation angle, observer altitude, and most equipment fields are null on NUFORC archive records. Cluster and cohort queries that rely on those fields implicitly exclude the archive portion of the corpus. Filter queries that require geometry should set exclude_archive: true.
Cohort archive is a snapshot. The NUFORC scrape is not continuously updated. Cases filed to NUFORC after the last scrape are not in the archive.
ADS-B coverage is uneven. Low-altitude and remote-area aircraft checks may produce false "low aircraft density" eliminations.
Weather lookup uses the nearest airport or station. Highly localized conditions (fog banks, valley inversions) may not be captured.
Confidence grades are relative, not absolute. They measure the ratio of eliminations to flags within a single report, not the objective unusualness of the observation.
Reporting bias is not corrected. The Hermes corpus inherits all reporting biases of its sources: US-dominant, night-heavy, English-language-dominant, technology-access-gated.
Photo and video are not yet machine-analyzed. Media is stored but motion extraction, parallax, and apparent-size geometry are not automated.
No post-stratification. Cluster counts are raw; they are not normalized to population density or observer access. This is on the roadmap.

Citation format

For academic or journalistic work, we recommend the following citation pattern:

Hermes UAP Analysis Platform. (v0.16.0). [analysis type].
Reproducibility hash: COHORT-XXXXXXXXXXXXXXXX.
Retrieved from https://projecthermes.tech/research

In running text: "A Hermes cohort analysis (hash COHORT-AA8A..., methodology v0.16.0) of triangle-shape reports since 2010 identified 2,222 matching cases..."

Replication checklist for reviewers

Copy the reproducibility hash from the paper being reviewed.
POST the identical query body to /api/cohort/v2 or /api/cluster.
Verify the returned reproducibility_hash matches.
Check methodology_version. If the version has changed since publication, consult the changelog for what changed and whether it affects the analysis.
If the hash matches but the numbers differ, the corpus has most likely grown (new reports have been filed). The hash verifies the query; the exact counts depend on the corpus at query time. Reviewers working from an archived corpus snapshot should note the snapshot date.
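
The checklist can be turned into a small comparison routine once the API response is in hand. This sketch compares a paper's published values with a fresh response, both as plain dictionaries. The keys reproducibility_hash and methodology_version are named in this reference; match_count is a hypothetical key standing in for whatever the response calls its match total.

```python
def review_replication(published, response):
    """Compare published values with a fresh API response; return a list of notes."""
    if response.get("reproducibility_hash") != published.get("reproducibility_hash"):
        # A different hash means a different query was run; stop here.
        return ["hash mismatch: the query body differs from the published one"]
    notes = []
    if response.get("methodology_version") != published.get("methodology_version"):
        notes.append("methodology version changed: consult the changelog")
    # match_count is a hypothetical field name for the cohort's match total.
    if response.get("match_count") != published.get("match_count"):
        notes.append("counts differ: corpus likely changed; record the query date")
    return notes


# Hypothetical values for illustration.
published = {"reproducibility_hash": "COHORT-0123456789ABCDEF",
             "methodology_version": "v0.16.0", "match_count": 2222}
response = dict(published, match_count=2240)
notes = review_replication(published, response)
```

An empty list means the response matches the publication on every compared field.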

Contact and contribution

Hermes is open to methodology contributions. If you see a flaw in the elimination logic, a missing cross-reference source, or a better statistical approach for any of the analysis modules, the fastest path is a concrete proposal with references and, where applicable, replication code.