EMEW

From Viae Regiae Wiki
The Gazetteer of Early Modern England and Wales.[1]

The Gazetteer of Early Modern England and Wales

The purpose of this page is to set out our specification and methodology for the Gazetteer of Early Modern England and Wales (EMEW).

Uses for the Gazetteer

Construction

Our gazetteer (EMEW) is founded on the principle of linked data. Linking is built into its creation: although we will also acquire data by other means, the primary way in which a place is added is by linking a point on the image of a source document to a feature in a pre-existing gazetteer. The tool we use for this is Recogito. EMEW will record both features and routes (geospatial point and line data), with variant spellings, feature type,[2] temporal attributes, and temporo-geospatial variants, as described in the current (or future) Linked Places Format (LPF) specification.
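As a rough illustration of what one such record might look like, the sketch below assembles a single point feature as a Python dictionary and serialises it to JSON. The key layout follows our reading of LPF and should be checked against the specification itself; the helper `make_lpf_feature`, the `emew:` and `wd:` identifiers, the place-name, and the coordinates are all hypothetical placeholders rather than real EMEW data.

```python
import json


def make_lpf_feature(emew_id, title, toponyms, type_label, lon, lat,
                     start_year, end_year, links=()):
    """Assemble one LPF-style Feature as a plain dict.

    The key layout is an assumption based on our reading of LPF;
    verify it against the specification before relying on it.
    """
    return {
        "@id": emew_id,
        "type": "Feature",
        "properties": {"title": title},
        # Temporal scope of the attestation.
        "when": {"timespans": [{"start": {"in": str(start_year)},
                                "end": {"in": str(end_year)}}]},
        # Variant spellings, one entry per toponym.
        "names": [{"toponym": t} for t in toponyms],
        # Feature type, ideally drawn from a controlled vocabulary.
        "types": [{"label": type_label}],
        # GeoJSON order is longitude first, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # Links to external authorities such as Wikidata.
        "links": [{"type": "closeMatch", "identifier": i} for i in links],
    }


feature = make_lpf_feature(
    emew_id="emew:0000001",              # hypothetical identifier
    title="Exampleton",                  # placeholder place-name
    toponyms=["Exampleton", "Exampletun"],
    type_label="inhabited place",
    lon=-1.5, lat=52.0,                  # placeholder coordinates
    start_year=1675, end_year=1675,
    links=["wd:Q00000"],                 # placeholder Wikidata ID
)
print(json.dumps(feature, indent=2))
```

Route features would presumably use a `LineString` geometry in place of the `Point`, with the rest of the record unchanged.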

The use of controlled vocabularies within Recogito, linked to Wikidata identifiers as exemplified here, will facilitate grouping, identifying, and searching for features within EMEW.

In our implementation of Recogito, our source documents (maps and texts) are all served by IIIF. At present, Recogito can import reference gazetteers only in a now-superseded version of LPF (originally LPIF, here), so our reference gazetteers must first be converted to that version. The as-yet empty EMEW will also be added as a reference gazetteer in due course.

Recogito does not (yet) have any functionality for georeferencing source features that do not yet appear in one of our loaded gazetteers, nor is it capable of 'wildcard' searches for place-names or un-named places. To resolve these issues we have developed our own standalone extension, desCartes. It facilitates the discovery of obscurely-gazetteered places through geospatial ElasticSearch, and allows the georeferencing of points from a variety of historical, modern, and topographical basemaps. Soon it will also facilitate linkage to source references for linear features (roads and waterways) and geographic areas. This tool will be developed further as the basis of our EMEW Interface.

EMEW Interface (EMEW-I)

  • Purpose: checking and editing datasets (including those generated by Recogito) for ingestion to EMEW.
  • It will operate on an unadvertised, dedicated, password-protected emew.io URL.
  • It will present the feature and any associated imaging in structured form, with appropriate editor plugins for each attribute, including a georeferencer.
  • It will check for unmatched, pre-existing features within EMEW.
  • Controlled vocabulary tags will be assigned to an LPF-conforming type attribute.
  • Approved features will be allocated an EMEW-ID on their insertion.
  • All added features will have a 'beta' flag that can be cleared only after checking by at least two volunteers.
  • Authoritative points might by default be snapped to the closest modern OS-vector road junction.
  • Recogito specifics:
  1. Recogito's reference version of EMEW will require frequent updates.
  2. A module will monitor the progress of each document through the tagging, checking, and ingestion of its features to EMEW; it will also provide the means for defining the date scope of each source document.
  3. The Notes field of each Recogito feature will be parsed to extract keyed information, such as transcribed spelling and feature type (these would normally be minimum requirements), and the corresponding EMEW record will be constructed accordingly.
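The parsing of the Notes field described above might be sketched as follows. The `key: value` line format, the key names in `KNOWN_KEYS`, and the functions `parse_notes` and `missing_keys` are assumptions made for illustration only; the actual keying convention has yet to be defined.

```python
# Keys we assume volunteers will use in the Notes field (hypothetical).
KNOWN_KEYS = {"spelling", "type", "date"}
# Keys treated here as the minimum requirement for an EMEW record.
REQUIRED_KEYS = ("spelling", "type")


def parse_notes(notes):
    """Extract 'key: value' pairs, one per line, from a Notes string.

    Lines without a recognised key are kept under 'comments' so that
    nothing typed by the volunteer is silently discarded.
    """
    record = {"comments": []}
    for line in notes.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in KNOWN_KEYS:
            record[key] = value.strip()
        else:
            record["comments"].append(line)
    return record


def missing_keys(record):
    """List the required keys that are absent or empty in a parsed record."""
    return [k for k in REQUIRED_KEYS if not record.get(k)]


parsed = parse_notes("spelling: Exampletun\nType: market town\nink badly faded")
print(parsed)
print(missing_keys(parse_notes("spelling: Exampletun")))
```

A record with a non-empty `missing_keys` result could be held back in the EMEW-I for a volunteer to complete rather than being ingested.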

Data Structure

This will take the form of linked tables in a PostgreSQL database, allowing replication of all feature attributes present within the current LPF specification. Our prototype schema (based on LPF) can be seen here, but this will need to be adapted in order to accommodate the extended range of attributes recorded in Recogito, in particular with regard to markup of IIIF images. Any feature or factoid bearing an EMEW-ID may have any number of LPF links to Wikidata, which will be periodically updated programmatically.

Database and API

EMEW will be stored in a PostgreSQL database, with spatial data in the WGS84 coordinate reference system (EPSG:4326). Logstash will monitor the database and update an ElasticSearch index.

EMEW's API will deliver data found through ElasticSearch in one of several formats, depending upon the request parameters. Any added search criteria will trigger delivery of a relevant subset (or a single feature) rather than a full dump of the EMEW data. Only the LPF option will deliver all of the data fields.
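To illustrate how request parameters might narrow the result set, the sketch below translates a few search criteria into an ElasticSearch query body. The `term`, `wildcard`, `geo_bounding_box`, and `match_all` clauses are standard ElasticSearch Query DSL; the function `build_query`, its parameters, and the index field names `toponym` and `location` are hypothetical, and we assume `toponym` is indexed as a keyword field.

```python
def build_query(name=None, wildcard=False, bbox=None):
    """Build an ElasticSearch query body from optional search criteria.

    name     -- place-name to match (may contain * and ? if wildcard=True)
    wildcard -- treat `name` as a wildcard pattern rather than an exact term
    bbox     -- (west, south, east, north) in WGS84 degrees, e.g. a viewport
    """
    must, filters = [], []
    if name:
        kind = "wildcard" if wildcard else "term"
        must.append({kind: {"toponym": {"value": name}}})
    if bbox:
        west, south, east, north = bbox
        filters.append({"geo_bounding_box": {"location": {
            "top_left": {"lat": north, "lon": west},
            "bottom_right": {"lat": south, "lon": east},
        }}})
    if not must and not filters:
        # No criteria supplied: equivalent to a full dump.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must, "filter": filters}}}


full_dump = build_query()
subset = build_query("Exam*", wildcard=True, bbox=(-2.0, 51.0, -1.0, 52.0))
print(subset)
```

The spatial limit is placed in `filter` rather than `must` because it only includes or excludes features and need not contribute to relevance scoring.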

Search

  • EMEW-ID
  • Textual
    • Exact place-name
    • Place-name with wildcards (* and ?)
    • Place-name by regex
    • Place-name by phonetics (e.g. Soundex)
  • Spatial limits
    • Country
    • County
    • Parish
    • England
      • Hundred
      • Wapentake
    • Wales
      • cwmwd [=commote]
      • cantref
    • Viewport
    • Polygon
  • Wikidata (to assist with our own Wikidata project and with external linking initiatives)
    • EMEW-IDs linked to Wikidata, with matched Wikidata IDs
    • EMEW-IDs unmatched to Wikidata
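As an example of the phonetic option listed above, the following is a minimal sketch of the standard American Soundex code, under which spellings that sound alike usually share a four-character key. It is offered only to show the principle: the production search might instead rely on a phonetic plugin within ElasticSearch, and the spellings used here are illustrative rather than drawn from EMEW.

```python
# Map each coded consonant to its Soundex digit.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the four-character American Soundex key for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros, then truncate to 4


print(soundex("Bristol"), soundex("Bristoll"))
```

Soundex was designed for English surnames and keeps the first letter unchanged, so variants that differ in their initial letter, or that drop a medial syllable, will not match; other phonetic algorithms may suit early modern and Welsh place-names better.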

Outputs

  • Linked Places Format (LPF, extended JSON) - the default response.
  • A format enabling link-creation in Wikidata (see Wikidata-EMEW).
  • Standard GeoJSON.
  • CSV.
  • KML (for display in Google Earth or Google Maps).
  • HTML options including:
    • The feature(s) marked on a Leaflet map.
    • Leaflet-IIIF instances zoomed to the feature(s) in available source images.
    • Tabulated representation of textual data.
    • A Leaflet map populated with any EMEW feature(s) returned from the URL of a SPARQL query.
  • JSON defining control points for IIIF map warping (under development by Bert Spaan).
  • JSON/XML to serve as a geocoder for a place-name search.

EMEW will also be offered as an SQL dump.
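The relationship between the full LPF response and the reduced formats can be sketched as below, where an LPF-style record is cut down to standard GeoJSON and to a CSV row. The functions `lpf_to_geojson` and `lpf_to_csv`, the choice of retained fields, and the sample record are assumptions for illustration, not a definition of the eventual output.

```python
import csv
import io


def lpf_to_geojson(feature):
    """Reduce an LPF-style feature to a plain GeoJSON Feature.

    LPF-specific keys (when, names, types, links) are dropped, apart
    from the variant spellings, which are flattened into properties.
    """
    names = [n["toponym"] for n in feature.get("names", [])]
    return {
        "type": "Feature",
        "id": feature.get("@id"),
        "geometry": feature.get("geometry"),
        "properties": {
            "title": feature.get("properties", {}).get("title"),
            "names": names,
        },
    }


def lpf_to_csv(features):
    """Serialise point features to CSV with one row per feature."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "title", "lon", "lat"])
    for f in features:
        lon, lat = f["geometry"]["coordinates"]
        writer.writerow([f["@id"], f["properties"]["title"], lon, lat])
    return buf.getvalue()


sample = {  # placeholder record, not real EMEW data
    "@id": "emew:0000001",
    "type": "Feature",
    "properties": {"title": "Exampleton"},
    "names": [{"toponym": "Exampleton"}, {"toponym": "Exampletun"}],
    "geometry": {"type": "Point", "coordinates": [-1.5, 52.0]},
}
geojson = lpf_to_geojson(sample)
csv_text = lpf_to_csv([sample])
print(csv_text)
```

This also shows why only the LPF option can deliver every data field: the simpler formats have no standard place for temporal scopes or per-name citations.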

Feature Editor

  • Save (permanently) a snapshot of the original state (all codependent tables)
  • Perform editing
  • Commit edits
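The three steps above can be sketched as a single database transaction. The example uses Python's built-in `sqlite3` module purely as a stand-in for PostgreSQL, and the tables `feature`, `name`, and `snapshot`, together with the functions `take_snapshot` and `edit_feature`, are hypothetical simplifications of the real schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE name (feature_id TEXT, toponym TEXT);
CREATE TABLE snapshot (feature_id TEXT,
                       taken_at TEXT DEFAULT CURRENT_TIMESTAMP,
                       state TEXT);
""")


def take_snapshot(conn, fid):
    """Store the feature's current state, across both tables, as JSON."""
    title = conn.execute(
        "SELECT title FROM feature WHERE id = ?", (fid,)).fetchone()[0]
    names = [row[0] for row in conn.execute(
        "SELECT toponym FROM name WHERE feature_id = ? ORDER BY rowid", (fid,))]
    conn.execute("INSERT INTO snapshot (feature_id, state) VALUES (?, ?)",
                 (fid, json.dumps({"title": title, "names": names})))


def edit_feature(conn, fid, new_title, add_names=()):
    """Snapshot, edit, and commit in one transaction."""
    with conn:  # commits on success, rolls back if any step raises
        take_snapshot(conn, fid)
        conn.execute("UPDATE feature SET title = ? WHERE id = ?",
                     (new_title, fid))
        conn.executemany("INSERT INTO name VALUES (?, ?)",
                         [(fid, n) for n in add_names])


with conn:  # placeholder data
    conn.execute("INSERT INTO feature VALUES ('emew:0000001', 'Exampleton')")
    conn.execute("INSERT INTO name VALUES ('emew:0000001', 'Exampleton')")

edit_feature(conn, "emew:0000001", "Exampleton Magna", add_names=["Exampletun"])
saved = json.loads(conn.execute("SELECT state FROM snapshot").fetchone()[0])
print(saved)
```

In this sketch the snapshot shares a transaction with the edit, so a failed edit leaves no orphaned snapshot; whether snapshots of abandoned edits should nonetheless be retained is a design question still to be settled.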

Licence

We propose to publish EMEW under a CC BY-SA 4.0 (Attribution-ShareAlike) licence. This will allow others to:

  • Share — copy and redistribute the material in any medium or format.
  • Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

  • Attribution — Users must give appropriate credit, provide a link to the licence, and indicate if changes were made. Users may do so in any reasonable manner, but not in any way that suggests that we endorse such use.
  • ShareAlike — If a user remixes, transforms, or builds upon the material, they must distribute their contributions under the same licence as the original.

Ideas/Questions

  • We need to add route segments to an internal Viæ Regiæ route gazetteer, generated in QGIS by splitting route polylines (probably a combination of Ogilby/Lea and Oksanen waterways) at nodal points. Only when such segments are geotagged in source documents should they be added to EMEW. As Recogito cannot be used to geotag lines from gazetteers (or can it?), the EMEW-I might need to be extended to serve that purpose.
  • We need to consider how the EMEW-I might handle marked-up raw text sources rather than the image-based text sources currently in use.
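Although we expect to split the route polylines in QGIS, the principle of the first idea above can be shown in a few lines. This sketch assumes that every nodal point coincides exactly with a vertex of the polyline; the function `split_at_nodes` and the coordinates are hypothetical.

```python
def split_at_nodes(polyline, nodes):
    """Split a polyline (a list of (x, y) tuples) into segments at nodes.

    Each node becomes the last vertex of one segment and the first
    vertex of the next, so consecutive segments share an endpoint.
    """
    if len(polyline) < 2:
        return []
    node_set = set(nodes)
    segments, current = [], [polyline[0]]
    for point in polyline[1:]:
        current.append(point)
        if point in node_set:
            segments.append(current)
            current = [point]  # start the next segment at the node
    if len(current) > 1:
        segments.append(current)  # trailing part after the last node
    return segments


route = [(0, 0), (1, 0), (2, 0), (3, 0)]
segments = split_at_nodes(route, nodes=[(1, 0)])
print(segments)
```

Real route data would need nodes snapped to the line first, since junctions digitised separately rarely fall exactly on a vertex.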

Notes

  1. Image from Juan Eusebio Nieremberg, Historia Naturae, 1635.
  2. We need to identify a relevant standard list - see, for example, 'inhabited places' within Getty Vocabularies.