Digital mapping for the humanities and the social sciences - HIST8872

Sources of spatial data

Below you can find a curated set of datasets that you can use for the class assignments and your own research projects. Please make sure to read the documentation that describes each source. The list reflects my own interests and what is readily available, so you are also free to choose your own sources.

Different R packages also make it easy to import spatial objects. For instance, the packages geodata, spData, rnaturalearth or maps facilitate access to climate, crops, elevation, land use, soil, administrative boundaries and other data ((moraga2023?) surveys some of these packages here). National agencies will also have shapefiles with all settlements and many other spatial features. GeoNorge, for instance, provides thousands of shapefiles and raster files for Norway.

Shapefiles

  • GADM provides boundaries at different administrative levels for all countries.

  • Natural Earth.

Raster data

  • WorldClim: historical climate data (1970-2000).

  • FAO-GAEZ: suitable agricultural land and crop suitability indexes.

  • SRTM Digital Elevation Database

  • European Soil Database

  • NASA Earth Data: night-light luminosity.

These examples provide global or international datasets. National agencies may have their own datasets.

The packages geodata or rnaturalearth also facilitate importing this kind of information into R.
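As a minimal sketch of such an import, the following loads the Natural Earth country polygons through rnaturalearth and keeps a single country. It assumes that the sf and rnaturalearth packages are installed; the coarse "small" scale is chosen here because, to my knowledge, it ships with the package itself, whereas finer scales need additional data packages.

```r
library(sf)
library(rnaturalearth)

# Country polygons from Natural Earth at the coarsest scale (1:110m),
# returned as an sf object
world <- ne_countries(scale = "small", returnclass = "sf")

# Keep one country; 'admin' holds the country name in this dataset
norway <- world[world$admin == "Norway", ]

# Plot only the geometry (no attribute columns)
plot(st_geometry(norway), main = "Norway (Natural Earth, 1:110m)")
```

The same pattern (load, subset, plot) applies to the other packages, although each one has its own function names and download requirements.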

Textual corpora

As discussed during the course, computational tools can extract locations from textual corpora (named-entity recognition) and then assign them geographic coordinates (geo-coding). The package tidygeocoder “makes getting data from geocoding services easy” (cambon2021?). Tools of this kind have been employed in the following projects:

  • The Trading Consequences project (clifford2016?) extracts a vast amount of information on the geographical location of commodities exchanged in the British economic world during the long 19th century (1789-1914).

  • In Mapping the State of the Union, Mitch Fraas and Ben Schmidt extract the locations mentioned in the 224 State of the Union addresses delivered yearly since 1790.

  • The emotions of London project text-mines place-names from 18th- and 19th-century novels and the emotions they elicit in the corpus.
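The geo-coding step can be sketched with tidygeocoder as follows. The place names and the helper geocode_places() are hypothetical and only serve as illustration; the sketch assumes that tidygeocoder is installed and uses the OpenStreetMap (Nominatim) service, which needs an internet connection and limits the number of queries.

```r
# Hypothetical place names, as they might come out of entity recognition
places <- data.frame(
  place = c("Trondheim, Norway", "Paisley, Scotland", "Boston, Massachusetts")
)

# Hypothetical helper: sends the 'place' column to the geocoding service
# and returns the input with two extra columns, 'lat' and 'long'
geocode_places <- function(df) {
  tidygeocoder::geocode(df, address = place, method = "osm")
}
```

Calling geocode_places(places) queries the service once per row. Ambiguous or historical place names may return no match or a wrong one, so the results should always be checked by hand or on a map.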

Aerial imagery / Satellite images

National agencies started to survey their entire territory using aerial photographs in the 1930s, a practice that has continued to the present. These surveys therefore constitute an important source of historical information, with millions of aerial photos taken at multiple points in time. Recent studies using historical aerial photographs include (sylvester2012?), (midgley2017?), (pinol2018?), (carvalho2021?) and (llena2023?), which track the temporal evolution of urban areas, crop fields, glacial dynamics, coastal erosion and forest cover (and land abandonment).

More recently, satellite imagery has become publicly available and has opened up completely new ways of doing research. As well as high spatial resolution and global geographic coverage, these remote sensing technologies provide information that is difficult to obtain by other means (donaldson2016?). The LANDSAT program was launched in 1972 and other programmes joined in the 1990s and later, so contemporary historians can use these technologies to provide visual evidence, to compare images taken at different periods and to track changes in land cover and quality, night lights, topography, deforestation, pollution, drought, weather and climatic fluctuations, etc. Within the social sciences, this information has been primarily employed by economists; see (donaldson2016?) and (wuepper2025?) for surveys of recent literature. Likewise, (munteanu2024?) stresses the potential of the globally available black-and-white satellite photographs that date back to the 1960s.

Scanned historical maps

Historical maps contain spatial information about political and cultural borders, transport infrastructure, topographical information, land cover, buildings, etc., so they constitute a fantastic historical source. Here are some online collections:

  • The US Historical Topographic Map Collection

  • The David Rumsey Map Collection

  • OldMapsOnline

As an example of a project geo-referencing old maps, see the Viabundus Project (holterman2023?). Based on the atlas Hansische Handelsstraßen, this project has produced shapefiles containing the roads and waterways connecting northern Europe between 1350 and 1650, as well as the institutional nodes behind these transportation networks (towns and settlements, tolls, fairs, staple markets, etc.). Other studies extract specific features: (heblich2021?) relies on topographical maps published between 1880 and 1900 to locate 5,000 industrial chimneys and trace atmospheric pollution patterns in British cities, while (redding2024?) maps the destruction of London during the Second World War. Similarly, (siodla2015?) and (hornbeck2017?) use historical maps to understand the effects of the great fires in Boston and San Francisco, and Charles Butcher and his team (here at NTNU) rely on maps to identify the political influence of pre-colonial African states. More examples can be found in these blog posts by Alexandra Cirone and James Feigenbaum.

Digitising old maps involves two steps: (1) geo-referencing the historical map, that is, adding real-world spatial coordinates, and (2) digitising the spatial features you are interested in as points, lines or polygons (creating a shapefile). Although this process can be done in R, it is more intuitive in dedicated GIS software such as QGIS or ArcGIS. The Programming Historian and the Geospatial Historian offer great tutorials for both QGIS and ArcGIS (clifford2013?; see also gregory2007?). Manually digitising points, lines or polygons can nonetheless be time-consuming. Alternatively, advances in computational methods make it possible to extract digital versions automatically from scanned images of historical maps (or aerial images). Although the combination of text and symbols (lines, polygons, etc.) still poses significant challenges to automated pattern recognition methods, this is already a very promising area (hosseini2021?; combes2022?; litvine2024?; mcdonough2024?).

Historical shapefiles / rasters

Historians have been busy creating historical GIS, so there are plenty of shapefiles already available to the public.

  • The China Historical GIS with placenames and administrative units for the Chinese Dynasties.

  • The Great Britain Historical GIS supplying administrative boundaries since the early 19th century.

  • The US National Historical Geographic Information System, containing all levels of U.S. census geography (states, counties, tracts and blocks) from 1790 through the present.

  • The French Historical GIS, 1700-2020 (including administrative units, transportation networks, etc.; (litvine2023?)). See also Mapping the Third Republic: A Geographic Information System of France (1870–1940) (gay2020?).

  • Historical regional boundaries and transportation infrastructure in Europe since the mid-19th century (marti2023?).

Likewise, different websites have collected lists of national historical GIS, as well as examples of projects using GIS tools, such as The Historical GIS Research Network or Geospatial Historian.

  • Historical gazetteers. The project A Vision of Britain through Time has gathered around 2 million historical place names from the early 19th century onwards. Other examples are the Digital Gazetteer of the Song Dynasty (960-1276 CE) (mostern2022?) and Pleiades, a community-built gazetteer of ancient places. Similarly, the project ESPAREL has extracted and geo-referenced the almost 20,000 population entities listed in the 1887 Spanish nomenclator and linked them with their current and past counterparts (esparel2022?). The World Historical Gazetteer is a platform that hosts many of these initiatives geo-locating historical place names across the world.

  • Historical Climatology

  • Ships’ logbooks are an especially valuable source since their entries recorded not only the vessels’ geographical position (longitude and latitude), but also systematic meteorological information (and other events, such as whales seen or captured) daily or even several times a day (smith2012?; garcia2018?; walker2024?). See also the Whaling History, the Weather Time Machine or the Old Weather projects.

  • The Historical Settlement Data Compilation for the United States (HISDAC-US) (uhl2021?; connor2020?). Historical gridded settlement layers derived from property records since 1810. These files count the number of built-up properties devoted to different uses (agricultural, commercial, industrial, residential, etc.) per grid cell and therefore allow tracking the evolution of urbanization and land use over time.

These are just some examples. A fine-grained online search, specifying the area, the period and the topic of interest, may also produce the desired results. Moreover, plenty of research, by public institutions or individual researchers, has used GIS tools without making the underlying data public. Most maps published nowadays in books, academic journals, newspapers and websites rely on these tools and are therefore based on shapefiles or raster files that can be shared and reproduced. Authors are usually happy to share their materials provided they are properly referenced, so contacting them is always advisable.

Historical datasets including spatial coordinates

As well as historical locations themselves, there are also plenty of examples of historical information that has been geo-located.

  • VOC Dataset (Petram et al. 2024): This dataset stores the pay ledgers of the Dutch East India Company (VOC), primarily from the eighteenth century. It contains almost 800,000 records with each crew member’s name, place of origin, rank, wage, etc. The raw information has been carefully curated and stored in several .csv files that can be merged using the corresponding IDs. Read more about this source here.

  • Tudor Network of Power (Ahnert et al. 2023). This dataset contains all (surviving) items of correspondence in the Tudor State Papers (1509-1603), which are the official government records of the Tudor period in England. As explained by the authors (Ahnert and Ahnert 2023), data cleaning and curation constituted a significant effort. As well as more traditional quantitative methods, this dataset is well suited to network analysis.

  • Theater History of Operations Reports provides 4.8 million observations, each recording the position of an aircraft bombing a particular target in the Vietnam War between 1965 and 1975.

  • A brief history of human time (Laouenan et al. 2022). This database includes information on 2.2 million notable individuals born between 3500BC and 2020 (5,500 years of human history) collected from Wikipedia and other secondary sources. As well as dates of birth and death, the data set includes place of birth and other features characterising these individuals (when available). As the authors document, Anglo-Saxon personalities are over-represented due to the bias naturally present in existing projects based on the English edition of Wikipedia. See also Schich et al. (2014) who used the dates of birth and death of a subsample of this data (150,000 notable individuals) to map the evolution of European cultural history during the last 2,000 years.

  • Academic scholars and literati in Medieval and Early Modern Europe (De La Croix, n.d.). Relational database on around 83,000 scholars and literati active in European academia between 1000 and 1800. As well as place and year of birth and other biographical details, it records the institutions to which these individuals belonged (universities, scientific academies, etc.). See De La Croix, Scebba, and Zanardello (2025) and De La Croix and Morault (2025) for two applications using social network analysis.
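Several of these sources, such as the VOC dataset, are distributed as separate tables linked by IDs. A minimal sketch of such a merge in base R is shown below. The two toy tables and all their column names are hypothetical stand-ins, not the actual structure of any dataset listed here; real files would first be read with read.csv().

```r
# Toy table 1: one row per person (hypothetical columns)
persons <- data.frame(
  person_id = c(1, 2, 3),
  place_of_origin = c("Amsterdam", "Bergen", "Hamburg")
)

# Toy table 2: one row per contract, linked to persons by person_id
contracts <- data.frame(
  contract_id = c(101, 102, 103, 104),
  person_id = c(1, 1, 2, 4),
  rank = c("sailor", "boatswain", "soldier", "sailor")
)

# Inner join (the default): keeps only contracts with a matching person
merged <- merge(contracts, persons, by = "person_id")

# Left join: keeps every contract, with NA where the person is unknown
merged_all <- merge(contracts, persons, by = "person_id", all.x = TRUE)
```

Comparing nrow(merged) with nrow(merged_all) is a quick way to see how many records fail to match, which is worth checking before mapping anything.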

Again, a targeted online search may yield results specific to your interests. Although searching for area and period of study is always useful, many topics are also very well covered: population, education (cappelli2023?), social conflicts (Chambru and Maneuvrier-Hervieu 2022), lighthouses (bogart2022?), sailing routes and wrecks (here), to mention only a few.

Another alternative is to rely on contemporary shapefiles. Physical features (e.g. rivers, coastlines, etc.) are not likely to have changed much, so you can easily find appropriate GIS files online or from any national agency. Likewise, many historical locations (e.g. settlements, regional entities, etc.) still exist and are contained in contemporary geo-referenced databases.

Alternatively, spatial coordinates can be gathered from GPS receivers, online searches or Google Maps itself: opening Google Maps and clicking on any point provides this information. Note, though, that Google Maps reports latitude first and longitude second, the reverse of the x-y (longitude, latitude) order that most GIS software expects. This kind of information is, for instance, very important for recording archaeological locations.
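The coordinate order matters when turning such values into spatial objects. The sketch below parses two coordinate pairs written latitude first, as Google Maps reports them, and converts them into sf points. The values are illustrative (roughly Trondheim and Oslo), the object names are my own, and the code assumes the sf package is installed.

```r
library(sf)

# Coordinates as copied from Google Maps: "latitude, longitude"
copied <- c(Trondheim = "63.4305, 10.3951", Oslo = "59.9139, 10.7522")

# Split each string at the comma and convert both parts to numbers
parts <- do.call(
  rbind,
  lapply(strsplit(copied, ","), function(p) as.numeric(trimws(p)))
)

sites <- data.frame(
  name = names(copied),
  lat  = unname(parts[, 1]),  # first value: latitude (y)
  lon  = unname(parts[, 2])   # second value: longitude (x)
)

# st_as_sf() expects coords in x, y order, so longitude comes first;
# EPSG:4326 is the usual CRS for plain latitude/longitude values
pts <- st_as_sf(sites, coords = c("lon", "lat"), crs = 4326)
```

If the two columns are passed the other way round, the points still plot without error but end up in the wrong place, so a quick look at the resulting map is always advisable.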

Norwegian data

The Kommunedatabasen has digitised a huge amount of historical information on municipalities (kommuner). You can also request shapefiles with the (changing) municipal boundaries from 1880 onwards.

Other additional sources can be found below:

Miscellaneous

Those students with other research interests can choose their dataset on their own. The possibilities are endless. Here are just a few examples:

As mentioned above, I encourage you to find your own dataset.

References

Ahnert, Ruth, and Sebastian E. Ahnert. 2023. Tudor Networks of Power. Oxford University Press.
Ahnert, Ruth, Sebastian E. Ahnert, Jose Cree, and Lotte Fikkers. 2023. “Tudor Networks of Power - Correspondence Network Dataset.” Cliodynamics. Apollo - University of Cambridge Repository. https://doi.org/10.17863/CAM.99562.
Chambru, Cédric, and Paul Maneuvrier-Hervieu. 2022. “Introducing HiSCoD: A New Gateway for the Study of Historical Social Conflict.” Working Paper Series, Department of Economics, University of Zurich 407.
De La Croix, David. n.d. “Scholars and Literati in European Academia Before 1800.” Repertorium Eruditorum Totius Europae 5:35–41. https://doi.org/10.14428/rete.v5i0/global21.
De La Croix, David, and Pauline Morault. 2025. “Winners and Losers from the Protestant Reformation: An Analysis of the Network of European Universities.” Journal of Economic History forthcoming.
De La Croix, David, Rossana Scebba, and Chiara Zanardello. 2025. “Flora, Cosmos, Salvatio: Pre-Modern Academic Institutions and the Spread of Ideas.” CEPR Discussion Papers Series DP20569.
Laouenan, Morgane, Palaash Bhargava, Jean-Benoît Eyméoud, Olivier Gergaud, Guillaume Plique, and Etienne Wasmer. 2022. “A Cross-Verified Database of Notable People, 3500BC-2018AD.” Scientific Data 9 (1): 290. https://doi.org/10.1038/s41597-022-01369-4.
Petram, Lodewijk, Marijn Koolen, Melvin Wevers, and Jelle van Lottum. 2024. “Charting Lives and Careers: Enriched Data about the Dutch East India Company’s Eighteenth-Century European Workforce.” Journal of Open Humanities Data 10. https://doi.org/10.5334/johd.210.
Schich, Maximilian, Chaoming Song, Yong-Yeol Ahn, Alexander Mirsky, Mauro Martino, Albert-László Barabási, and Dirk Helbing. 2014. “A Network Framework of Cultural History.” Science 345 (6196): 558–62.