1. In-class coursework: Qualitative dimensions

Let’s practise what we learned during the first session about extracting information from qualitative dimensions, this time using a completely different data set from the one we have used so far. The Petitioning in Early Modern England data set consists of 2,847 petitions filed in England between 1573 and 1799 (Waddell and Howard 2022). As well as the text itself, it includes information on the date, petitioners, topic, administrative responses, etc. Petitions were a crucial mode of communication between the ‘rulers’ and the ‘ruled’, so they provide a vital source for illuminating the concerns of the people, from noblemen to paupers. The data is hosted in the Zenodo repository listed in the references.

The Humble Petition of Jock of Braid Scotland, 1648.

Download it into a folder of your choice and read it into R. Remember to set the working directory and load the necessary packages.
Explore what the data set looks like. How many observations does it contain? What does each observation refer to (the unit of analysis)? What kind of information does it report about each observation?
Were the petitions contained in this data set filed all over England or in particular counties?
What were the two most common topics for filing a petition? Draw a bar plot to visualise this information. What fraction of all petitions do these two topics represent? Use the subtopic information as well to shed more light on this issue.
Imagine that you are especially interested in “poor relief”. Can you provide more information about who was filing these petitions and the type of response they got from the Royal authorities?

Solution:

The downloaded files are contained in a folder with a very long name, which you may want to shorten. Once you have done so and set the working directory in R, you can use list.files() to see which folders and files are located in a given folder. Typed with nothing in the parentheses, list.files() shows the contents of the working directory; by contrast, list.files("data") lists the files in the sub-folder named data.
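A minimal sketch of how list.files() behaves, assuming a hypothetical folder called course-demo created inside R’s temporary directory, so nothing on your own disk is touched. The folder and file names are invented for illustration; with the real data you would point setwd() at your own folder instead.

```r
# Build a throwaway folder structure (hypothetical names) in R's temp directory
root <- file.path(tempdir(), "course-demo")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "data", "petitions.xlsx"))  # empty placeholder file

old_wd <- setwd(root)  # setwd() returns the previous working directory

list.files()                  # contents of the working directory: "data"
list.files("data")            # contents of the sub-folder: "petitions.xlsx"
list.files(recursive = TRUE)  # every file below it: "data/petitions.xlsx"

setwd(old_wd)  # restore the original working directory
```

The recursive = TRUE form is handy for locating a file buried several folders deep, such as the .xlsx file inside the downloaded archive.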

In my case, I put the files in the folder data-assign, so I can read them in using read_excel() (remember to load the readxl package).


rm(list = ls())      # clear the workspace
library(tidyverse)   # dplyr, ggplot2, forcats, etc.
library(readxl)      # provides read_excel()
data <- read_excel("data-assign/petitions/data/tpop_petitions_petitioners_v1_202208.xlsx")
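read_excel() reads the first sheet of a workbook unless told otherwise. I have not checked how many sheets the petitions workbook contains, so it may be worth listing them first with excel_sheets(). The sketch below uses one of the example workbooks bundled with readxl (located via readxl_example()) so that it runs without the downloaded file; with the real data you would pass the path used above instead.

```r
library(readxl)

# Path to an example workbook that ships with readxl
path <- readxl_example("datasets.xlsx")

excel_sheets(path)                                # names of all sheets in the workbook
mtcars_xl <- read_excel(path, sheet = "mtcars")   # read a specific sheet by name
dim(mtcars_xl)                                    # rows and columns of that sheet
```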

Let’s explore what the data set looks like.


data

# A tibble: 1,728 × 19
   petition_id county     year  date      topic subtopic named_petrs subscribers
         <dbl> <chr>      <chr> <chr>     <chr> <chr>          <dbl>       <dbl>
 1           1 Derbyshire 1632  after 25… pate… anti               1           7
 2           2 Derbyshire 1639  20 April… cott… pro                1           3
 3           3 Derbyshire 1649  13 March… poor… reimbur…           1           0
 4           4 Derbyshire 1652  1652      rates pro                1           0
 5           5 Derbyshire 1655  24 April… liti… pro                1           0
 6           6 Derbyshire 1655  24 April… liti… pro                1          17
 7           7 Derbyshire 1665  4 April … offi… neglect            0          20
 8           8 Derbyshire 1680  5 Octobe… other prison …           0           6
 9           9 Derbyshire 1680  5 Octobe… poor… pro                1           0
10          10 Derbyshire 1680  5 Octobe… liti… anti               1           0
# ℹ 1,718 more rows
# ℹ 11 more variables: petition_type <chr>, petition_gender <chr>,
#   sub_gender <chr>, response_cat <chr>, petitioner <chr>, abstract <chr>,
#   repository <chr>, collection <chr>, reference <chr>, ll_img <chr>,
#   bho_transcribed <chr>

As shown above, this data frame contains 1,728 rows, that is, observations. Each row refers to a petition filed in England during the period of study, so the unit of analysis is the individual petition. There are 19 columns, meaning that there are 19 pieces of information about each petition. Given that we are focusing on qualitative dimensions, we will explore things like the places where these petitions were filed (county), what they refer to (topic and subtopic), the type of petition (petition_type), the name and gender of the petitioner (petitioner and petition_gender) or the response they received (response_cat). The data set also includes a textual description of each petition (abstract).
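The same answers can be obtained directly with nrow(), ncol() and glimpse(). The sketch below runs them on a tiny hypothetical tibble (invented values, three of the real column names) so that it is self-contained; on the real data, nrow(data) and ncol(data) should return 1,728 and 19.

```r
library(dplyr)

# Hypothetical stand-in for the petitions tibble (invented values)
toy <- tibble(
  petition_id = 1:3,
  county      = c("Derbyshire", "Cheshire", "Cheshire"),
  topic       = c("poor relief", "litigation", "poor relief")
)

nrow(toy)     # number of observations (rows)
ncol(toy)     # number of variables (columns)
names(toy)    # variable names
glimpse(toy)  # one line per column: its type and first few values
```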

Regarding the specific questions listed above, you can list the places of origin of these petitions by using count() on the variable county.


data |> 
  count(county)

# A tibble: 5 × 2
  county             n
  <chr>          <int>
1 Cheshire         613
2 Derbyshire        94
3 Staffordshire    239
4 Westminster      422
5 Worcestershire   360
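As an aside, count() is essentially shorthand for grouping by a variable and counting the rows in each group. A small sketch on hypothetical data showing that both routes give the same counts:

```r
library(dplyr)

# Hypothetical toy data: one row per petition
toy <- tibble(county = c("Cheshire", "Derbyshire", "Cheshire", "Westminster"))

# Shorthand ...
a <- toy |> count(county)

# ... and the longer equivalent
b <- toy |>
  group_by(county) |>
  summarise(n = n())

a
b
```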

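The county counts show that the petitions come from only five areas (Cheshire, Derbyshire, Staffordshire, Westminster and Worcestershire) rather than from all over England. Doing the same by topic and sorting the results in descending order reveals the most common topics behind the petitions. You could also do this by subtopic, but the results are less clear-cut.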


data |>
  count(topic, sort = TRUE)

# A tibble: 13 × 2
   topic                  n
   <chr>              <int>
 1 litigation           474
 2 poor relief          290
 3 rates                174
 4 paternity            133
 5 cottage              129
 6 employment           121
 7 officeholding         93
 8 other                 85
 9 military relief       58
10 alehouse              53
11 imprisoned debtors    44
12 charitable brief      37
13 dissenting worship    37
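To pull out the two most common topics programmatically rather than reading them off the table, one option is slice_max(). The sketch below copies the topic counts from the output above into a small tibble so that it runs on its own; with the real data you would write data |> count(topic) |> slice_max(n, n = 2).

```r
library(dplyr)

# Topic frequencies copied from the count() output above
topic_counts <- tibble(
  topic = c("litigation", "poor relief", "rates", "paternity", "cottage",
            "employment", "officeholding", "other", "military relief",
            "alehouse", "imprisoned debtors", "charitable brief",
            "dissenting worship"),
  n = c(474, 290, 174, 133, 129, 121, 93, 85, 58, 53, 44, 37, 37)
)

# Keep the two rows with the largest n; the input need not be sorted
top2 <- topic_counts |> slice_max(n, n = 2)
top2
```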

We can visualise the distribution of petitions across topics with a bar plot.


data |>
  ggplot(aes(x = topic)) +
  geom_bar() +      # bar height = number of petitions per topic
  coord_flip()      # horizontal bars keep the topic labels readable
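Because geom_bar() orders the bars alphabetically, the most common topics are not easy to spot. One possible refinement, sketched here on hypothetical toy data, is to reorder the categories by frequency with fct_infreq() from forcats (loaded with the tidyverse) and map topic to the y axis instead of using coord_flip(). With the real data, replace toy with data.

```r
library(ggplot2)
library(forcats)

# Hypothetical toy data standing in for the petitions tibble
toy <- data.frame(topic = c("rates", "litigation", "litigation",
                            "poor relief", "litigation", "poor relief"))

# fct_infreq() orders levels from most to least frequent; fct_rev() flips
# them so that the most frequent topic ends up at the top of the y axis
p <- ggplot(toy, aes(y = fct_rev(fct_infreq(topic)))) +
  geom_bar() +
  labs(x = "Number of petitions", y = "Topic")
p
```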

Let’s now compute the fraction of all petitions that the two most common categories represent. As we know, count() generates a data frame with two columns: one listing the categories present in the field we are exploring and another, named n, indicating the number of observations in each category. Building on that, we can create a new field, fraction, by dividing the number of cases (n) by the total number of observations (sum(n)). Instead of adding up the values of the most common categories ourselves, we use cumsum() to compute the cumulative frequency (cum_sum) for us. Finally, the last line asks R to round the fields fraction and cum_sum to 2 decimal places.¹

¹ Using across() within mutate() is a very useful shortcut for applying the same operation to several columns.


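data |>
  count(topic, sort = TRUE) |>            # frequency of each topic, largest first
  mutate(fraction = n/sum(n),             # share of all petitions
         cum_sum = cumsum(fraction)) |>   # running total of the shares
  mutate(across(c(fraction, cum_sum), \(x) round(x, 2)))  # round both to 2 decimals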

# A tibble: 13 × 4
   topic                  n fraction cum_sum
   <chr>              <int>    <dbl>   <dbl>
 1 litigation           474     0.27    0.27
 2 poor relief          290     0.17    0.44
 3 rates                174     0.1     0.54
 4 paternity            133     0.08    0.62
 5 cottage              129     0.07    0.69
 6 employment           121     0.07    0.76
 7 officeholding         93     0.05    0.82
 8 other                 85     0.05    0.87
 9 military relief       58     0.03    0.9 
10 alehouse              53     0.03    0.93
11 imprisoned debtors    44     0.03    0.96
12 charitable brief      37     0.02    0.98
13 dissenting worship    37     0.02    1
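Passing extra arguments (such as the 2 in round, 2) through the ... of across() is deprecated as of dplyr 1.1.0; the recommended form is an anonymous function such as \(x) round(x, 2). The sketch below, using the topic counts copied from the output above, checks that this gives the same result as rounding each column separately and reproduces the 0.44 in the second row.

```r
library(dplyr)

# Topic frequencies copied from the count() output above
tab <- tibble(
  topic = c("litigation", "poor relief", "rates", "paternity", "cottage",
            "employment", "officeholding", "other", "military relief",
            "alehouse", "imprisoned debtors", "charitable brief",
            "dissenting worship"),
  n = c(474L, 290L, 174L, 133L, 129L, 121L, 93L, 85L, 58L, 53L, 44L, 37L, 37L)
)

shares <- tab |>
  mutate(fraction = n / sum(n), cum_sum = cumsum(fraction))

# Rounding several columns at once with across() and an anonymous function ...
with_across <- shares |>
  mutate(across(c(fraction, cum_sum), \(x) round(x, 2)))

# ... is equivalent to rounding each column by hand
by_hand <- shares |>
  mutate(fraction = round(fraction, 2), cum_sum = round(cum_sum, 2))

with_across
```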

According to the cum_sum column, the topics “litigation” and “poor relief” together account for 44 per cent of all petitions.
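In formulas, with topics indexed k = 1 to 13 in descending order of frequency and n_k the number of petitions in topic k, the two columns computed above are defined as follows, and the 44 per cent figure follows directly from the first two counts:

$$
\text{fraction}_k = \frac{n_k}{\sum_{j=1}^{13} n_j}, \qquad
\text{cum\_sum}_k = \sum_{j=1}^{k} \text{fraction}_j
$$

$$
\text{cum\_sum}_2 = \frac{474 + 290}{1728} = \frac{764}{1728} \approx 0.442
$$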

To learn more about what lies behind these topics, you can look at subtopic while focusing on particular categories within topic. The following code, for instance, reports the most common subtopics for “poor relief” (you can explore other topics in the same way).


data |>
  filter(topic=="poor relief") |>
  count(subtopic, sort = TRUE)

# A tibble: 15 × 2
   subtopic                             n
   <chr>                            <int>
 1 pro                                174
 2 <NA>                                65
 3 removal                             23
 4 anti                                10
 5 pauper apprenticeship                4
 6 reimbursement                        4
 7 pro: housing                         2
 8 anti: for kin support                1
 9 anti: pauper apprenticeship          1
10 pauper apprenticeship: exception     1
11 pro: county                          1
12 pro: pauper apprenticeship           1
13 pro: rentable housing                1
14 pro: settlement                      1
15 settlement certificate               1

It seems that a clear majority of the petitions under this topic were in favour of “poor relief”: 180 in total (174 “pro” plus 6 in the more specific “pro: …” subtopics). Only 12 petitions (10 “anti” plus 2 “anti: …”) seem to be against it.
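Rather than adding these subtopics up by hand, one option is to collapse them into a coarser “stance” variable with a regular expression. This is a sketch on hypothetical toy data that mimics the labels in the table above; the stance variable is my own construction, not part of the data set. Applied to the real poor relief subset (data |> filter(topic == "poor relief") followed by the same mutate() and count()), it should give 180 “pro” and 12 “anti” petitions, judging from the counts above.

```r
library(dplyr)
library(stringr)

# Hypothetical toy subtopics mimicking the labels in the table above
toy <- tibble(subtopic = c("pro", "pro: housing", "anti",
                           "anti: for kin support", "removal", NA))

stance <- toy |>
  mutate(stance = case_when(
    str_detect(subtopic, "^pro(:|$)")  ~ "pro",    # "pro" or "pro: ..."
    str_detect(subtopic, "^anti(:|$)") ~ "anti",   # "anti" or "anti: ..."
    TRUE                               ~ "other or missing"  # includes NA
  )) |>
  count(stance)
stance
```

Anchoring the pattern with ^ and requiring a colon or the end of the string afterwards avoids accidentally matching any other label that merely begins with the letters “pro”.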

Regarding who was filing these petitions, the field petition_type shows that they could be filed individually (the majority), by multiple persons or collectively, in some cases on behalf of others. I am not an expert on these sources, so we should make sure that we understand what these categories actually mean before interpreting them.


data |>
  filter(topic=="poor relief") |>
  count(petition_type)

# A tibble: 5 × 2
  petition_type            n
  <chr>                <int>
1 collective              45
2 collective on behalf     6
3 multiple                13
4 multiple on behalf       2
5 single                 224
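To turn the petition_type counts into proportions, you can add a share column with mutate(), just as we did for topics. The sketch below copies the counts from the output above into a tibble so that it runs on its own; single petitions make up about 77 per cent (224 of 290) of poor relief petitions. With the real data, the equivalent is data |> filter(topic == "poor relief") |> count(petition_type) |> mutate(share = n / sum(n)).

```r
library(dplyr)

# Counts copied from the petition_type output above (poor relief only)
types <- tibble(
  petition_type = c("collective", "collective on behalf", "multiple",
                    "multiple on behalf", "single"),
  n = c(45, 6, 13, 2, 224)
)

shares <- types |>
  mutate(share = round(n / sum(n), 2)) |>  # proportion of the 290 petitions
  arrange(desc(share))
shares
```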

We could continue this exploration focusing perhaps on “single” petitions and exploring the gender of the petitioner (petition_gender) or the response they received (response_cat). I leave that to you.
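A possible starting point for that exercise, sketched on hypothetical toy data: the labels “gender A”, “response X” and so on are invented placeholders, because the actual categories of petition_gender and response_cat are not shown here. Replacing toy with data would give the real figures.

```r
library(dplyr)

# Hypothetical toy data; gender and response labels are invented placeholders
toy <- tibble(
  topic           = "poor relief",
  petition_type   = c("single", "single", "single", "single", "collective"),
  petition_gender = c("gender A", "gender A", "gender B", "gender B", "gender A"),
  response_cat    = c("response X", "response Y", "response X", "response X",
                      "response Y")
)

# Keep only single petitions about poor relief
single_poor <- toy |>
  filter(topic == "poor relief", petition_type == "single")

# Cross-tabulate gender and response, then compute each response's
# share within each gender
by_gender <- single_poor |>
  count(petition_gender, response_cat) |>
  group_by(petition_gender) |>
  mutate(share = n / sum(n)) |>
  ungroup()
by_gender
```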

References

Waddell, Brodie, and Sharon Howard. 2022. “The Power of Petitioning in Early Modern England, 1573-1799.” Zenodo. https://zenodo.org/records/7027693.