Saturday, July 16, 2016

Profiling and the Reverend Thomas Bayes, a Lemma

Profiling can be an important tool in fighting crime. It has been the basis of several television crime drama series, including one named Profiler. And yet, we often express outrage when the police are accused of profiling. So what is the difference between Good Profiling and Bad Profiling? The answer is provided by a Presbyterian minister: the Reverend Thomas Bayes.
Here is an example of Good Profiling: for 16 years a bomber has been planting small bombs all around the city, hitting movie theaters, phone booths and other public areas. The police are at a loss, not sure what type of person they are looking for. Calling in a profiler, they find that most likely the perpetrator is unmarried, foreign, self-educated, and in his 50s. The question being answered is: given a type of crime that has been committed, what is the type of person most likely to have been responsible? This helps the police narrow their search.
Examples of Bad Profiling, unfortunately, abound. We need to ensure safe air travel, and the people who have most often presented security risks on airplanes recently have been of Middle Eastern appearance. So when screening passengers, we pull out every person with a Middle Eastern appearance and submit them to a full-body search. Here a different question is being asked: given that I have encountered a particular type of person, what is the probability that they have committed (or are intending to commit) a particular type of crime?
The two questions are inverses:

  1. Given crime: what type of person?
  2. Given type of person: is crime likely?

Let's do the numbers

We will use an imaginary city named Theoville, with demographics roughly the same as Baltimore. Theoville has 100,000 people, 63,700 of whom are black, including 30,000 black men.
There are 1,000 people in jail or prison in Theoville. It's a bit of a stretch, but we can use this to assume that if we meet a random person from Theoville, there is a 1,000/100,000 = 1% chance that the person is a criminal. (It's actually a big stretch, because of the 650 people in Theoville city jail, 90%, or 585, are pre-trial--which is to say, we assume that they are innocent. The 90% pre-trial figure is the case for Baltimore city jail: see Baltimore Behind Bars, a Justice Policy Institute Report.)
Of the 1,000 people incarcerated, 770 are black men. So, if we meet a random person who is incarcerated, the probability that he is a black man is 770/1,000 = 77%. Thus people who work around the kind of Theovillians who get incarcerated see a large proportion of black male criminals. Going back to our two types of profiling questions, this is a type 1 question:

  1. Given that the person I have encountered is a criminal (actually, incarcerated), what is the probability that the person is a black male? Answer: 77%.
For nerds, we can represent this symbolically. If B is the event "Black man", and C is the event "Criminal" (again, using numbers incarcerated as a substitute for criminal), then the probability that a random Theovillian is a black man given that he is a criminal can be written:
  1. P(B|C) (read as "probability of the event B, given the event C") = 77%
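For anyone who would rather check this with a few lines of code, here is a minimal Python sketch of the two numbers we have so far. The variable names are my own shorthand, and, as in the text, "incarcerated" is only a rough stand-in for "criminal".

```python
# Counts taken from the Theoville example above.
population = 100_000
incarcerated = 1_000             # rough stand-in for "criminal"
incarcerated_black_men = 770

# P(C): chance that a random Theovillian is a "criminal".
p_c = incarcerated / population

# P(B|C): chance that a "criminal" is a black man (a type 1 answer).
p_b_given_c = incarcerated_black_men / incarcerated

print(f"P(C)   = {p_c:.1%}")
print(f"P(B|C) = {p_b_given_c:.1%}")
```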
Now, if I am wandering the streets of Theoville and encounter a black man, what is the probability that he is a criminal?
It is very easy to make the mistake of saying that it is 77%. I have seen very knowledgeable people make that mistake. But remember, this is a different question from the one we asked above. We are now asking a type 2 profiling question.
  2. Given that I have encountered a black man, what is the probability that he is a criminal?

Enter Bayes

This is where the Reverend Bayes comes in, because he did the mathematics to help us answer this question. We are now looking for P(C|B): the probability of the event C (meeting a criminal) given the event B (meeting a black man). Bayes' Theorem states:
P(C|B) = [P(B|C) × P(C)] / P(B)
where P(C) is the probability of event C (encountering a criminal among the population of Theoville) and P(B) is the probability of event B (encountering a black man in Theoville). We already know that P(B|C) is 77% and that P(C) is 1%. P(B) is 30,000/100,000 = 30%, giving:

P(C|B) = (77% × 1%) / 30% ≈ 2.6%
There's an even easier way to calculate this: there are 770 black male criminals in Theoville, so given that we just met one of the 30,000 black men in Theoville, the probability that he is a criminal is 770/30,000 = 2.6%.
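Here is a small Python sketch, using only the Theoville counts from this post, that checks that the Bayes route and the head-count shortcut land on exactly the same number. The function name `bayes` and the variable names are just my own labels, and I use exact fractions so that rounding does not blur the comparison.

```python
from fractions import Fraction

# Counts from the Theoville example.
population = 100_000
black_men = 30_000
incarcerated = 1_000             # rough stand-in for "criminal"
incarcerated_black_men = 770

# The three ingredients of Bayes' Theorem, as exact fractions.
p_c = Fraction(incarcerated, population)                      # P(C)
p_b = Fraction(black_men, population)                         # P(B)
p_b_given_c = Fraction(incarcerated_black_men, incarcerated)  # P(B|C)


def bayes(p_b_given_c, p_c, p_b):
    """Return P(C|B) = P(B|C) * P(C) / P(B)."""
    return p_b_given_c * p_c / p_b


via_bayes = bayes(p_b_given_c, p_c, p_b)
via_counts = Fraction(incarcerated_black_men, black_men)  # 770 / 30,000

# Both routes give the same exact fraction.
assert via_bayes == via_counts
print(f"P(C|B) = {float(via_bayes):.1%}")
```

The two agree because the population of 100,000 and the 1,000 incarcerated cancel out of the Bayes formula, leaving just 770/30,000.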
Bottom line: even though 77% of criminals are black men in Theoville, the probability that a black man I've met on the street is a criminal is 2.6%. Profiling with a type 1 question: good odds. Profiling with a type 2 question: poor odds. And if we make the mistake of confusing the two types of questions and their answers, we treat a whole group of people very unfairly.

Why a lemma?

A lemma is a subsidiary proposition introduced in proving some other theorem. It's a result that I will probably be using repeatedly. The first use will be my next post, explaining why we could argue that God has turned his back on the United States.
