Barcodes

Here is an example of a UPC-A barcode, taken from Wikipedia:

A UPC-A barcode has 12 digits. The first digit is something that tells how the numbers are generally used – for example, a particular industry might use a certain number for certain kinds of items. The last twelfth digit is a check digit that can be used to try to tell whether or not the numbers have an error. This check digit is constructed in a certain way.

UPC-A Barcode Check Digit Calculation

The check digit is constructed as follows:

We have the first 11 digits:

$$ABCDEFGHIJK$$

So let \(L\) be the last twelfth digit. We sum the digits in the odd positions and multiply by 3, and sum that with the sum of the digits in the even positions:

$$3\cdot(A+C+E+G+I+K)+(B+D+F+H+J)$$

We take this modulo 10, in other words the remainder of this when divided by 10 (another way to put it is that we just take the digit in the ones place). If this is 0, that is our twelfth digit; if not, subtract this from 10 and that is our twelfth digit.

$$\text{Let}\ S = (3\cdot(A+C+E+G+I+K)+(B+D+F+H+J))$$

\begin{equation}

L=

\begin{cases}

0, & \text{if}\ S \text{ mod } 10 = 0 \\

10 {-} (S \pmod{10}), & \text{otherwise}

\end{cases}

\end{equation}

So the logic is that if all 12 digits are correct, they satisfy the check digit equation:

$$3\cdot(A+C+E+G+I+K)+(B+D+F+H+J)+L \equiv 0 \pmod{10}$$

In other words:

$$ S + L \equiv 0 \pmod{10}$$

If there is an error in the 12th digit, the check digit equation won’t be satisfied. If there is an error in any one single digit among the first 11 digits, then the check digit equation will also not be satisfied! Thus, the check digit equation will detect any single digit error.

To see that a single digit error will cause the check digit equation to not be satisfied, first note that if any of the digits in the even positions (including the check digit since that’s in the 12th position) are off by some amount \(d\), that exact amount \(d\) will be manifested in the left-hand side of the equivalence relation above. Thus, we obtain \(S + d + L \equiv P \pmod{10}\) where if \(P = 0\) then the check digit equation is satisfied. We have \(S + L \equiv 0 \pmod{10} \) from above, the equivalence relation with a correct check digit where the check equation is satisfied, which we can cancel or subtract from the equation we’ve obtained, giving us \(d \equiv P \pmod{10}\). An error in a digit means that \(d\) is an integer where \(d \neq 0\), and it doesn’t make sense for \(d = 10\) or \(d = -10\) since we’re looking at an error in a single digit. Thus, let \(d = \{\pm1,\pm2,\pm3,\pm4,\pm5,\pm6,\pm7,\pm8,\pm9\}\). It’s clear that \(d \pmod{10} \neq 0\) and thus \(P \neq 0\), failing the check digit equation.

What about the digits in the odd positions, whose sum is multiplied by 3, and why multiplied by 3?

Take a digit in one of the odd positions and call it \(O\) for Odd. Let’s say that \(O\) is off by \(d\) where \(d = \{\pm1,\pm2,\pm3,\pm4,\pm5,\pm6,\pm7,\pm8,\pm9\}\) represents a numerical error in \(O\). So our new “\(S\) with an error becomes \(S + 3 \cdot d\). The left-hand side of our check digit equation is \((S + 3 \cdot d) + L\). Set the right-hand side as \(P\) so that:

$$ (S + 3 \cdot d) + L \equiv P \pmod{10}$$

Since we already have \(S + L \equiv 0 \pmod{10}\) (assuming a correct check digit \(L\)), we can cancel or subtract these terms from the above equation and obtain:

$$ 3 \cdot d \equiv P \pmod{10}$$

For any value of \(d = \{\pm1,\pm2,\pm3,\pm4,\pm5,\pm6,\pm7,\pm8,\pm9\}\), we have \(3 \cdot d \pmod{10} \neq 0\). Thus, we have \(P \neq 0\). In other words, having a nonzero \(d\) in an odd position also causes the check digit equation to fail.

So any single digit error will fail the check digit equation. What about two-digit errors? There’s a specific case of two-digit errors – adjacent transpositions, in other words when two digits next to each other are switched by mistake. While it’s odd to think of a computer making this mistake (or any mechanical mistake like that), it makes sense as a common human error when writing down, typing, or verbally reciting a long sequence of numbers. Let’s look at this type of error.

Suppose there’s a sequence of two digits \(ab\) somewhere in our 12-digit barcode. The difference between them is \(d\) so that \(b – a = d\). In our check digit equation calculation, one of the digits is multiplied by \(3\) and the other by \(1\). Let \(a\) be the even digit and \(b\) be the odd digit. Thus, the two digits \(ab\) will contribute to the left-hand side of the check digit equation:

$$1 \cdot a + 3 \cdot b = 1 \cdot a + 3 \cdot (a + d) = 4 \cdot a + 3 \cdot d$$

Now let’s create a transposition error by transposing \(a\) and \(b\). Thus, \(b\) will be multiplied by \(1\) and \(a\) will be multiplied by \(3\), so the contribution to the left-hand side of the check digit equation by these two digits will be:

$$1 \cdot b + 3 \cdot a = 1 \cdot (a + d) + 3 \cdot a = 4 \cdot a + 1 \cdot d$$

Thus, when there is a transposition error, the left-hand side of the check digit equation \(S + L \equiv 0 \pmod{10}\) is changed by \(-2d\), giving us:

$$S + L -2d \equiv P \pmod{10}$$

For any \(d = \{\pm1,\pm2,\pm3,\pm4,\pm6,\pm7,\pm8,\pm9\}\), this will mean that \(P \neq 0\), failing the check digit equation. Thus, for these values of \(d\), a transposition error will be detected. However, for \(d = \pm5\), a transposition error will not be detected. How often will a transposition error have \(d = \pm5\)?

The possible values of \(d\) are \(d = \{\pm1,\pm2,\pm3,\pm4,\pm5,\pm6,\pm7,\pm8,\pm9\}\). Thus, there are \(18\) possible values for \(d\) and for two of those values \(d = \pm5\), a transposition error will not be detected. Thus, for \(\frac{16}{18} = \frac{8}{9}\ = 88.\overline{8}\%\) of cases, a transposition error will be detected and for \(\frac{1}{9} = 11.\overline{1}\%\) of cases, a transposition error will not be detected.

Luhn Algorithm

There’s a slightly different method called the Luhn Algorithm. Go back to our initial 11 digits:

$$ABCDEFGHIJK$$

Instead of multiplying the odd positions by \(3\) and then summing, we start from the right side at position \(K\), multiply the right-most position digit by \(2\) and if that product is \(>10\) then sum its digits. For the next digit to the left at position \(J\) we leave it alone, then go to the next position \(I\) and do the same thing as we did at \(K\) and so on, doing that procedure to every other position until we reach the end on the left. We sum these results:

\begin{equation}

\text{Define } X’ =

\begin{cases}

a+b, & \text{ where if }\ 2X \geq 10 \text{ then } 2X = 10a + b \text{ where both } a,b: \{0..9\} \\

2X, & \text{otherwise}

\end{cases}

\end{equation}

$$ S = K’ + J + I’ + H + G’ + F + E’ + D + C’ + B + A’ $$

Then we take \( 10 -(S \text{ mod } 10) \) and that is our check digit \(L\). When we do a check digit calculation, the \(L\) position digit is added to the above sum:

$$ S + L = (K’ + J + I’ + H + G’ + F + E’ + D + C’ + B + A’) + L $$

$$ = S + 10 – (S \text{ mod } 10) $$

and then we \(\text{ mod } 10\) that result. If that is \(0\), then the check digit calculation is satisfied and we have no single digit errors.

$$ (S + 10 – (S \text{ mod } 10) ) \text{ mod } 10 = [S \text{ mod } 10 + 10 \text{ mod } 10 – S \text{ mod } 10] \text{ mod } 10 $$

$$ = [S \text{ mod } 10 + 0 – S \text{ mod } 10] \text{ mod } 10 = 0 $$

using the distributive property of modulo as an operator. So the check digit equivalence relation is (ending up looking the same as with the UPC-A case):

$$ S + L \equiv 0 \pmod{10}$$

Like we checked with the UPC-A case, what about transposition errors in Luhn’s Algorithm? Let us have two adjacent digits \(“ab”\) and let \(b\) be the digit that goes through the

$$ \text{ if } 2b \geq 10 \text{ , then } b’ = 2b \text{ mod } 10 + 1 \text{ , } $$

$$ \text{ otherwise } b’ = 2b $$

process where the \(+1\) comes from the fact that any \(2b \geq 10\) will be \(10 \leq 2b \leq 18 = 2 \cdot 9\). So if \(“ab”\) are in their correct places, their contribution to the sum in the check digit calculation is:

\begin{equation}

\text{Contribution to sum from } “ab” =

\begin{cases}

a+(2b \text{ mod } 10 + 1), & \text{if }2b \geq 10 \\

a+2b, & \text{otherwise}

\end{cases}

\end{equation}

If there is a transposition error so that we have \(“ba”\) instead, then their contribution to the sum is:

\begin{equation}

\text{Contribution to sum from } “ba” =

\begin{cases}

(2a \text{ mod } 10 + 1)+b, & \text{if }2a \geq 10 \\

2a+b, & \text{otherwise}

\end{cases}

\end{equation}

In the case where both \( 2b < 10 \) and \( 2a < 10 \), then the change in the contribution to the sum has an absolute value of \( |a-b| \), so the sum will change when \( |a-b| \neq 0 \). But if \( |a-b| = 0 \), then we have no transposition error since we have two of the same digits adjacent to each other. Thus, in this case, any transposition error will be detected.

In the case where \( 2b \geq 10 \) and \(2a < 10\):

$$ \text{Change in the contribution to sum due to }“ab”\text{ -> }“ba” = |a+b-(2b \text{ mod } 10 + 1)| $$

In the case where \( 2a \geq 10 \) and \(2b < 10\):

$$ \text{Change in the contribution to sum due to }“ab”\text{ -> }“ba” = |(2a \text{ mod } 10 + 1)-a-b| $$

In the case where \( 2b \geq 10 \) and \( 2b \geq 10 \):

$$ \text{Change in the contribution to sum due to }“ab”\text{ -> }“ba” $$

$$ = |(2a \text{ mod } 10 + 1)-a+b-(2b \text{ mod } 10 + 1)| $$

The cases that concern us are when the result in the absolute values are \(0\) (when \(a \neq b \)), when the transposition errors are undetected. Looking at what happens in the parentheses, the doubling and then digit-adding:

\begin{array} {|c|c|}\hline c & 2c\text{ mod } 10 + 1 \\ \hline 5 & 1 \\ \hline 6 & 3 \\ \hline 7 & 5 \\ \hline 8 & 7 \\ \hline 9 & 9 \\ \hline \end{array}

Going through all the cases, the only times when the result in the absolute values are \(0\) (when \(a \neq b\)) is when “\(ab\)” is \(09\) or \(90\). There are 90 possible transposition errors (\(a\) and \(b\) each can be 0-9 or 10 different values so there are 100 total combinations of “\(ab\)”, minus the 10 combinations when \(a = b\) where no transposition error between the two positions is possible) and we have found 2 cases where a transposition error will not be detected. Thus, for \(\frac{2}{90} = 2.\overline{2}\%\) of cases, a transposition error will not be detected.

An Aside on Mapping

Something to notice here is that in the check calculation procedures, when some calculation is made to a digit like \(3 \cdot c\) or \(2 \cdot c \text { mod } 10 + 1 \text{ if } 2 \cdot c \geq 10\), we want to pay attention to how the digit maps to a result. For example, in the \(3 \cdot c\) case:

\begin{array} {|r|r|}\hline c & 3 \cdot c \\ \hline 0 & 0 \\ \hline 1 & 3 \\ \hline 2 & 6 \\ \hline 3 & 9 \\ \hline 4 & 12 \\ \hline 5 & 15 \\ \hline 6 & 18 \\ \hline 7 & 21 \\ \hline 8 & 24 \\ \hline 9 & 27 \\ \hline \end{array}

It is a trivial example, but you can see a bit more visually why the UPC-A check calculation can detect single digit errors. If the digit is not multiplied by anything during the check calculation, the digit changes by the amount that it’s off by and we know that a digit can be off by values \(d = \{\pm1,\pm2,\pm3,\pm4,\pm5,\pm6,\pm7,\pm8,\pm9\}\). Thus, a digit with an error will never cause the check sum to map to itself, which is our concern. If an error causes the calculation to map to itself, that means we won’t be able to detect the error.

In the \(2 \cdot c \text{ if } 2 \cdot c < 10\ \text{, } 2 \cdot c \text { mod } 10 + 1 \text { if } 2 \cdot c \geq 10\) case: \begin{array} {|c|c|}\hline c & 2 \cdot c \text{ or } 2 \cdot c \text { mod } 10 + 1 \\ \hline 0 & 0 \\ \hline 1 & 2 \\ \hline 2 & 4 \\ \hline 3 & 6 \\ \hline 4 & 8 \\ \hline 5 & 1 \\ \hline 6 & 3 \\ \hline 7 & 5 \\ \hline 8 & 7 \\ \hline 9 & 9 \\ \hline \end{array} Note how \(c\) on the left side initially covers \(0..9\) and then the result on the right side maps to those same \(0..9\) digits without any conflicts while. Thus, we have a one-to-one and onto or in other words bijective mapping from \(0..9\) to \(0..9\) where, importantly, a digit never maps to itself. We know rather trivially that any error in a digit that isn't modified by anything before being contributed to the sum in the check calculation will be detected. The mapping here tells us that any error in a digit that goes through the calculation in the right column, i.e. mapped to the right column, will also reliably change the contribution to the sum and thus will be detected by the check calculation. ISBN

Despite these check digit calculations, neither of them succeed in detecting all adjacent 2-digit transposition errors. In fact, Wikipedia notes that there was a time (in 1958, to be exact, in an article titled An Improved Decimal Redundancy Check by Roger Sisson) when it was believed that no decimal, i.e. base 10 check digit could detect all adjacent 2-digit transposition errors. Thus, ISBN-10 uses a base 11 check digit where \(X\) is used to mean “the digit for our usual 10 in base 10.” In other words, the digits in this base 11 go: \(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, X\).