Craig, Guest poster on Venezuelan Electoral statistics!
Judging from the comments, and from the file after file that I have received, the possible electoral fraud model in Venezuela has awakened quite a lot of interest. Although I can intuitively grasp some of the key principles of what a statistician should look into to verify a possible fraud, I am far from expert enough to judge the multiple models offered. Still, I thank very much those who thought highly enough of me to send me their models. I wish I had been able to discuss them.
For the past week or so I have been working on my very own model with a friend in California, who has absolutely no vested interest in Venezuela and just liked the challenge of it all. Well, he has done 98% of the work: 1% comes from my explaining to him the basics of the Venezuelan problem and 1% from uploading the files to Blogger, which is not a piece of cake, let me tell you.
Still, I do like it a lot, even if I do not comprehend everything, and the reason I am posting it is that it is presented in terms simple enough that even somebody who knows little or nothing of stats can still understand the kind of patterns that must be compared. The model has been deliberately simplified for various reasons: ease of calculation, ease of understanding, illustration of the method. I have not posted all the graphs as it would have burdened the page too much, but the links to the different graphs can be opened by those willing to read the whole thing. Craig has even set up an address where you can write to him and exchange comments if you wish to do so, or give him information that can improve his model and make it directly applicable to the Venezuelan case.
Let's read Craig's proposal. And let me thank him very much for his effort.
PS: the italics are mine as well as some [comments]
An Explanation of Voting Machine Matching in Venezuela
Craig W. Craig4VZ@yahoo.com
August 24, 2004
The referendum in Venezuela has many people wondering how many matching voting machines are realistic. Many are claiming that the probability of machines having matching vote counts is similar to the odds of winning the lottery. In reality, the odds of having matching machines are much closer to the odds of finding people at a party who share the same birthday.
In order to simplify the calculations, I have reduced the number of voters per machine to 150. This reduction makes the computation easier and lets people verify it with commonly available tools such as Microsoft Excel or OpenOffice.org Calc. As a result, the chance of a match between machines will be overstated compared to machines where the number of voters is 400 or more per machine.
When looking at voters on a single machine, the split of SI Votes to NO Votes will follow a distribution called the binomial. It is the same distribution that is used for calculating the chance of getting different results when flipping a coin. The formula for calculating the binomial probability is provided in Formula 1. The binomial distribution also lets you calculate the odds when the coin is weighted to come up heads or tails more frequently than a standard ‘fair’ coin. The same calculation can be used to determine how frequently different total vote counts will occur when more people are in favor of one side than the other.
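For reference, the standard binomial probability, which is presumably what Formula 1 expresses, can be written as below. The symbols n (number of tosses or voters), k (number of heads or SI votes) and p (chance of heads or of an SI vote) are notation assumed here, not necessarily that of the original formula.

```latex
P(X = k) = \binom{n}{k} \, p^{k} \, (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n
```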
If you use the above formula to calculate the chances when tossing a coin 10 times, you can find the likelihood that you will have 10 heads and 0 tails, 9 heads and 1 tail, 8 heads and 2 tails, and so on, all the way to zero heads and 10 tails. Your expected distribution would look like Chart 1.
Chart 1. Distribution for a coin tossed 10 times. [expected graph for a binomial distribution]
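The numbers behind a chart like Chart 1 can be sketched in a few lines of Python. This is only an illustration using the standard binomial formula; the function names `binomial_pmf` and `coin_toss_distribution` are made up for this example.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k heads in n independent tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def coin_toss_distribution(n: int = 10, p: float = 0.5) -> list[float]:
    """Probabilities of 0, 1, ..., n heads (hypothetical helper for this sketch)."""
    return [binomial_pmf(n, k, p) for k in range(n + 1)]


# Print one line per outcome, from 0 heads up to 10 heads.
for heads, prob in enumerate(coin_toss_distribution()):
    print(f"{heads:2d} heads, {10 - heads:2d} tails: {prob:.4f}")
```

Changing `p` away from 0.5 gives the 'weighted coin' case, which is the one that matters when the SI/NO split is uneven.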
The formula for determining the chance of a match between two voting machines, when only two voting machines exist, is the same as finding the chance of flipping a coin 10 times, twice in a row, and having the same result both times. If you flipped the first coin 10 times and the second coin 11 times, the chance of having the same number of ‘heads’ occurrences would be slightly lower. The chance of having matching instances is provided below as Formula 2. When the two voting machines have the same number of voters, like flipping the coin the same number of times, the formula simplifies to Formula 3. When the two voting machines have different numbers of voters, separate calculations need to be performed for the SI and the NO Votes.
In the formula, p represents the probability of the type of vote for which you are checking the match. If you are checking for matching SI Vote counts, p represents the population percentage that voted SI. If checking for matching NO Vote counts, p would be calculated again using the population percentage that voted NO. The number N represents the total number of voters on the machine with the smaller number of voters. The number M represents the total number of voters on the machine with the larger number of voters.
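A form of Formulas 2 and 3 that is consistent with the description above, and with the two-machine figures in Table 1, is sketched below. It assumes the two machines' counts are independent binomials; the summation index k (the shared vote count) is notation assumed here.

```latex
% Formula 2 (assumed form): machines with N and M voters, N \le M
P(\text{match}) = \sum_{k=0}^{N}
    \binom{N}{k} p^{k} (1 - p)^{N - k} \,
    \binom{M}{k} p^{k} (1 - p)^{M - k}

% Formula 3 (assumed form): both machines have N voters
P(\text{match}) = \sum_{k=0}^{N}
    \left[ \binom{N}{k} p^{k} (1 - p)^{N - k} \right]^{2}
```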
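As a rough sketch of how such a match probability could be computed, the Python below sums, over every possible count k, the chance that both machines land on k. It assumes the two machines behave as independent binomials with the same p; the function names are invented for this example and are not part of the original model.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k votes of the chosen type among n voters."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def match_probability(n_voters: int, m_voters: int, p: float) -> float:
    """Chance that two machines report the same count of the chosen vote type.

    Assumption: the machines are independent and share the same p.
    N is taken as the smaller machine and M as the larger one.
    """
    n, m = sorted((n_voters, m_voters))
    return sum(binomial_pmf(n, k, p) * binomial_pmf(m, k, p) for k in range(n + 1))


# Two machines of 150 voters each, 40% SI, scaled to 500 mesas.
print(round(500 * match_probability(150, 150, 0.40), 2))
# Same check when one machine has 160 voters instead.
print(round(500 * match_probability(150, 160, 0.40), 2))
```

When both machines have the same number of voters, the sum reduces to the squared-term version described as Formula 3.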
In the simplified case where both machines have the same number of voters, the expected number of matching machines will be larger than when the machines have different total voter counts. Chart 2A below represents the number of expected matching occurrences when we have 500 mesas with 2 machines, 500 mesas with 3 machines and 500 mesas with 4 machines and a 30% SI – 70% NO vote split. Chart 2B provides the same calculation for machines with 800 voters and a 40% SI – 60% NO split.
Chart 2A. [barely perceived yellow line, dramatic visual representation on how odds drop fast]
The number of matching occurrences is dependent on the split in the ratio of SI to NO voters. Table 1 computes the number of expected matches for 500 mesas with 2 machines, another 500 mesas with 3 machines and another 500 mesas with 4 machines based on the ratio of SI Voters to NO Voters listed.
Multiplying by the ratio of the number of mesas to 500 gives the number of expected matches for any selected number of mesas. For example, if 1000 mesas exist with only 2 machines and the vote ratio is 40% SI to 60% NO, the expected value is (1000/500)*23.49 = 46.98. If we only have 100 mesas with 2 machines we would expect (100/500)*23.49 = 4.70.
If we have 2000 mesas with 4 machines and a 40% SI to 60% NO vote, we would expect (2000/500)*0.07 = 0.28 occurrences of matching SI votes.
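The scaling above is simple enough to check by hand, but a small helper makes it easy to try other mesa counts. The function name `scale_expected_matches` is made up for this sketch; the per-500 figures are the ones quoted from Table 1.

```python
def scale_expected_matches(matches_per_500: float, n_mesas: int) -> float:
    """Rescale a Table 1 figure (computed for 500 mesas) to n_mesas mesas."""
    return (n_mesas / 500) * matches_per_500


# 40% SI to 60% NO, figures taken from Table 1.
print(f"{scale_expected_matches(23.49, 1000):.2f}")  # 1000 mesas with 2 machines
print(f"{scale_expected_matches(23.49, 100):.2f}")   # 100 mesas with 2 machines
print(f"{scale_expected_matches(0.07, 2000):.2f}")   # 2000 mesas with 4 machines
```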
Table 1. Expected number of vote count matches with 500 mesas
| SI Vote to NO Vote Ratio | Match 2 of 2 Machines | Match 3 of 3 Machines | Match 4 of 4 Machines |
|---|---|---|---|
| 10% to 90% | 38.47 | 3.41 | 0.32 |
| 20% to 80% | 28.79 | 1.91 | 0.13 |
| 30% to 70% | 25.12 | 1.46 | 0.09 |
| 40% to 60% | 23.49 | 1.27 | 0.07 |
| 50% to 50% | 23.01 | 1.27 | 0.07 |
| 60% to 40% | 23.49 | 1.27 | 0.07 |
| 70% to 30% | 25.12 | 1.46 | 0.09 |
| 80% to 20% | 28.79 | 1.91 | 0.13 |
| 90% to 10% | 38.47 | 3.41 | 0.32 |
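For readers who prefer a script to a spreadsheet, a table of this kind can be approximated as follows. The sketch assumes every machine has 150 voters, that machines are independent binomials, and that 'all machines match' means all of them report the same SI count; `expected_all_match` is a name invented here. Values should land close to Table 1, though small differences from rounding or from the original spreadsheet are possible.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k SI votes among n voters."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_all_match(p_si: float, machines: int,
                       voters: int = 150, mesas: int = 500) -> float:
    """Expected number of mesas where every machine reports the same SI count.

    Assumption: equal-sized, independent machines, so the chance that all of
    them land on a given count k is pmf(k) raised to the number of machines.
    """
    per_mesa = sum(binomial_pmf(voters, k, p_si) ** machines
                   for k in range(voters + 1))
    return mesas * per_mesa


print("SI%   2 of 2   3 of 3   4 of 4")
for pct in range(10, 100, 10):
    row = [expected_all_match(pct / 100, m) for m in (2, 3, 4)]
    print(f"{pct:3d}  {row[0]:7.2f}  {row[1]:7.2f}  {row[2]:7.2f}")
```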
It is important to note that Table 1 represents matching all machines at a given mesa. If you want to calculate the chance of matching 2 machines when you have 3 machines, the formula is more complex. Formula 4 is used to calculate one combination of three machines: Machine A matching Machine B but not matching Machine C. The probability of Machine C not having the desired outcome is called C’ and is 1 minus the probability of C having the desired outcome. To find the probability of two of the three machines matching, you would also calculate the combinations AB’C and A’BC. Luckily, when we have a large number of voters per machine, the calculation simplifies to approximately 3 times the probability of 2 of 2 machines matching given by Formula 2, one for each of the three possible pairs.
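The two-of-three calculation can be sketched as below, again assuming three independent machines with the same number of voters and the same p. The function names are hypothetical. The exact figure adds up the three combinations (ABC’, AB’C, A’BC), and the shortcut is three times the two-machine match probability.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k votes of the chosen type among n voters."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_of_three_exact(voters: int, p: float) -> float:
    """Chance that exactly two of three equal-sized machines report the same count."""
    pmf = [binomial_pmf(voters, k, p) for k in range(voters + 1)]
    # A and B both land on k while C lands anywhere else (the ABC' combination).
    ab_not_c = sum(q * q * (1 - q) for q in pmf)
    # AB'C and A'BC have the same probability because the machines are identical.
    return 3 * ab_not_c


def two_of_three_approx(voters: int, p: float) -> float:
    """Shortcut: 3 times the chance that two given machines match."""
    pmf = [binomial_pmf(voters, k, p) for k in range(voters + 1)]
    return 3 * sum(q * q for q in pmf)


exact = two_of_three_exact(150, 0.40)
approx = two_of_three_approx(150, 0.40)
print(f"exact: {exact:.4f}  shortcut: {approx:.4f}  ratio: {exact / approx:.3f}")
```

The shortcut always comes out slightly above the exact figure, because it also counts the cases where all three machines match; the gap shrinks as the number of voters per machine grows.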
Because we are using multiple machines with the same number of voters, the number of matching SI and NO votes is symmetric. When one machine has a slightly larger or slightly smaller voter count, the number of matches differs between the two sides. The more lopsided the ratio between SI and NO votes, the bigger the difference in expected matches. Charts 3, 4, 5 & 6 show the matching vote counts for SI and NO with a 40/60 split and a 20/80 split, with 150 and 160 voters in a mesa with 2 machines.
What can be seen from Charts 3, 4, 5 & 6 is that as the electorate begins to favor one choice greatly, the difference in the expected number of matches between SI and NO increases. Additionally, the greater increase in the number of matches will occur for the vote with the smaller share of the population voting for it. Conversely, the difference in the number of matching occurrences between SI and NO shrinks as the split between SI and NO becomes more even.
Chart 4 [where the spread starts to show]
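The SI/NO asymmetry for unequal machines can be explored with a short sketch like the following. It assumes two independent machines with 150 and 160 voters and runs the match calculation once with the SI share and once with the NO share; the function names are invented for this illustration.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k votes of the chosen type among n voters."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def match_probability(n_voters: int, m_voters: int, p: float) -> float:
    """Chance that two independent machines report the same count of one vote type."""
    n, m = sorted((n_voters, m_voters))
    return sum(binomial_pmf(n, k, p) * binomial_pmf(m, k, p) for k in range(n + 1))


def si_and_no_matches(p_si: float, n: int = 150, m: int = 160,
                      mesas: int = 500) -> tuple[float, float]:
    """Expected SI-count matches and NO-count matches over the given number of mesas."""
    si = mesas * match_probability(n, m, p_si)
    no = mesas * match_probability(n, m, 1 - p_si)
    return si, no


for p_si in (0.40, 0.20):
    si, no = si_and_no_matches(p_si)
    print(f"SI {p_si:.0%}: SI matches {si:.2f}, NO matches {no:.2f}, gap {si - no:.2f}")
```

With equal machine sizes the two figures coincide, which is the symmetry noted earlier; the gap only appears once the voter counts differ.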
Unfortunately, the most difficult part in calculating the real expected number of matching occurrences is obtaining real data on the distribution of voters and the number of voters per machine. Fortunately, as the figures above show, a significant number of duplicate counts is to be expected. Only when all 4 machines out of 4 must match does the expected number of occurrences become very low.
When we compare the projected results from the model against the reported results of the referendum, we should expect to see numerous sites with 2 or more machines matching. The occurrence of two machines matching will actually be fairly common. The occurrence of 4 or more machines matching will be rare unless several sites have more than 4 machines. Even with 500 mesas (sites) having 5 voting machines, we would still expect fewer than one occurrence of 4 matching machines at a single mesa.