Many electoral computer systems exist primarily to store and make use of data. Electoral rolls of voters, electronic voting systems, election results systems and staff and materials databases are all based primarily on data. Ensuring the reliability of data is crucially important. A computer system that relies on data is only as good as the data it contains.
There are several measures that can be taken to ensure the reliability of data used in electoral computer systems.
Use reliable data sources
The first step is to use reliable collection methods. This means that data will be legally obtained from reliable sources, preferably 'primary' sources rather than 'secondary' sources. For example, personal elector details will be more accurate if obtained directly from the electors themselves, and not from indirect sources such as acquaintances or other organisations' databases of questionable quality. The most accurate election results data will be obtained directly from the officers in charge of counting centres, and not from media reports, election observers or political parties.
Data capture methods
The next step to consider is the method by which data is 'captured' from the source. Data can be captured in a variety of ways: on a paper form (which could be handwritten, marked with computer readable marks or typed), by telephone (after which the data is usually written down or typed into a computer by an operator), by face to face inquiry (when the data may again be written down or typed into a computer by a staff member), by clients directly entering data on the internet or at an 'electronic kiosk' (using 'online' electronic forms), by an electronic voting device, and so on.
Some forms of data capture are more reliable than others. Handwritten forms are probably the most liable to error, as handwriting can often be hard to read or decipher. To minimise the difficulty in reading handwriting, persons completing forms can be encouraged to write clearly in capital letters in blue or black ink. Clear writing can also be encouraged by printing forms with 'guide lines' that are designed to make users write each letter or number in a separate box on the form. If it is possible to pre-print any information known about the client on a form, this will help reduce the amount of handwriting needed.
Where data is received verbally by an operator or staff member, appropriate training and procedures can ensure that the operator faithfully captures the correct information. For example, information can be read back to the client to check that it is correct, and the spelling of words checked if appropriate.
Where data is captured using optical mark recognition, the instructions given to users need to be clear and unambiguous, and the marking method simple and intuitive.
Forms that include machine-readable codes such as barcodes can be used to simplify data entry and raise accuracy levels. Barcodes can be used to identify the type of form used, where the form was obtained, what the unique number of the form is, and so on. Where forms contain pre-printed client information, barcodes can be used to allow the computer to accurately capture the client's identity at the data-entry stage.
Data captured electronically, where the data is typed by the client directly into a computer interface such as the internet, can be more reliable than entering data from handwritten forms or data taken verbally, as the client can be expected to know exactly how their data should appear. However, such data is only as reliable as the client is accurate, and the quality of data supplied by clients may be variable.
Training of data capture staff
Staff will need to be trained in techniques designed to optimise accurate input and to ensure a safe working environment. For example, regular breaks will prevent eye strain and fatigue. Furniture and computer equipment can be situated to ensure good posture and sound ergonomic practices. Distractions can be minimised or preferably removed. For example, permitting staff to engage in discussion while entering data can sacrifice input accuracy.
Data verification
One of the best ways to ensure the accuracy of data is to apply data verification techniques. The most common data verification technique (where data is being typed into a computer from a paper record) is to enter every piece of data twice, using two different operators for each piece of data. The results of the two data entries are compared by computer. Any variation is highlighted, and a supervisor is required to make any appropriate correction. This technique usually gives very high accuracy rates.
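The comparison step of double-keying can be sketched as follows. This is a minimal illustration only, assuming each operator's entry is held as a simple mapping of field names to text; the field names and sample values are hypothetical and not drawn from any particular electoral system.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def compare_entries(first: Record, second: Record) -> List[Tuple[str, str, str]]:
    """Return (field, first_value, second_value) for every field that differs."""
    mismatches = []
    # Check every field that appears in either entry, in a stable order.
    for field in sorted(set(first) | set(second)):
        a = first.get(field, "").strip()
        b = second.get(field, "").strip()
        if a != b:
            mismatches.append((field, a, b))
    return mismatches


# Hypothetical sample: the same paper form keyed by two different operators.
entry_by_operator_a = {"surname": "NGUYEN", "given_name": "ANH", "postcode": "2600"}
entry_by_operator_b = {"surname": "NGUYEN", "given_name": "ANN", "postcode": "2600"}

for field, a, b in compare_entries(entry_by_operator_a, entry_by_operator_b):
    # Any variation is highlighted for a supervisor rather than resolved automatically.
    print(f"Refer to supervisor: {field!r} keyed as {a!r} and {b!r}")
```

The sketch deliberately does not choose between the two values. As described above, the decision on the correct value is left to a supervisor who can consult the paper record.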
Double-keying of data can also be used to identify data-entry operators who are not achieving a high level of accuracy. Where under-performing operators are identified, this may indicate that more training is needed or that the operator is not suited to that kind of work.
Data can also be verified by entering the data once and requiring another officer, perhaps a supervisor, to recheck the data on screen or on print-outs and either confirm that it is correct or make any necessary corrections.
Using either of the above techniques, it is desirable that data is entered once by one person and then either re-entered or rechecked by a different person. This is because people can make systematic errors, with the likelihood that they will make the same mistakes repeatedly. However, it is less likely that two different people will make the same systematic errors, so a second person is more likely to pick up the mistakes made by someone else.
It is also possible that form design can lead to users or data-entry operators making systematic errors. If significant numbers of similar errors are discovered regularly when a form's data is being recorded, it may be that the design of the form is at fault. Redesigning the form may help to lower error rates in this case.
Data can also be verified using computerised checks built into the data capture process. For example, a database containing residential addresses can include a list of all valid addresses. The system can be programmed so that only valid addresses are accepted when data is being entered. Such a verification technique will not necessarily ensure that the correct address has been entered, but it will ensure that every recorded address is a real one.
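A valid-address check of this kind might look like the sketch below. It assumes, purely for illustration, that the list of valid addresses is a small in-memory set of (number, street, locality) entries; a real system would more likely query an address database, and the addresses shown are invented.

```python
# Hypothetical reference list of valid addresses (number, street, locality).
VALID_ADDRESSES = {
    ("12", "HIGH STREET", "SPRINGFIELD"),
    ("14", "HIGH STREET", "SPRINGFIELD"),
    ("3", "MILL LANE", "SPRINGFIELD"),
}


def normalise(part: str) -> str:
    """Upper-case the text and collapse repeated spaces so trivial differences do not matter."""
    return " ".join(part.upper().split())


def is_valid_address(number: str, street: str, locality: str) -> bool:
    """Accept an address only if it appears in the reference list."""
    key = (normalise(number), normalise(street), normalise(locality))
    return key in VALID_ADDRESSES


print(is_valid_address("12", "high  street", "Springfield"))  # listed address
print(is_valid_address("13", "High Street", "Springfield"))   # not in the list
```

As noted above, a check like this confirms only that the address exists. It cannot tell whether the operator keyed the address that actually belongs to the elector.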
Similarly, arithmetic checks can be built into data-entry systems involving entry of numbers. For example, a data-entry program can require the operator to enter a total as well as the individual numbers making up that total, and can then cross-check the entered total against the sum of those numbers. If there is a discrepancy, the operator will be prompted to correct the data.
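The cross-check of an entered total against its components can be sketched in a few lines. The function name and the sample figures are hypothetical, chosen only to illustrate the idea.

```python
from typing import Sequence


def total_matches(components: Sequence[int], entered_total: int) -> bool:
    """Return True when the entered total equals the sum of the component figures."""
    return sum(components) == entered_total


# Hypothetical figures keyed from a tally sheet: three candidate counts and a total.
candidate_votes = [412, 388, 57]
entered_total = 875  # the operator has transposed two digits of 857

if not total_matches(candidate_votes, entered_total):
    print(
        f"Entered total {entered_total} does not equal the sum of the "
        f"components ({sum(candidate_votes)}); please check the figures."
    )
```

A mismatch shows that an error exists somewhere, but not where. The mistake may be in the total or in one of the components, so the operator needs to recheck all of the figures against the source document.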
Arithmetic or logic tests can also be used to verify data entry. For example, if an operator is entering polling place voting data, the system can be programmed to query any result that shows more votes counted at the polling place than there are electors registered to vote at that place. Trends can also be calculated by computer systems, and any results that vary from the trend by an unusual amount can be identified and queried.
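Both kinds of test can be sketched together. The example below assumes, for illustration only, that the 'trend' is the median turnout across polling places and that a difference of more than 20 percentage points counts as unusual. The threshold, the class and function names and the sample figures are all assumptions, and a real system might instead compare against previous elections or use a different statistical measure.

```python
from statistics import median
from typing import List, NamedTuple


class PollingPlaceResult(NamedTuple):
    name: str
    registered: int      # electors registered to vote at this polling place
    votes_counted: int   # votes entered by the operator


def query_results(results: List[PollingPlaceResult], max_turnout_gap: float = 0.20) -> List[str]:
    """Return a list of queries for results that fail the logic test or the trend test."""
    # Turnout can only be calculated where at least one elector is registered.
    turnouts = {r.name: r.votes_counted / r.registered for r in results if r.registered > 0}
    typical = median(turnouts.values()) if turnouts else None

    queries = []
    for r in results:
        if r.votes_counted > r.registered:
            # Logic test: more votes than registered electors.
            queries.append(
                f"{r.name}: {r.votes_counted} votes counted but only {r.registered} electors registered"
            )
        elif r.name in turnouts and abs(turnouts[r.name] - typical) > max_turnout_gap:
            # Trend test: turnout far from the median across polling places.
            queries.append(
                f"{r.name}: turnout {turnouts[r.name]:.0%} differs from typical turnout {typical:.0%}"
            )
    return queries


# Hypothetical sample data.
sample = [
    PollingPlaceResult("North Hall", 1000, 820),
    PollingPlaceResult("East School", 1200, 960),
    PollingPlaceResult("West Church", 900, 950),
    PollingPlaceResult("South Library", 1100, 440),
]

for query in query_results(sample):
    print(query)
```

A query raised by either test is a prompt to recheck the figures against the source document, not proof of an error. An unusual result may still be genuine.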
Ensuring reliability of data after it has been captured
Once data has been entered into a computer system, it is important that it be stored securely, maintained, and used in a manner that does not compromise its integrity. These issues are addressed under Ensuring Availability of Data.