A Critical Analysis of the Robustness of the Horizontal Gaze Nystagmus (HGN) Test Study and Its True False Positive Rates

24 Dec 2025 9:14 AM | Joshua Ott

IAFTC Newsletter. Volume 1. Issue 1. December 24, 2025.

Joshua Ott¹

¹Caselock, Inc., P.O. Box 285, Lebanon, GA 30146

This is an open-access article under the CC BY-NC-ND license.


Abstract

The Horizontal Gaze Nystagmus (HGN) test is widely presented in courtrooms as an accurate and valid component of the Standardized Field Sobriety Test (SFST) battery. However, the 2007 Robustness of the Horizontal Gaze Nystagmus Test study, authored by Dr. Marceline Burns and sponsored by the National Highway Traffic Safety Administration (NHTSA), raises significant concerns regarding the test’s accuracy, validity, and false positive rates. This paper critically analyzes the raw data from the study, specifically the stimulus variation experiment, and compares those findings to the conclusions reported by the study’s author. When evaluated using the HGN criterion established in the San Diego Study and still taught in the 2025 edition of the SFST Manual (four or more clues indicating a BAC of 0.08 g/dL or more), the overall false positive rate was 67% when administered correctly, and false positive rates ranged from 79% to 92% when the stimulus position was altered. Despite these findings, the study’s published conclusions assert that HGN is “robust” and unaffected by minor procedural deviations. This paper demonstrates that the reported conclusions were achieved only after the study’s author altered the criterion of a false positive, lowering the BAC threshold from 0.08 g/dL to 0.03 g/dL. The analysis presented here reveals substantial issues with the study and the rate of false positives for HGN, even when administered in accordance with the NHTSA standard.

Introduction

The Horizontal Gaze Nystagmus (HGN) test has long been portrayed in courtrooms as a highly accurate and valid indicator that a person’s BAC is at or above the legal limit, and it remains a central component of the Standardized Field Sobriety Test (SFST) battery. Yet the scientific foundation for this confidence warrants renewed scrutiny. The 2007 study, The Robustness of the Horizontal Gaze Nystagmus Test, was authored by Dr. Marceline Burns and funded by the National Highway Traffic Safety Administration (NHTSA). Because Dr. Burns played a central role in developing the SFSTs, including authoring or co-authoring five of the six studies foundational to their use, her conclusions carry significant influence.

However, a close examination of the raw data from the Robustness Study demonstrates a substantial discrepancy between the data and the conclusions published in the report. When administered and interpreted correctly, as established in the San Diego Study, a score of four or more clues indicates a blood alcohol concentration (BAC) of 0.08 g/dL or more. When using this established interpretation criterion, the raw data exhibits an alarmingly high false positive rate. The false positive rates increase further when the stimulus position deviates from NHTSA’s standardized procedures. Despite these findings, the study characterizes HGN as “robust,” a conclusion reached only after redefining a false positive by lowering the threshold from 0.08 g/dL to 0.03 g/dL.

This paper provides a critical analysis of the study’s methodology, its data, and its conclusions. By comparing the study’s raw data to the criterion governing HGN interpretation, this analysis demonstrates that the claimed robustness of HGN is not supported by the underlying data. In doing so, it illuminates significant implications for the admissibility, accuracy, validity, and weight of HGN evidence in impaired-driving cases.

Study Overview

The Robustness Study (Horizontal Gaze Nystagmus Test Study)(1) was published in 2007, sponsored by the National Highway Traffic Safety Administration (NHTSA), and authored by Dr. Marceline Burns. Dr. Burns was one of the investigators who developed the SFSTs, and she was an author or co-author of five of the six studies (1977(2), 1981(3), Colorado(4), Florida(5), and San Diego(6)) used to develop and validate the SFSTs, which include the HGN test. When analyzing the Robustness Study, it is important to understand that Dr. Burns was intimately familiar with HGN and its scoring criterion.

The study addressed defense attorney arguments that variations from the standardized procedures in HGN administration invalidate the test by examining such variations directly. Three experiments were conducted. The first experiment examined variables in the stimulus: stimulus speed during Lack of Smooth Pursuit, elevation of the stimulus throughout the HGN test, and distance of the stimulus from the subject’s face. The second experiment examined the participants’ posture (standing, sitting, or lying down). The third experiment examined the participants’ vision (monocular vs. binocular). This paper focuses primarily on the first experiment. The raw data for the second experiment were not published, so that experiment cannot be analyzed with the same level of scrutiny. The third experiment is also briefly discussed.

Stimulus Variation Experiment

This was a laboratory experiment that involved volunteers who were dosed to different Blood Alcohol Concentrations (BACs) that were measured using an “AlcoholSensor IV.” Seven experienced officers administered the HGN Test to the participants. 

“A Video/HGN System (EyeDynamics, Inc) was used to make video records of participants’ eyes during examinations. The apparatus uses a small adjustable camera mounted in the right side of goggles that are worn by the participant. The camera transmits an image of the participant’s right eye to a television monitor and VCR which the examiner used to view the right eye. The open left side of the goggles allows the participant’s left eye to be viewed by the examiner.” 

When analyzing the data, it must be understood that the criterion for HGN is that four or more clues indicate a BAC of 0.08 g/dL or more. This standard was established in the San Diego Study and remains in effect as of the 2025 edition of the NHTSA SFST Manual(7).
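The criterion described above can be sketched as a small classification routine. This is a minimal illustration, not the study's own scoring procedure: the function name, the outcome labels, and the example values are all hypothetical, and the false positive rate is assumed here to mean the share of below-0.08 results that scored four or more clues. The appendices may use a different denominator.

```python
from collections import Counter

CLUE_CUTOFF = 4        # four or more clues counts as a positive HGN result
BAC_THRESHOLD = 0.08   # g/dL, the BAC a positive result is said to indicate


def classify(clues: int, bac: float) -> str:
    """Label one HGN result against the four-clue / 0.08 g/dL criterion."""
    positive = clues >= CLUE_CUTOFF
    at_or_above = bac >= BAC_THRESHOLD
    if positive and at_or_above:
        return "hit"
    if positive:
        return "false positive"
    if at_or_above:
        return "false negative"
    return "correct rejection"


# Hypothetical (clues, BAC) pairs for illustration only; NOT data from the study.
examples = [(6, 0.10), (4, 0.05), (2, 0.02), (4, 0.09),
            (0, 0.00), (3, 0.09), (5, 0.04), (1, 0.03)]

counts = Counter(classify(clues, bac) for clues, bac in examples)

# Assumed definition: false positives divided by all results with BAC below 0.08.
below_threshold = counts["false positive"] + counts["correct rejection"]
fp_rate = counts["false positive"] / below_threshold

# Overall correct: hits plus correct rejections, divided by all results.
correct_rate = (counts["hit"] + counts["correct rejection"]) / len(examples)

print(f"false positive rate: {fp_rate:.0%}, overall correct: {correct_rate:.0%}")
```

With these made-up values, two of the five below-0.08 results score four or more clues, so the sketch reports a 40% false positive rate and 62.5% overall correct.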

The first variation tested was the speed of the stimulus. This involved moving the stimulus at both the “standard” speed (moving from the center of the face to one side as far as the eye can go in 2 seconds, then 2 seconds back to the center) and faster than the “standard” (1 second out to the side and 1 second back to the center) when checking for Lack of Smooth Pursuit. One officer administered the test correctly, and the other moved the stimulus faster than the standard. During this variation, the false positive rate of the HGN test was 76% with an overall correct rate of 44% when the test was administered correctly. (Appendix 1)

The second variation was the elevation of the stimulus. This involved holding the stimulus at the “standard” height (2” above eye level), lower than the “standard” (0” / at eye level), and higher than the “standard” (4” above eye level). During this variation, the false positive rate of the HGN test was 54% with an overall correct rate of 61% when the test was administered correctly. (Appendix 2)

The last variation was the distance of the stimulus from the participant’s face. This involved holding the stimulus at the “standard” distance (12-15”), closer than the “standard” (10”), and farther than the “standard” (20”). During this variation, the false positive rate of the HGN test was 69% with an overall correct rate of 47% when the test was administered correctly. (Appendix 3)

Overall, for the entire experiment (all 3 variations combined), the false positive rate of the HGN Test was 67% with an overall correct rate of 50% when the test was administered correctly. (Appendix 4)

HGN Test Accuracy by Stimulus Variation

| Stimulus Variation | Standard Condition Tested | False Positive Rate (%) | Overall Correct Rate (%) | Appendix |
|---|---|---|---|---|
| Speed of Stimulus | 2 seconds out / 2 seconds back | 76% | 44% | Appendix 1 |
| Elevation of Stimulus | 2 inches above eye level | 54% | 61% | Appendix 2 |
| Distance of Stimulus | 12–15 inches from face | 69% | 47% | Appendix 3 |
| Overall (All Variations) | Standardized administration | 67% | 50% | Appendix 4 |


What were the results when the test was not administered in accordance with the “standard”?

Stimulus Speed

  • (1 Second) Faster than the “standard” - Overall correct 58% with a false positive rate of 50%. 

  • This is the only variation tested that increased accuracy and decreased false positives.

Stimulus Elevation

  • (0”) Lower than the “Standard” - Overall correct 44% with a false positive rate of 79%.

  • (4”) Higher than the “Standard” - Overall correct 38% with a false positive rate of 91%.

Stimulus Distance

  • (10”) Closer than the “Standard” - Overall correct 29% with a false positive rate of 92%.

  • (20”) Farther than the “Standard” - Overall correct 35% with a false positive rate of 84%.

The false positive rates of HGN were very high when the test was administered correctly, but increased notably when the stimulus was not positioned in accordance with the standardized guidelines. What was Dr. Burns’ conclusion, and how did she address the false positives? 

“In conclusion, HGN as used by law enforcement is a robust procedure. The study findings provide no basis for concluding that the validity of HGN is compromised by minor procedural variations.” 

How did Dr. Burns come to this conclusion? By changing the criterion for what would be considered a false positive. Image 1 below is a screenshot from page 15 of the study.


Image 1. Criterion for a “Hit” in the HGN.


The highlighted area shows that four clues were considered a “hit” if the participant’s BAC was 0.03 or higher. Lowering the criterion drastically reduced the number of false positives. As each of the tables shows, very few of the false positives that occur under the established criterion were marked as false positives (denoted by **) by Dr. Burns. It is important to remember that Dr. Burns was the person who trained the officers in the San Diego Study on the updated HGN scoring criterion. Her statement (from the box above), “the criteria by which scores have been classified as correct, false negative, or false positive as defined in the SFST curriculum appear below,” is therefore not accurate.

It appears that instead of using the correct criterion and applying it to the data to form her opinions, Dr. Burns altered it to make the data fit her opinions. 
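The effect of moving the threshold can be illustrated with a short sketch. The records below are hypothetical and are not drawn from the study; the helper name is invented for this example, and applying one BAC threshold to every result of four or more clues is a simplifying assumption, since the screenshot is described here only as covering the four-clue case.

```python
def count_false_positives(records, bac_threshold, clue_cutoff=4):
    """Count results with at least clue_cutoff clues but a BAC below bac_threshold."""
    return sum(
        1 for clues, bac in records
        if clues >= clue_cutoff and bac < bac_threshold
    )


# Hypothetical (clues, BAC in g/dL) pairs; NOT the study's data.
records = [(4, 0.02), (4, 0.04), (5, 0.06), (6, 0.07), (4, 0.09), (6, 0.11)]

fp_at_008 = count_false_positives(records, 0.08)  # established criterion
fp_at_003 = count_false_positives(records, 0.03)  # lowered criterion

print(f"false positives at 0.08 g/dL: {fp_at_008}")
print(f"false positives at 0.03 g/dL: {fp_at_003}")
```

In this made-up set, four results count as false positives at 0.08 g/dL but only one does at 0.03 g/dL, even though no clue count or BAC reading changed. Only the definition did.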

Monocular Vision Experiment

This was also a laboratory experiment and was listed as a preliminary analysis due to the limited number of participants. The participants were required to be functionally one-eyed, so data were obtained from only seven individuals. The participants were dosed with alcohol and their BACs were measured with an AlcoSensor IV. Two certified DREs independently examined the participants. The false positive rate was 68%. (Appendix 5)

What did Dr. Burns state? 

“Because HGN appears to be reduced in a non-functioning eye, if officers were to rely solely on eye signs, they would only increase their false-negative rates, and they might improperly release one-eyed individuals. There is no evidence that HGN signs in such individuals will lead to false arrests.”

NHTSA Training Manuals

All references to the Robustness Study were removed from the 2018 SFST, ARIDE, and DRE curricula.(8) (At the time of this writing, the study is still available on the NHTSA website, but is still absent from the NHTSA curricula.) This removal occurred due to a concern that part of the study was conducted in a manner that substantially deviated from the normal protocol for administering and interpreting HGN. (The purpose of the study was to examine deviations from the standardized protocol.) A formal retraction of the study was not recommended. There was no additional information provided as to what the specific issues were, or which experiments of the study were the problem. 

There were no concerns raised with Dr. Burns changing the criterion to alter the number of false positives that were reported in the study. 

The data speaks for itself. These were experienced officers; their correct or incorrect administration of the HGN test was known, the participants’ BACs were known, and the number of clues reported by the officers was known.

Conclusion

The analysis of the Robustness Study reveals a critical issue that has substantial legal and scientific implications: the study’s conclusions are based on an altered definition of a false positive that does not align with the established NHTSA criterion. This change dramatically reduced the number of reported false positives and enabled the author to conclude that HGN was “robust,” despite data showing false-positive rates ranging from 67% when administered correctly to 92% when the stimulus position was altered.

This alteration was not a trivial mistake. Dr. Marceline Burns was the principal author or co-author of five of the six foundational SFST development and validation studies that courts repeatedly rely on. If the same researcher who authored the core validation studies subsequently alters the definition of a false positive to align outcomes with a predetermined conclusion, it raises legitimate concerns about the integrity of that researcher’s prior SFST validation research.

The implications for legal proceedings are significant. Courts routinely rely on the SFST validation studies to support the admissibility and scientific accuracy and validity of HGN evidence. Given these issues, the weight afforded to HGN, and by extension the SFST battery, should be carefully reevaluated.

Acknowledgements

The author acknowledges the use of ChatGPT to assist in drafting and refining the abstract, introduction, and conclusion by improving wording, organization, and clarity based solely on the author’s original manuscript text. All substantive content, analysis, and conclusions are entirely the author’s own.

Conflict of Interest Disclosures

The author is a consultant and expert witness for DUI cases, but has received no funding or compensation for the preparation of this article.

References

[1] Burns M. The Robustness of the Horizontal Gaze Nystagmus Test. Southern California Research Institute; 2007.

[2] Burns M, Herbert M. Psychophysical Tests for DWI (Driving While Intoxicated) Arrest. U.S. Department of Transportation National Highway Traffic Safety Administration; 1977.

[3] Tharp V, Burns M, Moskowitz H. Development and Field Test of Psychophysical Tests for DWI Arrest. Southern California Research Institute; 1981.

[4] Burns M, Anderson E. A Colorado Validation Study of the Standardized Field Sobriety Test (SFST) Battery. U.S. Department of Transportation National Highway Traffic Safety Administration; 1995.

[5] Burns M, Dioquino T. A Florida Validation Study of the Standardized Field Sobriety Test (SFST) Battery. United States. National Highway Traffic Safety Administration; 1997.

[6] Stuster J, Burns M. Validation of the Standardized Field Sobriety Test Battery at BACs Below 0.10 Percent. United States. National Highway Traffic Safety Administration; 1998.

[7] NHTSA. SFST DWI Detection and Standardized Field Sobriety Test (SFST) Participant and Instructor Manuals. NHTSA; 2025.

[8] DRE Technical Advisory Panel Mid-Year Meeting Minutes. March 27, 2018.



