The Standardized Field Sobriety Tests: An Overview of Their Development, Proper Administration, Interpretation, and the False Positive Rates

11 Nov 2025 10:05 AM | Joshua Ott

IAFTC Newsletter. Volume 1. Issue 1. November 11, 2025.

Joshua Ott¹

¹Caselock, Inc., P.O. Box 285, Lebanon, GA 30146

This is an open-access article under the CC BY-NC-ND license.


Abstract

The Standardized Field Sobriety Tests (SFSTs) have become the primary screening tool for impaired driving enforcement since their development in the 1970s. While widely accepted in courtrooms across the United States, the actual validation data reveal significant limitations that are often overlooked by, or unknown to, practitioners and legal professionals. This article provides an examination of the SFST validation studies, with particular emphasis on false positive rates that raise important questions about the tests. Analysis of the San Diego study reveals false positive rates of 37% for Horizontal Gaze Nystagmus (HGN), 52% for Walk and Turn, and 41% for One Leg Stand when administered to drivers with a BAC below 0.08 g/dL. The 2007 Robustness of HGN study demonstrated even higher false positive rates when HGN was administered correctly in laboratory conditions (67%), with rates exceeding 90% when stimulus positioning deviated from standardized protocols. Recent research published in JAMA (2023) examining the field sobriety tests’ ability to identify drivers under the influence of cannabis showed false positive rates of 56% and 37% for Walk and Turn and One Leg Stand, respectively, when administered to placebo-dosed individuals. This article examines the three-phase driving under the influence (DUI) detection process, reviews the historical development and validation of SFSTs, analyzes false positive rates across multiple studies, and provides detailed guidance on proper test administration, interpretation, and common officer errors. Understanding these limitations is essential for forensic toxicologists, expert witnesses, and legal professionals who must accurately interpret SFST results in impaired driving cases.

Introduction

Since their introduction in the late 1970s, the Standardized Field Sobriety Tests (SFSTs) have become the cornerstone of impaired driving enforcement throughout the United States [1–3]. Law enforcement officers routinely administer these tests during DUI investigations, and their results often form the basis for arrest decisions and serve as critical evidence in criminal prosecutions. The tests are presented in courtrooms as scientifically validated tools with impressive accuracy rates: 88% for Horizontal Gaze Nystagmus (HGN), 79% for Walk and Turn, and 83% for One Leg Stand, according to the widely cited San Diego study.

However, a closer examination of the research reveals a more complex picture. The same study that produced these frequently cited accuracy rates also showed substantial false positive rates that are rarely discussed in training materials or courtroom testimony. These false positive rates, meaning the percentage of times the tests indicate that a person is at or above the legal limit of 0.08 g/dL when the person is actually below it, have profound implications for how we interpret SFST results.

For members of the International Association of Forensic Toxicology Consultants (IAFTC), understanding the actual capabilities and limitations of the SFSTs is essential. The disconnect between how the SFSTs are portrayed in law enforcement training and what the study data demonstrate creates significant challenges for scientific testimony and case interpretation.

This article serves multiple purposes for IAFTC members and the broader forensic community. First, it provides a review of the SFST development and validation, tracing these tests from their origins through current research. Second, it examines the false positive rates documented across multiple studies, including recent research that appears to have gone largely unnoticed in the law enforcement community. Third, it offers a detailed analysis of proper test administration procedures and common errors that may further compromise test validity. Finally, it addresses a critical limitation that is often misunderstood: according to the authors of the San Diego Study, these tests have only been validated to predict whether a person’s BAC is at or above a specific threshold. They have not been validated as indicators of driving impairment or alcohol/drug impairment.

As forensic professionals, we have a responsibility to ensure that scientific evidence is accurately represented and properly interpreted. The SFSTs remain a valuable screening tool for law enforcement, serving their intended purpose of helping officers make probable cause determinations during roadside investigations. However, when these tests are presented in court as proof of impairment, or when their limitations are not fully disclosed, we risk compromising the integrity of the forensic sciences and potentially contributing to wrongful convictions.

This article draws from National Highway Traffic Safety Administration (NHTSA) training manuals [4], original validation studies, the SFSTs Field Validation Studies (1995-1998) [5–7], and recent peer-reviewed research to provide IAFTC members with a more complete understanding of what the SFSTs can and cannot tell us about driver impairment. Whether you serve as an expert witness, conduct toxicological analysis, or work in research and policy development, this information is essential for ensuring that field sobriety test evidence is properly evaluated.

I. The Three Phases of DUI Detection

The first phase is “Vehicle in Motion.” Law enforcement officers are trained to look for 24 cues indicating that a driver is possibly impaired. These include failing to maintain lane, driving without headlights, and making wide turns. When an officer decides to stop a vehicle, they are trained to observe how the vehicle stops, because the stopping sequence may provide additional evidence that the driver is possibly impaired. At times, the officer observes nothing during Vehicle in Motion that suggests impairment, such as when the stop is for an equipment violation, speeding, or a roadblock. In those cases, signs of possible impairment that lead to a DUI arrest may first appear during the next phase.


The second phase is “Personal Contact.” This is probably the most important phase for two reasons. First, it is the only phase that occurs in every DUI investigation. Second, it is often the phase on which a jury places the most weight, because jurors are judging whether the driver acts and looks the way they expect an intoxicated person to. In this phase, officers are trained to use their senses to identify indicators of possible impairment (bloodshot eyes, soiled clothing, fumbling fingers, open containers, slurred speech, admission of drinking, inconsistent responses, odor of an alcoholic beverage, cover-up odors, etc.). Officers are then trained to observe the driver’s exit from the vehicle. Do they leave the car in gear, use the car for balance, or walk with a staggered gait? It is important to remember that by the end of this phase, an officer likely has probable cause to arrest the driver for DUI.


The last phase is “Pre-Arrest Screening.” This includes the Standardized Field Sobriety Tests (SFSTs) and the Preliminary Breath Test (PBT). Officers are trained to administer Horizontal Gaze Nystagmus (HGN), Walk and Turn, and One Leg Stand. After administering the SFSTs, officers can ask the driver to submit to a PBT. At the end of this phase, an officer decides whether to arrest the driver for DUI based on the standard of probable cause. Officers are trained to base this decision on the totality of the circumstances, but in many cases, the arrest decision comes down to the results of the SFSTs.

II. Development of the SFSTs

Starting in 1975, the Southern California Research Institute (SCRI), with funding from the National Highway Traffic Safety Administration (NHTSA), began research studies to determine which roadside tests were the most accurate. Prior to this, officers were using tests, instructions, and clues that were not standardized between officers. This led to problems in court determining how much weight the tests should be given. The goal was to standardize the tests and observations and determine which tests were the most accurate at distinguishing Blood Alcohol Concentrations (BACs) at or above the legal limit. 


SCRI started with six field sobriety tests commonly used throughout the United States: One Leg Stand, Finger to Nose, Finger Count, Walk and Turn, Alcohol Gaze Nystagmus (now HGN), and Tracing (a paper-and-pencil exercise). The three most accurate tests are the ones we now know as the SFSTs. The Finger to Nose test is used as part of the Drug Recognition Expert (DRE) program, and the Finger Count is taught as a tool that can be used during Personal Contact [4].


The research included three Standardized Elements for the tests. The first is Standardized Administrative Procedures, which means there is a required manner in which passes must be conducted for HGN, required instructions for each of the tests, and required demonstrations that must be given for the Walk and Turn and One Leg Stand. The second is Standardized Clues, which means officers look for specific clues during each test. The last is Standardized Criteria, which means that officers must observe a specific behavior before counting a clue. For example, to count the clue of missing heel to toe on the Walk and Turn, a person must miss heel to toe by one-half inch or more. NHTSA emphasizes that the validation only applies when the Standardized Elements are followed.


The original research determined how accurate each of the tests was at predicting whether a person’s BAC was at or above 0.10 g/dL. When four or more clues were observed, HGN was 77% accurate. When two or more clues were observed on each test, the Walk and Turn was 68% accurate, and the One Leg Stand was 65% accurate.


Three field validation studies were conducted between 1995 and 1998: Colorado (1995), Florida (1997), and San Diego (1998). The San Diego study is the primary focus here because it is the study officers currently cite when testifying about the accuracy of the SFSTs.


The San Diego study involved 297 drivers, and the mean BAC of those drivers was 0.122 g/dL [5]. Additionally, the mean BAC of the drivers who were arrested was 0.150 g/dL, and the mean BAC of those drivers not arrested was below 0.050 g/dL. Remember that the target BAC is 0.08 g/dL, so the further a driver’s BAC is from the target, the easier it is likely to be for an officer to make the correct decision. For example, a person at two times the legal limit would be expected to show more obvious signs of intoxication than someone right at the legal limit, which most likely makes it easier for the officer to know that an arrest is the correct decision. The officers in this study also had access to Preliminary Breath Tests (PBTs).


How accurate are the SFSTs based on the San Diego study? When four or more clues were observed, HGN was 88% accurate. When two or more clues were observed on each test, the Walk and Turn was 79% accurate, and the One Leg Stand was 83% accurate. The overall accuracy when the officers made their arrest decision was 91%. 


To understand exactly what this means, you need to understand what constitutes a “correct” decision and an “incorrect” decision. A “correct” decision was when a person was at or above the BAC level (0.08 g/dL) and the officer arrested them, or if the person was below the BAC level (0.08 g/dL) and the officer released them. An “incorrect” decision was when a person was at or above the BAC level (0.08 g/dL) and the officer released the person (false negative), or the person was below the BAC level (0.08 g/dL) and was arrested (false positive).
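The four outcomes described above can be sketched in a few lines of Python. This is a minimal illustration and not code from the study: the function name `classify_decision` and the sample (BAC, arrested) pairs are hypothetical and chosen only to show one example of each outcome.

```python
from collections import Counter

LEGAL_LIMIT = 0.08  # g/dL, the threshold used in the San Diego study


def classify_decision(bac: float, arrested: bool, limit: float = LEGAL_LIMIT) -> str:
    """Label one arrest decision against the driver's measured BAC."""
    at_or_above = bac >= limit
    if arrested and at_or_above:
        return "correct arrest"
    if not arrested and not at_or_above:
        return "correct release"
    if arrested and not at_or_above:
        return "false positive"   # arrested, but below the limit
    return "false negative"       # released, but at or above the limit


# Hypothetical (BAC, arrested) pairs, for illustration only.
examples = [(0.15, True), (0.04, False), (0.06, True), (0.09, False), (0.08, True)]

tally = Counter(classify_decision(bac, arrested) for bac, arrested in examples)
for outcome, count in tally.items():
    print(f"{outcome}: {count}")
```

Note that a driver exactly at 0.08 g/dL counts as "at or above" the limit, so arresting that driver is scored as a correct decision.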


Remember, according to the authors of the San Diego Study, these tests have only been validated to predict if a person is at or above a specific BAC. They have not been validated as indicators of driving impairment or alcohol/drug impairment.

III. False Positives

What exactly is a false positive? It is a test result that incorrectly indicates a condition exists when it in fact does not. An easy way to think of it is to imagine that you went to your doctor and your doctor ran some tests on you. Those tests came back and indicated that you have a disease, but you do not. That result would be a false positive.


What were the false positive rates of the SFSTs in the San Diego study? They were 37% for HGN, 52% for the Walk and Turn, 41% for the One Leg Stand, and 28% for the officers’ arrest decisions. So, for a person below 0.08 g/dL, the Walk and Turn and One Leg Stand are about as statistically accurate as a flip of a coin.
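It may help to see how a high overall accuracy figure and a high false positive rate can describe the same test. The sketch below assumes the false positive rate is computed among drivers below the threshold, as the abstract's wording suggests. The counts are hypothetical and are NOT taken from the San Diego study; they only mimic a sample in which most drivers are well above the limit.

```python
def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute overall accuracy and false positive rate from outcome counts.

    tp: at/above the limit and flagged     fn: at/above the limit, not flagged
    fp: below the limit but flagged        tn: below the limit, not flagged
    """
    total = tp + fp + tn + fn
    return {
        # Share of all decisions that were correct.
        "accuracy": (tp + tn) / total,
        # Share of below-limit drivers who were flagged anyway.
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts, for illustration only (not study data).
result = rates(tp=200, fp=20, tn=30, fn=10)
print(f"accuracy: {result['accuracy']:.0%}")
print(f"false positive rate: {result['false_positive_rate']:.0%}")
```

With these made-up counts, accuracy comes out at about 88% while the false positive rate is 40%. Accuracy is dominated by the large group of drivers above the limit, whereas the false positive rate looks only at the smaller group below it.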

IV. Robustness of the Horizontal Gaze Nystagmus Test

This study was published in 2007 and was funded by NHTSA. Dr. Marceline Burns authored the study [8]. It responded to defense attorney arguments that administering HGN incorrectly would affect the validity of the test. It was conducted in a laboratory setting using volunteer drinkers and experienced officers.


Three elements were tested:

  1. Stimulus Speed for Lack of Smooth Pursuit 

    1. Fast (1 second)

    2. Standard (2 seconds)

  2. Stimulus Height 

    1. High (4 inches above eye level)

    2. Standard (2 inches above eye level)

    3. Low (0 inches - at eye level) 

  3. Stimulus Distance

    1. Close (10 inches from the face)

    2. Standard (12-15 inches from the face)

    3. Far (20 inches from the face)


When HGN was administered correctly, the false positive rate was 67%. Additionally, 65% of the people below a BAC of 0.05 g/dL had four clues or more. One person showed six clues at a BAC of 0.029 g/dL.


What about the times when the test was not administered correctly?


  • Stimulus higher than the standard – 91% False Positive Rate

  • Stimulus lower than the standard – 79% False Positive Rate

  • Stimulus closer than the standard – 92% False Positive Rate

  • Stimulus farther than the standard – 84% False Positive Rate


These numbers show that it is imperative that officers position the stimulus correctly, or the false positive rates increase to even higher levels.


How did Dr. Burns address the extremely high false positive rates? She changed the standard, which lowered the number of reported false positives! In the current training material (2025 Edition SFST Manual [9]) and in the San Diego Study, four or more clues correlate to a BAC of 0.08 g/dL or more. In this study, four clues were treated as correlating to a BAC of 0.03 g/dL or more. This drastically lowered the published false positives. No justification was given for this changed standard.
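The effect of moving the BAC criterion can be shown with a small sketch. The records below are hypothetical (HGN clues, BAC) pairs and are not data from the 2007 study, and `count_false_positives` is an illustrative helper. The same observations are scored twice, once against 0.08 g/dL and once against 0.03 g/dL.

```python
def count_false_positives(records, bac_criterion, clue_cutoff=4):
    """Return (false positives, number of subjects below the criterion).

    A false positive here is a subject who shows `clue_cutoff` or more
    clues while their BAC is below `bac_criterion`.
    """
    below = [clues for clues, bac in records if bac < bac_criterion]
    flagged = sum(1 for clues in below if clues >= clue_cutoff)
    return flagged, len(below)


# Hypothetical (hgn_clues, bac_g_per_dl) pairs, for illustration only.
records = [
    (0, 0.00), (2, 0.01), (4, 0.02), (4, 0.04), (6, 0.05),
    (4, 0.06), (4, 0.07), (6, 0.10), (6, 0.12),
]

for criterion in (0.08, 0.03):
    fp, n_below = count_false_positives(records, criterion)
    print(f"criterion {criterion:.2f} g/dL: {fp} false positives among {n_below} below")
```

In this made-up sample, five of the seven subjects below 0.08 g/dL show four or more clues. Scored against 0.03 g/dL, only one of three does, because subjects between 0.03 and 0.08 g/dL with four or more clues are no longer counted as false positives.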


Read the rest of the article in PDF.



