Understanding eye tracking requires insight into:
- How does the eye work?
- Why do our eyes move?
- What do we study in eye tracking?
- How do eye trackers work?
1. How does the eye work?
Our eyes have many similarities with how a photo camera works: light reflected from an object or a scene travels into our eyes through a lens. This lens concentrates the light and projects it onto a light-sensitive surface at the back of a closed chamber. However, unlike a camera's sensor, the light-sensitive surface of the eye (called the retina, see below) is not equally sensitive everywhere.
|The retina is a light-sensitive structure inside the eye that transforms light into signals, which are later converted into an image by the visual cortex in the brain. The eye contains two kinds of light receptor cells: rods and cones. The fovea is a small central section of the retina that is densely packed with cone cells; its very centre contains virtually no rods. Rod cells, which are mostly located in the peripheral retina, have low spatial resolution, support vision in low light conditions, do not discriminate colors, are sensitive to object movement and are responsible for peripheral vision. Cone cells, which are densely packed in the centre of the retina and therefore serve the central visual field, function best in bright light, process sharp images and discriminate colors.|
Through evolution, our eyes have adapted to work in both dark and light environments and to provide both fine detail and rapid detection of change. This has led to certain compromises; for example, we can only see details clearly in a limited part of our visual field (called the foveal area). The larger part of our visual field (the peripheral area) is better adapted to low light vision and to detecting movements and registering contrasts between colors and shapes. The image produced by this area is blurry and less colorful. Between these two areas lies a transition region called the parafoveal area, in which the image becomes gradually more blurry as we move from the fovea into the peripheral area (see below).
|This figure is a schematic representation of the human visual field. The main area that is in focus, F, corresponds to the area where we direct our gaze: the foveal area. As illustrated in this image, the foveal area is not circular, so the area in focus has a slightly irregular shape as well. Within the rest of the visual field (the parafoveal and peripheral areas) the image we perceive is blurry and thus harder to interpret and discriminate in high detail.|
The differences across our visual field are caused by the two kinds of light receptor cells in the eye: rods and cones. About 94% of the receptor cells in the eye are rods. As mentioned previously, the peripheral area of the retina is not very good at registering color and providing a sharp image of the world, because it is mostly covered by rods. Rods do not require much light in order to work, but they only provide a blurred and colorless image of our surroundings. For more detailed and clear vision, our eyes are also equipped with light receptor cells called cones, which make up about 6% of the total number of light receptor cells in our eyes. In the human eye, cones most often come in three varieties: one that registers blue colors, one that registers green and one that registers red. While cones are efficient at providing a clear picture, they require much more light in order to function. Hence, when it’s dark around us, we lose the ability to see color and rely mainly on information registered by rods, which gives us a greyscale image. Cones are mostly found within the fovea, where they are tightly packed in order to provide as clear an image as possible.
2. Why do our eyes move?
Articles on human vision and eye tracking accuracy often express measurements in visual angle; for example, the foveal area is estimated to span 1-2° of visual angle, and remote eye trackers have an accuracy of between 0.5° and 1°. (Note: a smaller angle means a more accurate eye tracker.) When we point a flashlight at a wall in a dark room, the light forms a projection on the wall. The size and shape of this projection depend on the size of the light source and on how far from the wall you stand. Distance matters because the light spreads out at a fixed angle from the source. Hence, if we wanted to describe the size of the flashlight's projection with an ordinary size measure (e.g. cm or cm²), we would always have to specify the distance at which it was measured. If we instead use the angle of dispersion as the size indicator, we can easily calculate the projection size for any distance using simple trigonometry. The same reasoning applies to our visual field, because images are formed by light projected onto the retina; our eye works like a reversed flashlight that absorbs light instead of emitting it.
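To make the flashlight analogy concrete, here is a minimal Python sketch of that trigonometry. It converts a visual angle into a size on a surface at a given distance using s = 2·d·tan(θ/2), and back again. The 65 cm viewing distance and the 2 cm button are illustrative assumptions, not values from this post.

```python
import math


def size_from_angle(angle_deg: float, distance: float) -> float:
    """Extent subtended by a visual angle at a given distance: s = 2 * d * tan(theta / 2).

    The result uses the same unit as `distance` (e.g. cm).
    """
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


def angle_from_size(size: float, distance: float) -> float:
    """Visual angle in degrees subtended by an object of `size` at `distance`."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


if __name__ == "__main__":
    distance_cm = 65.0  # assumed viewing distance for a desktop setup
    for accuracy_deg in (0.5, 1.0):
        size = size_from_angle(accuracy_deg, distance_cm)
        print(f"{accuracy_deg} deg at {distance_cm} cm -> {size:.2f} cm")
    # Assumed example: a 2 cm wide on-screen button viewed from the same distance.
    print(f"2 cm at {distance_cm} cm -> {angle_from_size(2.0, distance_cm):.2f} deg")
```

At 65 cm, an accuracy of 0.5° corresponds to roughly 0.57 cm on the screen and 1° to roughly 1.13 cm; move the participant further away and the same angular accuracy covers more centimetres.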
The human visual field spans about 220 degrees horizontally and is, as previously mentioned, divided into three main regions: the foveal, parafoveal and peripheral regions. We primarily register visual data through the foveal region, which constitutes less than 8% of the visual field.
Even though this represents only a small part of our field of vision, the information registered through the foveal region constitutes 50% of what is sent to the brain through our optic nerve. Our peripheral vision has very poor acuity, as illustrated above, and is only good for picking up movements and contrasts. Thus, when we move our eyes to focus on a specific region of an image or object, we are essentially placing that region on the fovea, the part of the retina that receives the sharpest image from the lens of our eye. In doing so, we concentrate our visual processing resources on the part of the visual field that also has the best image due to the optical characteristics of the eye. Because the foveal region registers the image, the brain receives the highest possible resolution and the largest amount of data about the area of interest, and can therefore construct the best possible image of it.
|Rather than perceiving an object or a scene as a whole we fixate on relevant features that attract our visual attention, and construct the scene in our visual cortex using the information acquired during those fixations.|
Eye movements have three main functions that are considered important when we process visual information:
- Place the information that interests us on the fovea. To do this, fixations and saccades are used. A fixation is a pause of the eye on a specific area of the visual field, and a saccade is the rapid movement between fixations.
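- Keep the image stationary on the retina in spite of movements of the object or of one’s head. Following a moving object in this way is commonly called smooth pursuit; compensating for one's own head movements is mainly handled by the vestibulo-ocular reflex.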
- Prevent stationary objects from fading perceptually. Movements used for this are called microsaccades, tremors and drift.
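Because saccades are so much faster than the small movements during a fixation, recorded gaze samples are often split into the two by angular velocity. The sketch below shows this idea, commonly called a velocity-threshold (I-VT) classifier, in Python. It assumes gaze positions are already expressed in degrees of visual angle, and the 30°/s threshold is an assumed, commonly used default rather than a value from this post. A single threshold like this cannot reliably separate smooth pursuit from fixations, and real filters also add noise reduction and merge short events.

```python
import math
from typing import List, Tuple

# (time in seconds, x in degrees, y in degrees); assumes strictly increasing timestamps
Sample = Tuple[float, float, float]


def sample_velocities(samples: List[Sample]) -> List[float]:
    """Angular velocity (deg/s) of each sample relative to the previous one.

    The first sample reuses the second sample's velocity so every sample gets a value.
    """
    if len(samples) < 2:
        return [0.0] * len(samples)
    velocities = [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:])
    ]
    return [velocities[0]] + velocities


def classify_ivt(samples: List[Sample], threshold_deg_s: float = 30.0) -> List[str]:
    """Label each sample 'saccade' if faster than the threshold, else 'fixation'."""
    return [
        "saccade" if v > threshold_deg_s else "fixation"
        for v in sample_velocities(samples)
    ]


if __name__ == "__main__":
    # Synthetic 100 Hz recording: a fixation, a fast jump of about 4 degrees, another fixation.
    gaze = [
        (0.00, 0.00, 0.00), (0.01, 0.05, 0.00), (0.02, 0.05, 0.05),
        (0.03, 2.00, 0.00), (0.04, 4.00, 0.00),
        (0.05, 4.00, 0.05), (0.06, 4.05, 0.05),
    ]
    print(classify_ivt(gaze))
```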
3. What do we study when we use eye tracking data?
What is visual attention?
Whenever we look at the world, we consciously or unconsciously focus on only a fraction of the total information that we could potentially process; in other words, we perform a perceptual selection process called attention. Visually this is most commonly done by moving our eyes from one place in the visual field to another; this process is often referred to as a shift of overt attention, because our gaze follows our attention. Even though we prefer to move our eyes to shift our attention, we are also capable of moving our mind’s attention to peripheral areas of our visual field without moving our eyes. This mechanism is called covert attention. Although we can use these two mechanisms separately, they most frequently occur together. For example, when we look at a cityscape, we may first use covert attention to detect a shape or movement in our visual field that appears interesting and use our peripheral vision to roughly identify what it is. We then direct our gaze to that location, allowing our brain to access more detailed information. Thus a shift of our overall attention is commonly initiated by covert attention and quickly followed by a shift of overt attention and the corresponding eye movements.
How fast is human visual perception?
In addition to having only a very limited field of sharp vision, our eyes are also fairly slow at registering changes in images compared to the update frequency of a modern computer screen. Research has shown that, in normal light conditions, the retina needs to be exposed to a new image for about 80 ms before that image is registered. This doesn’t mean that we have consciously noticed any change, only that the eye has registered it. The ability to register an image also depends on the light intensity of that image. This can be compared with a photographic camera, where a short exposure time in a badly lit environment results in a dark image in which hardly anything can be seen. However, when photographing something very well lit, e.g. a window, the exposure can be very short without this problem occurring. In addition to needing time to register an image, the eye also needs time for the image to disappear from the retina, which also depends on the light intensity. One example is exposure to a very bright light such as a camera flash, where the image of the flash stays on the retina long after the flash has ended.
In addition to the light sensitivity of the eye, how fast we perceive something also depends on what we are observing. When reading in normal light conditions, most people have been observed to need only 50-60 ms of seeing a word in order to perceive it. However, when looking at a picture, for example, people need to see it for more than 150 ms before they are able to interpret what they are seeing.
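To relate these durations to the refresh rate of a screen, the following Python sketch counts how many whole display frames a stimulus must stay on screen to be visible for at least a given time. It assumes ideal frame timing, and the 60 Hz and 144 Hz refresh rates are illustrative choices.

```python
import math


def min_frames(exposure_ms: float, refresh_hz: float) -> int:
    """Smallest number of whole frames that keeps a stimulus visible for at least exposure_ms.

    Assumes ideal frame timing: each frame lasts exactly 1000 / refresh_hz ms.
    """
    # exposure_ms * refresh_hz / 1000 is the exposure measured in frames.
    return math.ceil(exposure_ms * refresh_hz / 1000.0)


if __name__ == "__main__":
    # Approximate durations from the text: word perception, retinal registration, picture interpretation.
    for label, exposure_ms in (("word", 50), ("retina", 80), ("picture", 150)):
        for refresh_hz in (60, 144):
            frames = min_frames(exposure_ms, refresh_hz)
            shown_ms = frames * 1000.0 / refresh_hz
            print(f"{label}: {exposure_ms} ms at {refresh_hz} Hz -> {frames} frames ({shown_ms:.1f} ms)")
```

At 60 Hz, the roughly 80 ms the retina needs corresponds to 5 frames, while the 150 ms needed to interpret a picture corresponds to 9 frames.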
Most eye tracking studies aim to identify and analyze the patterns of visual attention of individuals performing specific tasks (e.g. reading, searching, scanning an image, driving, etc.). In these studies, eye movements are typically analyzed in terms of fixations and saccades. During each saccade, visual sensitivity is suppressed and, as a result, we are effectively unable to see. We perceive the world visually only during fixations. The brain integrates the visual images that we acquire through successive fixations into a visual scene or object (see the image above). Furthermore, we are only able to combine features into an accurate perception when we fixate on them and focus our attention on them. The more complicated, confusing or interesting those features are, the longer we need to process them and, consequently, the more time we spend fixating on them. In most cases we can only perceive and interpret something clearly when we fixate on it or very close to it. This eye–mind relationship is what makes it possible to use eye movement measurements to tell us something about human behavior.
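Because complicated or interesting features attract longer fixations, a common first analysis is to add up fixation durations inside each area of interest (AOI), often called dwell time. Below is a minimal Python sketch; the `Fixation` and `AOI` classes, the `dwell_time` function and the rectangular AOIs in pixel coordinates are hypothetical choices for illustration, not part of any eye tracking software's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fixation:
    x: float  # screen position in pixels
    y: float
    duration_ms: float


@dataclass
class AOI:
    """Hypothetical rectangular area of interest in screen pixels."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def dwell_time(fixations: List[Fixation], aois: List[AOI]) -> Dict[str, float]:
    """Total fixation duration (ms) inside each AOI; fixations outside all AOIs are ignored."""
    totals = {aoi.name: 0.0 for aoi in aois}
    for fixation in fixations:
        for aoi in aois:
            if aoi.contains(fixation.x, fixation.y):
                totals[aoi.name] += fixation.duration_ms
                break  # if AOIs overlap, the first matching one gets the time
    return totals


if __name__ == "__main__":
    aois = [AOI("headline", 0, 0, 600, 100), AOI("image", 0, 150, 600, 500)]
    fixations = [
        Fixation(100, 50, 220), Fixation(120, 60, 180),
        Fixation(400, 300, 350), Fixation(900, 700, 120),
    ]
    print(dwell_time(fixations, aois))
```

Here the headline collects 400 ms from two fixations, the image 350 ms, and the last fixation falls outside both AOIs.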
4. How do eye trackers work?
From a technical point of view, eye tracking is divided into two parts: registering the eye movements and presenting them to the user in a meaningful way. While the eye tracker records the eye movements sample by sample, the software running on the computer is responsible for identifying fixations in the data. This blog post explains the principles used in Tobii Eye Trackers to track the participant’s eye movements.
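One well-known way for analysis software to turn raw samples into fixations is the dispersion-threshold (I-DT) algorithm: a window of samples counts as a fixation if it lasts at least a minimum duration and its gaze points stay within a small spatial spread. The Python sketch below implements that idea. It is not necessarily how Tobii's software identifies fixations, and the thresholds in the usage example are assumptions.

```python
from typing import List, NamedTuple, Tuple

# (time in ms, x, y); x and y in any consistent unit such as pixels
Sample = Tuple[float, float, float]


class Fixation(NamedTuple):
    start_ms: float
    duration_ms: float  # time from first to last sample in the fixation
    x: float  # centroid of the samples in the fixation
    y: float


def dispersion(window: List[Sample]) -> float:
    """Dispersion as (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations_idt(
    samples: List[Sample], max_dispersion: float, min_duration_ms: float
) -> List[Fixation]:
    """Dispersion-threshold (I-DT) fixation detection over time-ordered samples."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Smallest window starting at i that spans at least min_duration_ms.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # too little data left to form a fixation
        if dispersion(samples[i : j + 1]) <= max_dispersion:
            # Grow the window while the gaze points stay close together.
            while j + 1 < n and dispersion(samples[i : j + 2]) <= max_dispersion:
                j += 1
            window = samples[i : j + 1]
            fixations.append(
                Fixation(
                    start_ms=window[0][0],
                    duration_ms=window[-1][0] - window[0][0],
                    x=sum(s[1] for s in window) / len(window),
                    y=sum(s[2] for s in window) / len(window),
                )
            )
            i = j + 1  # continue after the fixation
        else:
            i += 1  # drop the first sample and try again
    return fixations


if __name__ == "__main__":
    # Synthetic 100 Hz data: a fixation, one saccade sample, a second fixation.
    data = [(t, 100.0 + (t % 20) / 10, 100.0) for t in range(0, 110, 10)]
    data.append((110, 200.0, 150.0))
    data += [(t, 300.0, 200.0 + (t % 20) / 10) for t in range(120, 230, 10)]
    for fixation in detect_fixations_idt(data, max_dispersion=5.0, min_duration_ms=80.0):
        print(fixation)
```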
|Pupil Centre Corneal Reflection (PCCR) technique: a light source is used to cause reflection patterns on the cornea and pupil of the test person, and a camera captures an image of the eye. The direction of gaze is then calculated from the angles and distances measured in that image.|
How are the eye movements tracked?
Eye tracking has long been known and used as a method to study the visual attention of individuals. There are several different techniques to detect and track the movements of the eyes. However, when it comes to remote, non-intrusive eye tracking, the most commonly used technique is Pupil Centre Corneal Reflection (PCCR). The basic concept is to use a light source to illuminate the eye, causing highly visible reflections, and a camera to capture an image of the eye showing these reflections. The image captured by the camera is then used to identify the reflection of the light source on the cornea (the glint) and the centre of the pupil. We can then calculate the vector between the corneal reflection and the pupil centre; the direction of this vector, combined with other geometrical features of the reflections, is then used to calculate the gaze direction. The Tobii Eye Trackers use an improved version of the traditional PCCR remote eye tracking technology (US Patent US7,572,008). Near-infrared illumination is used to create the reflection patterns on the cornea and pupil of the user's eye, and two image sensors capture images of the eyes and the reflection patterns. Advanced image processing algorithms and a physiological 3D model of the eye are then used to estimate the position of the eye in space and the point of gaze with high accuracy.
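Traditional PCCR systems often map the pupil-glint vector to screen coordinates with a polynomial fitted during calibration, while the participant looks at known targets. The Python sketch below shows that classic 2D regression approach with a second-order polynomial and ordinary least squares. It is not Tobii's method, which the text above describes as using a 3D model of the eye; all function names are hypothetical, and the calibration data in the usage example is synthetic.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def pupil_glint_vector(pupil_centre: Point, glint: Point) -> Point:
    """Vector from the corneal reflection (glint) to the pupil centre, in image pixels."""
    return (pupil_centre[0] - glint[0], pupil_centre[1] - glint[1])


def poly_terms(vx: float, vy: float) -> List[float]:
    """Second-order polynomial terms of the pupil-glint vector (an assumed model)."""
    return [1.0, vx, vy, vx * vy, vx * vx, vy * vy]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a·x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration points do not determine the mapping")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_axis(vectors: Sequence[Point], targets: Sequence[float]) -> List[float]:
    """Least-squares polynomial coefficients via the normal equations (A^T A) c = A^T t."""
    rows = [poly_terms(vx, vy) for vx, vy in vectors]
    k = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    att = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, att)


def calibrate(vectors: Sequence[Point], targets: Sequence[Point]) -> Callable[[float, float], Point]:
    """Fit one polynomial per screen axis; return a pupil-glint vector -> gaze point function."""
    cx = fit_axis(vectors, [p[0] for p in targets])
    cy = fit_axis(vectors, [p[1] for p in targets])

    def gaze_point(vx: float, vy: float) -> Point:
        terms = poly_terms(vx, vy)
        return (
            sum(c * t for c, t in zip(cx, terms)),
            sum(c * t for c, t in zip(cy, terms)),
        )

    return gaze_point


if __name__ == "__main__":
    # Synthetic 9-point calibration: vectors on a 3x3 grid and an assumed "true" mapping.
    def true_mapping(vx: float, vy: float) -> Point:
        return (960 + 500 * vx + 20 * vx * vy + 10 * vx * vx, 540 + 300 * vy - 15 * vy * vy)

    grid = [(vx, vy) for vy in (-1.0, 0.0, 1.0) for vx in (-1.0, 0.0, 1.0)]
    gaze_point = calibrate(grid, [true_mapping(vx, vy) for vx, vy in grid])
    print(gaze_point(*pupil_glint_vector((0.5, -0.5), (0.0, 0.0))))
```

A nine-point grid is used because the six polynomial coefficients per axis need at least six well-spread calibration targets; the extra points let least squares average out measurement noise.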
|Eye during bright pupil eye tracking. Above to the left is a Hispanic or Caucasian eye. Above to the right is an Asian eye.|
|Eye during dark pupil eye tracking. Above to the left is a Hispanic or Caucasian eye. Above to the right is an Asian eye.|
What are dark and bright pupil eye tracking?
There are two different illumination setups that can be used with PCCR eye tracking: bright pupil eye tracking, where an illuminator is placed close to the optical axis of the imaging device, which causes the pupil to appear lit up (this is the same phenomenon that causes red eyes in photos); and dark pupil eye tracking where the illuminator is placed away from the optical axis causing the pupil to appear darker than the iris.
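The practical difference between the two setups is which pixels the pupil detector looks for: bright ones with on-axis illumination, dark ones with off-axis illumination. This minimal Python sketch estimates the pupil centre as the centroid of thresholded pixels in a grayscale image given as a list of rows. The `pupil_centre` name, the thresholds, and masking out the glint by ignoring near-saturated pixels are simplifying assumptions; real detectors use far more robust image processing.

```python
from typing import List, Optional, Tuple


def pupil_centre(
    image: List[List[int]],
    threshold: int,
    bright_pupil: bool,
    glint_level: int = 250,
) -> Optional[Tuple[float, float]]:
    """Estimate the pupil centre (x, y) as the centroid of thresholded pixels.

    image: grayscale rows with values 0-255.
    bright_pupil=True (on-axis illuminator): pupil pixels are brighter than threshold.
    bright_pupil=False (off-axis illuminator): pupil pixels are darker than threshold.
    Pixels at or above glint_level are treated as corneal reflection and ignored.
    Returns None if no pupil pixels are found.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= glint_level:
                continue  # skip the glint, which is bright in both setups
            is_pupil = value > threshold if bright_pupil else value < threshold
            if is_pupil:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


if __name__ == "__main__":
    def synthetic_eye(pupil_value: int) -> List[List[int]]:
        img = [[100] * 5 for _ in range(5)]  # iris at grey level 100
        for y, x in ((1, 2), (1, 3), (2, 2), (2, 3)):
            img[y][x] = pupil_value
        img[3][1] = 255  # glint
        return img

    print(pupil_centre(synthetic_eye(200), threshold=150, bright_pupil=True))
    print(pupil_centre(synthetic_eye(20), threshold=50, bright_pupil=False))
```

Both calls print (2.5, 1.5): the same pupil is found whether it shows up bright or dark, as long as the detector is told which illumination setup produced the image.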
Several factors can affect pupil detection during remote eye tracking with each of these two techniques. For example, when using the bright pupil method, factors that affect the size of the pupil, such as age and ambient light, may have an impact on the trackability of the eye. Ethnicity is another factor that affects the bright/dark pupil response: for Hispanic and Caucasian participants the bright pupil method works very well. However, the method has proven less suitable for Asian participants, for whom the dark pupil method provides better trackability.
Tobii Eye Trackers of the T/X Series use both bright and dark pupil methods to calculate the gaze position, while the earlier 50-series used only dark pupil eye tracking. Hence, the Tobii T/X Series Eye Trackers are able to deal with larger variations in experimental conditions and ethnicity than an eye tracker using only one of the techniques described above. All participants are initially tested with both the bright and dark pupil methods, and the method found to provide the highest accuracy is chosen for the actual testing. During a recording, the Tobii T/X Series Eye Trackers do not switch between bright and dark pupil tracking unless conditions change in a way that has a significantly negative impact on trackability. If that happens, the Tobii Eye Trackers conduct a new test in which both methods are used simultaneously to determine which method is most suitable for the new conditions, and continue the recording using only the selected method.
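The selection behaviour described above can be sketched as a small state machine. In this Python sketch, `test_both` is a hypothetical callback that runs both methods and returns an accuracy error in degrees per method, and trackability is assumed to be the fraction of recent samples with valid gaze data; the 0.8 threshold is an arbitrary illustrative value. It only mirrors the logic in the text and is not Tobii's implementation.

```python
from typing import Callable, Dict

# Hypothetical callback: runs a short test with both illumination methods and
# returns the measured accuracy error in degrees for each, e.g. {"bright": 0.4, "dark": 0.7}.
TestBoth = Callable[[], Dict[str, float]]


def choose_method(error_by_method: Dict[str, float]) -> str:
    """Return the method with the smallest accuracy error (i.e. the highest accuracy)."""
    return min(error_by_method, key=lambda method: error_by_method[method])


class IlluminationSelector:
    """Sketch of the bright/dark pupil selection logic described in the text."""

    def __init__(self, test_both: TestBoth, min_trackability: float = 0.8) -> None:
        self.test_both = test_both
        self.min_trackability = min_trackability  # assumed, illustrative threshold
        self.method = choose_method(test_both())  # initial test with both methods

    def update(self, trackability: float) -> str:
        """Keep the current method unless trackability drops below the threshold.

        trackability is assumed to be the fraction of recent samples with valid gaze data.
        """
        if trackability < self.min_trackability:
            # Conditions got worse: retest both methods and continue with the better one.
            self.method = choose_method(self.test_both())
        return self.method


if __name__ == "__main__":
    results = iter([{"bright": 0.4, "dark": 0.7}, {"bright": 1.2, "dark": 0.5}])
    selector = IlluminationSelector(lambda: next(results))
    print(selector.method)        # bright
    print(selector.update(0.95))  # bright (no retest)
    print(selector.update(0.5))   # dark (retest after trackability dropped)
```

Retesting only when trackability drops keeps the method stable during a recording, which matches the behaviour the post describes instead of switching on every small fluctuation.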
Wow! That was a long post. Eye tracking is pretty complex! Thanks, Tobii!
Read other FAQs about eye tracking on our Objective Digital website.