How Do We See Things?

Content Outline

Light and the Sense of Sight

Light is the energy that our eyes detect when we see. Without light, we cannot see anything, even if there is nothing physically wrong with our eyes. Light travels in waves, and the distance from one peak to the next is called the wavelength. Light is visible at wavelengths of about 400-700 nm. (1 nm, or nanometer, is one-billionth of a meter.) Microwaves and infrared rays have longer wavelengths than visible light, while ultraviolet rays, x-rays and gamma rays have shorter wavelengths.
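The link between wavelength and frequency can be made concrete with a short calculation. The sketch below is only an illustration: it assumes the standard relation frequency = speed of light / wavelength, and the function names are made up for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum


def frequency_thz(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometers to a frequency in terahertz."""
    wavelength_m = wavelength_nm * 1e-9
    return SPEED_OF_LIGHT_M_PER_S / wavelength_m / 1e12


def is_visible(wavelength_nm: float) -> bool:
    """True if the wavelength lies in the 400-700 nm range given in the text."""
    return 400 <= wavelength_nm <= 700


for nm in (400, 550, 700, 1000):
    print(f"{nm} nm -> {frequency_thz(nm):.0f} THz, visible: {is_visible(nm)}")
```

Longer wavelengths come out with lower frequencies, which is why long-wavelength light is also described as low-frequency light.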

Light may differ in color, brightness and purity. Color, or hue, depends on the wavelength of light. Low-frequency light, or light with longer wavelengths, is reddish, while high-frequency light, or light with shorter wavelengths, is bluish. The brightness of light, on the other hand, depends on the wave's amplitude (or height). High-amplitude light waves are brighter than low-amplitude light waves. Lastly, purity, or saturation, depends on the amount of white light added. Pure, highly saturated light has less added white light than washed-out, less saturated light. The Color Tree is often used to demonstrate the different properties of light. Hue moves around the tree, saturation goes outward, and brightness moves upward.
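The three properties can be explored with the HSV color model in Python's standard colorsys module. This is an analogy, not the Color Tree itself: HSV is a different model that happens to use three comparable dimensions (hue, saturation, and value for brightness). The add_white helper is a made-up name for this sketch.

```python
import colorsys


def add_white(rgb, amount):
    """Blend an RGB color (components 0-1) toward white by `amount` (0-1)."""
    return tuple(c + (1.0 - c) * amount for c in rgb)


pure_red = (1.0, 0.0, 0.0)
for amount in (0.0, 0.5, 0.9):
    r, g, b = add_white(pure_red, amount)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    print(f"white added: {amount:.1f} -> hue {hue:.2f}, "
          f"saturation {saturation:.2f}, value {value:.2f}")
```

As more white is mixed in, the hue stays red while the saturation falls toward zero.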

The Different Parts of the Human Eye

We use our eyes in order to see light. The different parts of our eyes work together to enable us to see.

The sclera is the white, outer part of the eye. It maintains the shape of the eye, and protects the inner parts of the eye.

The iris is the colored part of the eye. It is made up of muscle tissue that expands or contracts, much like a camera's aperture, to change the size of the pupil, the opening through which light enters the eye. In bright light, the iris shrinks the pupil to let less light through; in dim light, it widens the pupil to let more light through.

The cornea is the clear membrane at the front of the eye that bends most of the incoming light to bring the image into focus.

The lens is the transparent, flexible, disk-like gelatinous structure behind the pupil that fine-tunes the image produced by the cornea. It is relatively flat when we look at faraway objects, because the light rays arriving from them are nearly parallel, and more curved when we look at close objects, because the light rays arriving from them spread apart and need more bending. One of the reasons why older people have poorer eyesight is that the lens becomes less flexible with age.

The retina is considered the primary mechanism of sight, because it is where the visual sensory receptors are located. It may be compared to the film of a camera, where images are recorded. The retina of a single eye contains roughly 126 million visual sensory receptors of two kinds - rods and cones. Rods are thin and long, function under low illumination, and can only register black and white images. There are around 120 million rods in the retina of a single eye. Cones are short and fat, function under high illumination, and can register colored images. There are around 6 million cones in the retina of a single eye. Dogs are often said to be color-blind because they have fewer cones and fewer types of cones than humans, so they distinguish fewer colors. The fovea is a minute area at the center of the retina where most cones reside. It is where we register the best picture of the external world. Because few cones are scattered outside the fovea, and because there are significantly more rods than cones, we register very little color at the periphery of our vision, which relies mostly on rods.

The image captured by the receptors of the retina is transduced (converted into electrochemical signals) and passed on to bipolar cells, also located in the retina. Ganglion cells then act as afferent nerves that send visual sensory information to the brain. The axons of the ganglion cells come together to make up the optic nerve. Because the spot where the optic nerve leaves the retina contains no cones and no rods, it is known as the blind spot. You can try a simple experiment to locate your blind spot. Draw two hearts on a piece of paper around 3 inches apart from each other. Color the left heart black and leave the other plain. Hold the paper in front of your face, touching your nose, while covering your right eye with your right hand. Focus your left eye on the plain heart, and gradually move the paper away from your face until the black heart disappears from the periphery of your left eye. At that distance, the image of the black heart falls on the blind spot. To test the other eye, cover your left eye and focus your right eye on the black heart until the plain heart disappears.
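The distance at which the heart vanishes can be estimated with a little trigonometry. The sketch below assumes that the blind spot lies roughly 15 degrees to the side of the point being looked at; this figure is an approximation that does not come from the text above, and it varies from person to person. The function name is made up for this example.

```python
import math

# Assumption (not from the text): the blind spot sits roughly 15 degrees
# to the side of the point of fixation.
BLIND_SPOT_ANGLE_DEG = 15.0


def disappearance_distance(separation: float,
                           angle_deg: float = BLIND_SPOT_ANGLE_DEG) -> float:
    """Viewing distance at which a mark `separation` away from the fixation
    point lands on the blind spot (same unit as `separation`)."""
    return separation / math.tan(math.radians(angle_deg))


print(f"{disappearance_distance(3.0):.1f} inches")
```

For hearts drawn 3 inches apart, this gives a viewing distance of roughly 11 inches, so expect the heart to vanish somewhere around that distance.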

The optic nerves from the two eyes meet and partly cross behind the eyes, at a point called the optic chiasm. Because of this crossing over, visual sensory information from the left side of the visual field is transmitted to the right visual cortex, and vice versa. Crossing over of afferent nerves is also observed in the auditory, cutaneous, kinesthetic and vestibular senses.

Visual Processing in the Brain

Visual sensory information is processed in the brain, particularly in the visual cortex, located in the occipital lobe. David Hubel and Torsten Wiesel (1965) won a Nobel Prize for demonstrating that visual stimulus features, such as size, shape, color, movement, lines and angles, are first detected in the visual cortex, hence the name of the cortex. After being registered in the visual cortex, visual sensory information has also been shown to travel along two different pathways.

The "what" pathway is primarily located in the temporal lobe. To illustrate the function of the "what" pathway, consider the case of Dr. P. Dr. P is popularly known as "the man who mistook his wife for a hat," thanks to the 1985 popular book of the same title by the famous neurologist Oliver Sacks. Dr. P started having trouble recognizing his students until they spoke to him, so he consulted an ophthalmologist, but the ophthalmologist found no direct physical problem with his eyes. Dr. P was subsequently referred to Oliver Sacks, who diagnosed him with brain damage, particularly in the "what" pathway. Because of this damage, he had difficulty even telling a glove from a change purse.

The "where" pathway, on the other hand, is located in the parietal lobe, where visual stimuli are judged by their relationship with other objects in the visual field, that is, by comparing them with other visible objects in three dimensions.

What is interesting about these two pathways is that they appear to work simultaneously, with their activity synchronized at the same frequency. For us to correctly identify what we see and where we see it, the "what" and "where" pathways engage in parallel processing and binding. This means that they work simultaneously and connectively to process visual sensory information.

Color Vision

The Evolutionary Psychology Approach holds that we developed color vision because we needed to distinguish foods that are ripe and edible from those that are not. Theoretical explanations of color vision were principally derived from early psychological studies, and current anatomical methods have shown them to be enduring. Two primary theories of color vision are the Trichromatic Theory and the Opponent-Process Theory.

Trichromatic Theory. Proposed by Thomas Young (1802) and subsequently extended by Hermann von Helmholtz (1852), the trichromatic theory builds upon the assumption that color vision results from the collective functioning of three different receptor systems, for the colors blue, red and green. Because virtually every single visible wavelength can be matched by combining blue, red and green wavelengths in varying degrees, color vision can be evaluated by asking people to match mixed and single wavelengths of the same color. Experiments on human color-matching abilities showed that dichromats (people with only two functioning cone systems), especially those with malfunctioning green cone systems, are fairly common. They are commonly, though inaccurately, referred to as color blind.

Opponent-Process Theory. The opponent-process theory, proposed by German physiologist Ewald Hering (1878), states that we see color in complementary pairs, red-green and blue-yellow, as reflected in the afterimage phenomenon. An afterimage occurs when you stare at green for a long time and then see red when you look away. Although the theory differs from the trichromatic theory, it is now widely believed that the two operate at different levels, that is, the complementary pairs arise when the signals from the three types of cones are processed in the ganglion cells.
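How three cone signals could be recoded into complementary pairs can be sketched in a few lines. The function name and the weights below are illustrative assumptions, not measured physiology; they show only the general idea of subtracting one signal from another.

```python
def opponent_channels(red: float, green: float, blue: float) -> dict:
    """Recode three cone responses (0-1) into two opponent pairs.

    The weights are illustrative assumptions, not measured physiology.
    """
    yellow = (red + green) / 2  # treat "yellow" as combined red and green
    return {
        "red_vs_green": red - green,       # > 0 reddish, < 0 greenish
        "blue_vs_yellow": blue - yellow,   # > 0 bluish, < 0 yellowish
    }


print(opponent_channels(0.9, 0.2, 0.1))  # mostly "red" cone activity
print(opponent_channels(0.2, 0.9, 0.1))  # mostly "green" cone activity
print(opponent_channels(0.1, 0.1, 0.9))  # mostly "blue" cone activity
```

In this toy scheme a single channel can signal red or green but never both at once, which mirrors the idea that the colors come in opposing pairs.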

Perceiving Visual Dimensions

Visual dimensions include shape, depth, motion and constancy, and we perceive them in different ways:

Perceiving Shape. We perceive shapes with the help of contours and patterns. A contour is where a sudden change in brightness occurs. For example, we perceive cylinders because we observe that the left and right sides gradually diminish in brightness. Patterns, on the other hand, are used for organizing perceptions.

Gestalt Psychology underscores that the whole is not equal to the sum of its parts. (Note: "Gestalt" is the German word for "form" or "configuration".) Common principles of Gestalt psychology as applied to shape perception are figure-ground relationship, closure, proximity and similarity. Figure-Ground Relationship means that we perceive shape by identifying what is figure and what is ground, and by comparing them with each other. Closure is filling in the gaps of disconnected and incomplete figures. This means that an incomplete circle will still be perceived as a circle because we naturally fill in the missing gaps to form a shape. Proximity and Similarity are principles of grouping. Proximity is grouping by closeness, while similarity is grouping by sameness. For example, a spiral is formed by grouping nearby curved lines together, and a flower in a cross-stitch pattern is formed by grouping similar stitches together.
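Grouping by proximity can be imitated with a very simple rule: dots that sit close together go into the same group. The sketch below is a toy illustration on a single row of dots, not a model of how the brain does it; the function name and the max_gap threshold are made up for this example.

```python
def group_by_proximity(positions, max_gap):
    """Group dots on a line: neighbors no more than max_gap apart share a group."""
    groups = []
    for x in sorted(positions):
        if groups and x - groups[-1][-1] <= max_gap:
            groups[-1].append(x)   # close to the previous dot: same group
        else:
            groups.append([x])     # too far away: start a new group
    return groups


dots = [0, 1, 2, 6, 7, 8, 12, 13]
print(group_by_proximity(dots, max_gap=1))
# -> [[0, 1, 2], [6, 7, 8], [12, 13]]
```

The eight dots are read as three clusters rather than as eight separate items, because the gaps within a cluster are small compared with the gaps between clusters.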

Perceiving Depth. We perceive depth with the help of binocular and monocular cues. Binocular cues come from the disparity between the left and the right eyes. Because the two eyes record slightly different visual sensory information due to their locations, the brain compares the two images to work out the depth of visual objects. Stereograms are applications of binocular cues. Because eye placement varies between individuals, that is, some eyes are set wider apart and some closer together, it can take a lot of adjustment to see the images in stereograms.

Monocular cues, on the other hand, are those available to a single eye. They are also known as pictorial cues because artists apply them to render a 3-dimensional scene on a 2-dimensional surface. This is why we perceive depth even if the canvas is flat. Monocular cues, such as familiar size, height in field of view, linear perspective, overlap, shading and texture gradient, allow us to perceive depth with a single eye. Familiar size comes from experience. We know from experience that buildings are taller than cars, so a building that looks smaller than a car must be farther away. Height in Field of View means that objects placed higher in the visual field are perceived to be farther. This is one proposed reason why the moon seems to change size with its position in the sky. Because we have no familiar experience of its actual size, we perceive the moon as farther and smaller when it is above us than when it is near the horizon. Linear Perspective means that far objects take up less space on the retina, so parallel lines appear to converge as they recede, and we perceive the point where they converge as farther away. Overlap means that the concealing object is closer than the concealed object. Shading makes use of lighting and object location. Texture Gradient means that surfaces with denser and finer texture appear farther than surfaces with sparser and coarser texture. Cartoonists often use texture gradient to provide depth in their drawings.
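The idea that far objects take up less space on the retina can be expressed as a visual angle. The sketch below assumes the standard geometric formula, angle = 2 x arctan(size / (2 x distance)); the function name and the sample sizes are made up for this example.

```python
import math


def visual_angle_deg(size: float, distance: float) -> float:
    """Angle (in degrees) that an object of a given size covers at the eye.

    `size` and `distance` must use the same unit.
    """
    return math.degrees(2 * math.atan(size / (2 * distance)))


# The same 1.8-meter-tall person seen from increasing distances (in meters).
for distance in (5, 10, 20, 40):
    print(f"{distance:>2} m -> {visual_angle_deg(1.8, distance):.1f} degrees")
```

Each time the distance doubles, the angle roughly halves, so a shrinking image on the retina works as a cue that the object is farther away.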

Perceiving Motion. Baylor (2001) remarks, "The dumber the animal, the smarter the retina," because frogs and other simple animals are said to detect motion only through the use of their retinas. Because humans are more complex and more specialized, we use extensive environmental cues to perceive motion. Unlike frogs, we use a number of sensory receptors - visual, auditory and others, particularly kinesthetic and vestibular senses - to perceive external movement. However, a significant portion of the sensory information that humans need to perceive motion comes from the visual system. Disneyland uses the way the human eye detects motion to produce the illusion of apparent movement. Apparent movement comes in two forms - stroboscopic and aftereffect. Stroboscopic motion is achieved by stimulating different parts of the retina in rapid succession. Movement Aftereffect occurs after watching continuous movement for some time: when we then look at another, stationary surface, it appears to move in the opposite direction.

[See also: How Do We Sense Movement and Motion?]

Perceiving Constancy. Even if a building looks smaller when far away, we still know that it's tall. Even if the door is open and we can only see its thin vertical edge, we still know it's a flat rectangle. Even if it's dark, we still know that leaves are often green. We perceive constancy despite varying sensations with the help of experience and memory.

[See also: Proprioceptive Feedback]

Visual Illusions

Visual illusions are discrepancies between reality and visual perception. More than 200 visual illusions have been discovered, and most are due to perceptual constancy. Some examples of visual illusions are:

The Müller-Lyer Illusion, where a line ending in ordinary arrowheads (<-->) looks shorter than a same-length line ending in inverted arrowheads (>--<);
The Horizontal-Vertical Illusion, where a vertical line looks longer than a same-length horizontal line;
The Ponzo Illusion, where, of two same-length lines drawn between converging lines, the one nearer to the point of convergence looks longer;
The illusion of the Devil's Tuning Fork, where the fork can be perceived as having two or three prongs, depending upon the angle of vision; and,
The Bush Illusion, where Bush seems to be smiling in an upside-down picture when he actually is not, because his rather serious mouth was not turned upside-down along with the rest of his face.