
Depth cues and how they relate to stereo pairs

On this page I again use the workflow of looking at a 3D scene with one's eyes/brain and determine how this changes when looking at 2D pictures that generate a 3D image. A glossary of terms can be found here.

Depth cues in 3D scene

The following depth cues of a real 3D scene are recognized (Rock [1995], pages 53 - 89):
Cue | Bi/monocular | Strength | Related aspects | What can be perceived | Accuracy range | Interaction with
----|--------------|----------|-----------------|-----------------------|----------------|-----------------
Retinal disparity | Binocular | powerful | | relative distance/depth | up to 200 m in real life (theoretically 1.3 km) | perspective, accommodation, convergence

Oculomotor cues:
Convergence | Binocular | size better estimated than depth | | absolute distance | up to several metres | accommodation, retinal disparity
Accommodation | Monocular | weak | weak source for depth | absolute distance | up to a metre | convergence, retinal disparity
Motion parallax | Monocular | important | | depth of scene | | convergence

Pictorial cues:
Interposition | Monocular | powerful | assuming familiar shapes | relative distance | any range |
Shadow | Monocular | important | attached shadows more powerful than cast shadows; also light from above is assumed | relative position | any range |
Perspective | Monocular | powerful, can overpower retinal disparity | linear, foreshortening, size, detail, aerial (haziness) and texture gradient perspective | relative distance, information on planes | any range | retinal disparity
Familiar size | Monocular | inconclusive | if size is known from former experience | absolute distance | until angular size becomes too small |
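
As a side note, the 'theoretical 1.3 km' above can be reproduced with a one-line calculation. Below is a minimal Python sketch, assuming an eye separation of 6.5 cm and a stereoacuity (smallest detectable disparity) of about 10 arcsec; both numbers are my assumptions, not from Rock:

  import math

  IPD = 0.065                                  # eye separation in metres (assumed value)
  stereoacuity = (10 / 3600) * math.pi / 180   # ~10 arcsec disparity threshold, in radians (assumed)

  # The disparity between a point at distance d and the background at infinity
  # is roughly IPD / d, so disparity stays detectable up to about:
  max_distance = IPD / stereoacuity
  print(f"retinal disparity usable up to ~{max_distance:.0f} m")   # ~1340 m, i.e. ~1.3 km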

Constancy in perception

Besides the above depth cues, the human (eye-brain combination) works along the principle of size constancy, so an object is perceived with more or less the same size, even though its angular size changes when the distance changes.
Location constancy is another important aspect of the brain: even if one moves one's head or one's eyes, stationary objects are perceived to stay at the same location. I assume this location constancy also makes sure that the two images of both eyes can be mapped in some static way to determine the binocular depth cues.
Besides the above two, some other constancies exist: shape, orientation, lightness and depth (see Rock [1995], pages 22 - 51 and 62).

Depth experience, how actually?

I understand that perceiving the outside world (3D scene) is mainly done by the brain and not by the retina of the eyes. I also perceive an image in my brain that is stable (egocentric instead of oculocentric), so in some way that image is built up/constructed (for instance by taking location constancy and afterimages into account).
As disparity is evaluated in the brain, how important is for instance the fovea itself (besides making a proper alignment of the two eye images possible and thus improving the accurate determination of disparity)? Thus how important are the eye's properties (fovea size, eye diameter, lens system, etc.) actually in factual depth perception?
Looking at the cues and constancies: they are not really that related to the eyes themselves (except perhaps the accommodation cue); even retinal disparity is not evaluated in the eye itself.

To be honest, I don't yet know how to phrase my question precisely, as I still have too many unknowns, so I hope that some people have a nice link/article/book/URL on this subject? Any help is welcome. Thanks.

Local stereopsis

I had hoped that the horopter and Panum's area would help me to explain this, but I am not sure yet. These are valid for a certain eye fixation (local stereopsis). Panum's area is quite small (fusion stereopsis is experienced around 10' to 1°, depending on the peripheral angle (S. Steinman [2000], page 202)), but quantitative (patent) stereopsis has a larger range.
In any case one might need multiple fixations to perceive most 3D scenes, and these multiple fixations will build up the full mental 3D image (global stereopsis).
We also don't experience absolute depth, but only relative depth from the horopter, so this too will need multiple fixations to get the depth of the whole scene.

Important to note is that the maximum disparity for stereopsis near the fovea is quite small (~10'), so if one wants to experience stereopsis from a larger disparity, one should not look (fixate) with the fovea at that location, otherwise diplopia happens!
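
To get a feeling for how small this range is, here is a back-of-the-envelope Python sketch (my own illustration, not from the references above), using the standard small-angle relation disparity ≈ eye separation × depth interval / distance²; the interocular distance and the fixation distance are assumed numbers:

  import math

  IPD = 0.065                         # interocular distance in metres (assumed)
  fixation = 2.0                      # fixation (horopter) distance in metres (assumed)
  limit = (10 / 60) * math.pi / 180   # ~10' foveal disparity limit, in radians

  # Small-angle approximation: disparity ~= IPD * depth_interval / fixation**2,
  # so the depth interval around the horopter that stays below the limit is:
  depth_interval = limit * fixation**2 / IPD
  print(f"~{depth_interval * 100:.0f} cm of fusible depth around a {fixation} m fixation")
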
[Figure: Stereopsis (latent/patent/fusion) versus maximum angular disparity]
<from Virtual Environments and Advanced Interface Design, Barfield, W., page 154, [1995]; I assume that the 'Maximum angular disparity' is the uncrossed disparity>

This link and this link look to be great resources on this subject!

Remark: what is strange is that the high resolution of the fovea does not seem to influence the 'maximum angular disparity'. One would expect some relation...

I have been looking at different aspects of stereopsis:
The above assumptions give the following graph for local stereopsis (I don't see too much in it yet, as there are some unknowns in the above assumptions; I am verifying and working on this!).
[Graph: Stereopsis (latent/patent/fusion) for local stereopsis]

Differences between viewing a 3D scene and a pair of 2D pictures

The question is of course how depth is experienced when using a stereo pair and how this relates to the depth experience of the actual 3D scene. At this moment, I recognize three types of situations (again following the work flow from real experience to picture taking to observing 2D pictures):
  A. normal viewing of a 3D scene with two eyes.
  B. making a stereo pair with two lenses.
  C. viewing a stereo pair with the eyes.
(A and C are of course the most important ones, as these are the human experiences; B is just a step to get to C).

These three have quite different properties (the text below concentrates on still stereo photography; I can understand it might be different for stereo video). In the below, fixation distance (for the eyes) and vergence distance (for the camera lenses) are somewhat equivalent terms:
  A. Variable focused distance, small eye Depth of Field (DoF), fixation distance linked with the focused distance:
    • The focused distance and the fixation distance are (normally) the same/linked.
    • The eye DoF is quite small in most (all) situations.
    • The fixation distance normally represents a convergence angle (positive fixation distance)
    • The range where a human can experience depth is quite small: patent/quantitative stereopsis (without changing the eye's fixation distance [aka horopter distance]).
      It is interesting to see that the eye DoF range is quite comparable to the patent stereopsis range (perhaps obvious when knowing evolution theory;-) Also important is that the eye/brain combination rejects things outside quantitative stereopsis (it makes them almost invisible).
    • The stereo window distance (zero deviation) is normally at the fixation distance (as the eye needs to put the deviation at zero at that location to make proper interpretations by the brain possible).
    • There is thus a lot of window violation (everything in front of the fixation is closer than the fixation distance aka window distance); luckily the window itself is very blurry with our eyes!
    • The eyes have to scan over the whole scene (using different focused/fixation distances) to get the full scene.
  B. Camera looking at the whole scene: fixed focused distance, large camera DoF, fixed vergence distance
    • The focused distance is fixed at a point which the photographer finds the most interesting.
    • In stereo photography it seems we choose a large camera DoF (to allow the viewer to see a deep scene).
      Would DoF in video be smaller than in still photography?
    • The vergence distance normally represents a convergence angle (positive vergence distance), but it can also be a divergence angle (negative vergence distance).
      Would vergence distance be the same as the focused distance for stereo video? See also this video of the converging lenses for Cameron's film Avatar.
    • As the camera DoF is large, the focused distance is not that important.
    • The stereo window distance (zero deviation) is (by default) at the vergence distance.
    • No scanning of the camera lenses over the scene is possible (so it stays at a fixed focused/vergence distance).
  C. Fixed focused distance, small eye DoF, variable fixation distance:
    • As the focus distance is the distance to the screen, everything on that screen is in focus. As the camera DoF of the 2D pictures was large, the eye sees much more sharply than in real life.
    • The small eye DoF is not important here, as the 2D pictures are all in one plane.
    • The fixation distance normally represents a convergence angle (positive fixation distance), but it can also be a divergence angle (negative fixation distance).
    • As the eye sees much more sharply, this causes strain on the eyes (more conflicts are explicitly visible that are not really visible with the real 3D scene). The eye/brain combination has more difficulty rejecting non-fusible parts of the scene (like far/near objects, where keystoning distortions might have happened in step B due to con/divergence), as the camera's DoF was large.
    • Furthermore, the focus and the fixation distance are different (not linked), and thus not what humans are used to in real life (a sketch quantifying this mismatch follows after this list).
    • The stereo window distance is determined by the two 2D pictures' relative shift.
    • The eyes have to scan over the whole scene (using a constant focused distance but different fixation distances) to get the full scene.
      With video the eyes don't really have time to wander?
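
As a rough way to quantify the focus/fixation mismatch mentioned under C, here is a small Python sketch using the usual reciprocal-distance (dioptre) measure; the screen and image distances are made-up example numbers:

  screen_distance = 0.5   # metres; the eyes focus (accommodate) on the screen (made-up number)
  image_distance = 0.35   # metres; the eyes converge on the perceived image point (made-up number)

  accommodation_demand = 1 / screen_distance   # in dioptres
  vergence_demand = 1 / image_distance         # in equivalent dioptres

  mismatch = vergence_demand - accommodation_demand
  print(f"accommodation {accommodation_demand:.2f} D, vergence {vergence_demand:.2f} D, "
        f"mismatch {mismatch:.2f} D")
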
So there are a few differences between situations A and C. I think that these differences (and a few others, like motion parallax) can quite well explain the issues around stereo viewing. Some (all?) of these differences might explain the stress in the eyes in certain cases.
Here are some differences between looking at a real 3D scene and using a stereo pair, made more explicit:
  • different retinal disparity. For instance, in a 3D scene one's eyes fixate/converge on an object and this changes the angle at which one sees the background (Glickman, B., pers. comm. [2009]); this change in angle is not experienced in a static stereo pair.
  • the viewer's accommodation/focus does not depend on the depth in the 3D scene when looking at the 2D pictures.
  • focus areas of 2D pictures are fixed, while they are variable when the eyes look at the 3D scene.
  • the brain can block parts of the scene that have too much disparity: only one eye's image is used. This does not happen when admiring a stereo pair (with video this is less of a problem due to less attention to the main object).
  • motion parallax is present with a 3D scene as a human's head is never static (not the case for a static stereo pair, although if one moves one's head slightly with a stereo pair, the impression of motion parallax is there).
  • no motion parallax in a stereo pair when converging the eyes on a certain part of the pair (present when looking at a 3D scene).
  • scanning the scene with the eyes does not change the stereo pair experience, but it would with a 3D scene.
  • there could be misalignments due to bad stereo picture taking (like rotation, vertical misalignment, size, pincushion/barrel distortion, etc.)
  • differences due to the presentation method used for stereo viewing (keystoning with cross viewing, color changes with anaglyphs, etc.)
  • different 3D image experience due to hard picture boundaries (the stereo window or the extent of patent stereopsis)
  • different importance of cues/constancies when viewing a pair, see below
  • etc.? Let me know.

Three types of 3D images when looking at a stereo pair

The following types of 3D images (as perceived in the brain) can be seen by a viewer of a stereo pair (and one can switch between them by proper attention/concentration of the viewer). These types are proposed by the author of these pages.
This section is at this moment primarily about cross free viewing (as I personally can experience type 1 and type 2), but it should work for any viewing method (it will be updated in the future).

Type 1: Geometric 3D image

When cross viewing, the geometric 3D image can be seen between the screen and the eyes. It is based on the geometry of the eye locations and the homologous points on the 2D pictures. The convergence/accommodation (oculomotor) cue is important here. This type might be a form of proximal mode (see Rock, [1995], page 44 and 45).
See the below picture of cross free viewing:

[Figure: Cross free viewing of the geometric 3D image]

When just learning cross free viewing, this was the way I perceived the 3D image.

Something similar for the geometric 3D image when parallel free viewing:
[Figure: Parallel free viewing of the geometric 3D image]

Some modeling of cross and parallel free viewing

As the 3D image is geometric, it is easy to do calculations on it:

3D image point distance = EyeSeparation*ViewDistance/(EyeSeparation-HomologousPointSeparation)

Vergence angle of one eye = atan((EyeSeparation-HomologousPointSeparation)/ViewDistance)

With:
HomologousPointSeparation is measured from the left point to the right point (so negative for cross viewing)
Vergence angle: positive -> convergence angle and negative -> divergence angle
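
As an illustration, here is a minimal Python sketch of these two formulas (the function names and the example numbers are mine; any consistent length unit works, centimetres here):

  import math

  def image_point_distance(eye_separation, view_distance, homologous_point_separation):
      # Distance to the perceived geometric 3D image point; the separation of
      # the homologous points is negative for cross viewing.
      return eye_separation * view_distance / (eye_separation - homologous_point_separation)

  def vergence_angle(eye_separation, view_distance, homologous_point_separation):
      # Vergence angle in radians; positive means convergence, negative divergence.
      return math.atan((eye_separation - homologous_point_separation) / view_distance)

  # Example: cross viewing at 40 cm, 6.5 cm eye separation, homologous points
  # 5 cm apart (negative because the left and right pictures are swapped).
  d = image_point_distance(6.5, 40.0, -5.0)           # ~22.6 cm: in front of the screen
  a = math.degrees(vergence_angle(6.5, 40.0, -5.0))   # ~16 degrees, converging
  print(f"3D image point at {d:.1f} cm, vergence angle {a:.1f} degrees")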

Type 2: Picture size 3D image

In this case one can see the 3D image close to the location of the screen. I assume this is due to size constancy (and perhaps also the familiar size cue) applied to the size of the 2D pictures.
This type might be a form of salient mode (see Rock, [1995], page 44 and 45).

After having three months of experience with cross free viewing, this became the way I perceive a 3D image.

Type 3: Immersion 3D image

One can perceive the 3D image as a fully immersed 3D image. I assume this is due to size constancy (and perhaps also the familiar size cue) applied to the actual object size in the 2D pictures.
This type might be a form of salient mode (see Rock, [1995], page 44 and 45).

I personally don't see a 3D image in this way (yet?). But other people do (Berlin, L., pers. comm. [2009]).

Subtype 3a: Ortho 3D image

An immersion 3D image can also produce an ortho stereo 3D image. With ortho stereo, the immersion 3D image, the geometric 3D image and the 3D scene are the same.

Conclusion

All of the above is just a draft and things are still forming in my head ;-).
The statement of Rock ([1995], page 45) on salient and proximal modes is important:
If this description of dual modes (salient and proximal) of perception is correct, it is no wonder that experiments can lead to varying or contradictory results. Subjects are caught in a potential conflict. If they match on the basis of constancy, they fail to take note of one facet of perception (visual angle); if they match on the basis of visual angle (which is very difficult to do), they fail to take account of the most central facet, namely, constancy. Such an explanation sheds a different light on the evidence of individual differences in constancy experiments and the development of constancy.

The above matches almost exactly the confusion we see when talking about people's stereo experience.

Acknowledgments

I would like to thank the following people for their help and constructive feedback: Larry Berlin, Chris Evans, Bill Glickman, Jorn Lang, Scott Steinman and all other unmentioned people. Any remaining errors in methodology or results are my responsibility of course!!! If you want to provide constructive feedback, let me know.


Major content related changes: Jan. 30, 2010