How many faces, bodies, and animals do people regularly see? How many vertical surfaces do we see compared to oblique or horizontal surfaces? How many different types of objects do people typically encounter, and where in the visual field do these objects land? How do we move our heads and eyes during different types of activities? How does this affect our visual input? What does the world, as experienced by acting human observers, really look like?

Vision is an active process, and fully characterizing natural scene statistics requires knowing what human observers actually fixate in the world. Vision researchers have been fundamentally constrained in the insights they can obtain by the limitations of available datasets, including small dataset size, narrow or otherwise unprincipled sampling of human visual experience, and photographer bias. In addition, many datasets sample broadly from static images on the Internet, but these do not capture dynamic experience and still reflect aesthetic preferences and other photographic biases.

Our database creation is guided by the following three criteria:

1. First person perspective
[Image: cameras used to record data for the VEDB, mounted on a headband]

The data should be collected from a first-person perspective, with eye tracking. Cameras can simulate a first-person perspective, but they are often aimed where human eyes are not. Thus, it is critical that the cameras be head-mounted and that eye tracking data be collected to reveal the specific objects, people, and locations the observer is looking at.

2. Broad, unbiased sampling
[Image: collage of sample scenes: the University of Nevada campus from the air, a waterfall in Lewiston, Maine, a corn field, a close-up of a face, and a grocery store aisle from the perspective of someone pushing a cart]

The data should reflect a broad and unbiased sampling of locations, tasks, and events from daily life, across a wide variety of observers. Here, we take our cue from recent principled efforts to sample static images of scenes and objects according to their natural frequencies, and extend the same logic to first-person video collection (see the sampling sketch following these criteria).

3. Accessible to all

The database should be accessible to all, regardless of technical experience or skill. If one can query a search engine, one should be able to access these data in a usable format.
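To make the frequency-based sampling logic of criterion 2 concrete, the minimal sketch below draws recording contexts in proportion to how often they occur in daily life rather than uniformly or by convenience. The activity categories and their relative frequencies are invented placeholders for illustration, not the VEDB's actual sampling scheme.

```python
import random

# Hypothetical activity categories and relative frequencies (placeholders).
activity_frequencies = {
    "walking outdoors": 0.20,
    "working at a desk": 0.30,
    "preparing food": 0.10,
    "shopping": 0.05,
    "socializing": 0.15,
    "household chores": 0.20,
}

def sample_recording_contexts(n_sessions, frequencies, seed=0):
    """Draw recording contexts in proportion to how often they occur
    in daily life, rather than uniformly or by convenience."""
    rng = random.Random(seed)
    contexts = list(frequencies)
    weights = list(frequencies.values())
    return rng.choices(contexts, weights=weights, k=n_sessions)

if __name__ == "__main__":
    plan = sample_recording_contexts(10, activity_frequencies)
    for i, context in enumerate(plan, start=1):
        print(f"Session {i:2d}: {context}")
```

In practice, the frequencies would be estimated from time-use or experience-sampling data rather than set by hand, but the weighting logic is the same.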

Head-centered video and eye tracking data are recorded using three cameras: two for monitoring eye position and one for recording the view of the environment. To eliminate corneal reflections outdoors, a filter constructed from disposable plastic optometrist sunglasses is used. Head tracking is facilitated by rigidly mounting both an inertial measurement unit and a global positioning system receiver to the environment camera. All devices are connected to a laptop, carried in a backpack, that aggregates the data.
[Images: head-mounted cameras for simultaneous first-person video and eye movement recording, used to capture data for the VEDB]
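As a rough illustration of how these sensor streams might be aggregated, the sketch below represents each device as a stream that timestamps its samples against a shared monotonic clock so they can be aligned afterward. The stream names and nominal rates are assumptions for illustration, not the VEDB's actual acquisition software.

```python
import time
from dataclasses import dataclass, field

@dataclass
class SensorStream:
    """One device on the rig: an eye camera, the world camera, the IMU, or GPS."""
    name: str
    nominal_rate_hz: float
    samples: list = field(default_factory=list)

    def record(self, payload):
        # Timestamp every sample on the shared monotonic clock so streams
        # recorded at different rates can be aligned offline.
        self.samples.append((time.monotonic(), payload))

# Assumed stream layout and nominal rates, for illustration only.
RIG = [
    SensorStream("eye_left", 120.0),
    SensorStream("eye_right", 120.0),
    SensorStream("world", 30.0),
    SensorStream("imu", 200.0),
    SensorStream("gps", 1.0),
]

if __name__ == "__main__":
    # Simulate a brief acquisition: each stream logs one placeholder sample.
    for stream in RIG:
        stream.record(payload=f"reading from {stream.name}")
    for stream in RIG:
        t, payload = stream.samples[0]
        print(f"{stream.name:9s} @ {t:.6f}s -> {payload}")
```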
 
The creation of the VEDB is supported by an NSF EPSCoR Research Infrastructure Improvement Program: Track-2 Focused EPSCoR Collaborations grant (award #1920896).