This approach relies heavily on advances by machine vision researchers, who have made remarkable strides in the last few decades in recognizing stationary and moving objects and their properties.
It’s the same vein of work that led to Google’s self-driving cars, the face recognition software used by Facebook and Picasa, and consumer electronics like Microsoft’s Kinect.
When it works well, machine vision can detect objects and people — call them nouns — that are on the other side of the camera’s lens. But to figure out what these nouns are doing, or are allowed to do, you need the computer science equivalent of verbs.
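The noun/verb gap is easy to see in code. As a minimal sketch (assuming PyTorch and torchvision’s stock Faster R-CNN detector, which is illustrative and has no connection to the Mind’s Eye codebase), an off-the-shelf detector returns labeled boxes with confidence scores, the nouns, and says nothing about what those nouns are doing:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Off-the-shelf detector trained on COCO; any modern object detector
# would illustrate the same point.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 480, 640)  # stand-in for one real camera frame

with torch.no_grad():
    detections = model([frame])[0]

# The output is all "nouns": labeled boxes with confidence scores.
# Nothing here says whether a detected person is walking, digging,
# or planting something by the roadside.
for box, label, score in zip(
    detections["boxes"], detections["labels"], detections["scores"]
):
    if score > 0.5:
        print(f"category {label.item()} at {box.tolist()} ({score.item():.2f})")
```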
That’s where Oltramari and Lebiere have built on the work of other Carnegie Mellon researchers to create what they call a “cognitive engine” that can understand the rules by which nouns and verbs are allowed to interact. Their cognitive engine incorporates research called activity forecasting, which tries to anticipate what humans will do by calculating which physical trajectories are most likely.
They say their software “models the effect of the physical environment on the choice of human actions.”
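One common way to formalize that claim, in the maximum-entropy trajectory models that activity-forecasting research draws on, is to score each candidate path by the cost the physical environment imposes on it and treat cheaper paths as likelier choices. A toy sketch, with every grid value and trajectory invented for illustration:

```python
import numpy as np

# Hypothetical 2-D cost map: high cost where the environment blocks
# movement (a wall), low cost on open ground.
cost_map = np.ones((5, 5))
cost_map[2, 1:4] = 10.0  # a wall across the middle of the scene

def path_cost(path, cost_map):
    """Sum the terrain cost along a candidate trajectory of (row, col) cells."""
    return sum(cost_map[r, c] for r, c in path)

# Three hand-written candidate trajectories from top-left toward bottom-right.
candidates = {
    "through the wall": [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)],
    "around the left":  [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1)],
    "around the right": [(0, 0), (1, 2), (1, 4), (2, 4), (3, 4)],
}

# Maximum-entropy weighting: a trajectory's probability falls off
# exponentially with its accumulated cost, so paths the environment
# blocks become unlikely choices of action.
costs = np.array([path_cost(p, cost_map) for p in candidates.values()])
probs = np.exp(-costs) / np.exp(-costs).sum()
for name, prob in zip(candidates, probs):
    print(f"{name}: {prob:.3f}")
```

In this toy example the wall makes the direct route expensive, so nearly all of the probability mass shifts to the two detours, which is the sense in which the physical environment shapes the predicted human action.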
Both projects are components of Carnegie Mellon’s architecture for Mind’s Eye, a DARPA program that aims to develop smart cameras capable of machine-based visual intelligence. Predicts Oltramari: “This work should support human operators and automatize video-surveillance, both in military and civil applications.”