SATW: 3D image understanding for urban scenes

Lay summary

The most important data sources for large-scale geodata are images and range images acquired with cameras and laser scanners, either from the air or with mobile mapping systems. To convert these data automatically into virtual 3D models, a computer program has to solve two tasks: on the one hand, the 3D geometry of the recorded scene must be reconstructed; on the other hand, the data must be interpreted, that is, structured into semantic objects (roads, buildings, trees, etc.). Processing to date has not taken sufficient account of the fact that these two tasks are closely linked: different object types have different geometric properties, and conversely the shape and extent of an object are important cues to its object class.

The aim of the project is to develop methods that solve geometric reconstruction and semantic interpretation jointly, in a single step, in order to automatically obtain complete, correct and interpreted 3D models. Technically, this requires formalising the available prior knowledge about the interplay between (geometric) form and (semantic) function. In structured environments such as cities there is a large amount of such knowledge, and even simple relations can only be exploited when geometry and semantics are considered together: for example, that walls are vertical, that roads are smooth surfaces whereas vegetated areas are not, that roofs lie higher than the ground, etc.

Methods for integrated 3D modelling and scene interpretation could decisively improve automatic mapping and thus represent an important step towards the visionary, worldwide '3D virtual habitat'.

Abstract

The topic of the proposed project is automatic generation of virtual 3D city models, starting from aerial and terrestrial image data. Digital models of our environment form the basis for geographic information systems (GIS) and are required for a wide range of tasks in planning, construction, navigation, etc. Such topographic data has been used for a long time by specialists and authorities, and is nowadays becoming ubiquitous with internet cartography. To this day, 3D city models are generated manually or at most semi-automatically, which is costly and does not scale.

The main data sources for large-scale mapping are images and range images acquired with cameras and laser scanners, either from aerial platforms or from mobile mapping systems on the ground. To convert that sensor data to a virtual 3D model one needs to solve two computer vision problems: on the one hand the 3D geometry of the imaged scene must be reconstructed; on the other hand the data must be interpreted, meaning that it must be structured into semantically meaningful entities (buildings, roads, trees, etc.). A crucial point, which has long been known but is still not properly accounted for, is that the two tasks are not independent: objects of different semantic classes have different geometric properties, and conversely, geometric shape and extent are important cues about the semantic object class of a surface.

The aim of the proposed project is to develop computer vision techniques able to jointly solve 3D reconstruction and semantic labelling, to automatically generate complete and accurate, interpreted 3D models. Automatic 3D modelling and image understanding have been core topics of computer vision and remote sensing research over the past decades. Both fields have now reached some maturity, and a central scientific question is how to integrate the two developments.
Specifically, the project aims to develop a probabilistic framework for integrated 3D image understanding in urban scenes, based on aerial nadir, aerial oblique, and ground-level images. Such a model will deliver at the same time better 3D geometry, by allowing one to include category-specific priors for the surfaces’ shape, orientation and layout; and better segmentation of the scene into semantic object classes, aided by the underlying 3D shape and layout.

The technical core contribution of the project shall be a probabilistic graphical model that formalises the a-priori assumptions about 3D urban scenes, together with the necessary inference tools to estimate the geometric 3D shape of a scene and the semantic classes of its individual elements in a single, joint optimisation procedure. A large amount of a-priori domain knowledge exists for structured scenes such as cities, but even simple relations can only be leveraged if both geometry and semantic interpretation are considered, e.g. that building walls are vertical, that roads are smooth surfaces whereas vegetation canopies are not, that roofs are usually higher than the ground, etc.

We are convinced that integrated 3D modelling and scene understanding will significantly enhance the power of automatic mapping and constitute an important step towards the visionary “3D virtual habitat”.
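
To make the idea of class-specific geometric priors more concrete, the following is a minimal sketch and not the project's actual model. It turns the example relations named in the abstract (vertical walls, smooth roads, rough vegetation canopies, roofs above the ground) into a toy energy for a single surface element and picks the class with the lowest energy. The class list, the feature names (`normal`, `height`, `roughness`), the function names and all weights are assumptions made up for illustration. The sketch covers only one direction of the coupling, geometry informing semantics; the joint optimisation over a graphical model described in the abstract is not attempted here.

```python
import math

# Hypothetical class list; the abstract only names these as examples.
CLASSES = ("road", "wall", "roof", "vegetation")


def prior_energy(label, normal, height, roughness):
    """Toy energy of one surface element under a class-specific prior.

    normal    -- unit surface normal (x, y, z), with z pointing up
    height    -- height above ground in metres
    roughness -- value in [0, 1]; 0 = perfectly smooth
    Lower energy means better agreement. All weights are illustrative.
    """
    nz = abs(normal[2])  # 1 for a horizontal surface, 0 for a vertical one
    if label == "road":
        # roads: horizontal, smooth, close to ground level
        return (1.0 - nz) + roughness + 0.1 * height
    if label == "wall":
        # walls: vertical, i.e. the normal lies in the horizontal plane
        return nz + roughness
    if label == "roof":
        # roofs: smooth, not vertical, usually well above the ground
        return roughness + 0.5 * (1.0 - nz) + math.exp(-height)
    if label == "vegetation":
        # vegetation canopies: rough, no orientation preference
        return 1.0 - roughness
    raise ValueError(f"unknown class: {label}")


def best_label(normal, height, roughness):
    """Return the class whose prior has the lowest energy for this element."""
    return min(CLASSES, key=lambda c: prior_energy(c, normal, height, roughness))
```

For example, `best_label((1.0, 0.0, 0.0), 5.0, 0.05)` describes a smooth vertical element and selects `"wall"` under these assumed weights. In a full model, such terms would be combined with image evidence and with neighbourhood terms, and the geometry itself would also be estimated rather than taken as given.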

Last updated: 08.11.2021

SNSF
Project funding (Div. I-III)
Original data source 157101

Information Technology
Mathematics, Natural- and Engineering Sciences; Engineering Sciences

2 People

Prof. Konrad Schindler

Prof. Marc Pollefeys
