High definition mapping (HDM) is a form of mapping that uses advanced technologies such as LiDAR, satellite imagery, aerial photographs and photogrammetry to create detailed, accurate 3D maps or models of an area. HDM is used for a variety of purposes, from creating virtual tours of a city to helping autonomous vehicles safely navigate roads.
First, I’d like to share a little about myself and the relevant experiences. I worked with high definition mapping.
Most recently I worked on developing a home robot (Astro) at Amazon Lab126, building out the embedded mapping and infrastructure framework.
Prior to that, I worked at AutoX — a self driving vehicle company, mostly a generalist systems engineer. I worked on building and scale out the infrastructure to handle massive amount of data.
In this article, we’ll focus primarily working with point cloud data in terms of infrastructure, compute, data structures, and some algorithms to compute and transform the data.
High Definition Maps
High Definition Maps contains 3D data, produced from include the raw point data (“point cloud”) as well as processed derivatives such as contours and surfaces (DEMs).
HD maps differ from conventional maps in that it is possible to use HD maps for high accuracy and precision localization (e.g. knowing where you are on the road on a centimeter level).
Navigation Data Standard
A standardized format for automotive-grade navigation databases, jointly developed by automobile manufacturers and suppliers. NDS is an association registered in Germany. Members are automotive OEMs, map data providers, and navigation device/application providers.
NDS uses the SQLite Database File Format. An NDS database can consist of several product databases, and each product database may be divided further into update regions. This concept supports a flexible and consistent versioning concept for NDS databases and makes it possible to integrate databases from different database suppliers into one NDS database. The inner structure of databases complying with NDS is further characterized by building blocks, levels and the content itself.
Definitions
- High Definition (HD) Map have extremely high precision at centimeter-level.
- Online Transaction Processing (OLTP) is the traditional model for enterprise data processing. OLTP databases focus on transactions involving the input, update, and retrieval of data.
- Online Analytical Processing (OLAP) data warehouses focus on queries that collate, summarize, and analyze its contents. Sample data mining techniques in OLAP process include applying statistics, artificial intelligence, and machine learning techniques to find previously unknown or undiscovered relationships in the data.
HD Map Data Sources
This article will not go into too much detail about the capture and setup of the equipment.
Lidar
Lidar is an active light source, similar to how a radar works.
At the raw data level, lidar outputs a (rotation, distance, intensity) value — which can be transformed to produce x, y, z, points, and a reflective intensity value.
Camera
Camera is passive sensing, and it’s possible to capture in more detail than lidar.
At the raw data level, camera outputs a matrix of color values, e.g. Red, Green, Blue.
Sensor Fusion
Typically data is collected in raw form, and can be fused together in real time, applying matrix transformation based on sensor calibration, or fused together in post.
These data can be collected and stitched together to a map.
Infrastructure
What the infrastructure looks like depends on many factors.
- Strategic business goal. How important is mapping and what is an acceptable performance?
- Technical requirements. How much area does our map need to support? How much data do we expect? How time sensitive? Robustness? Etc.
- Resources. How much resources can we allocate to developing? Headcount, timeline, project requirements.
This is a complex problem, and there are many more dimensions not captured here. In general, the use cases should be looked at first; before deep diving into any architectural designs.
General-purpose cluster-computing
There are many existing solutions, such as Apache Spark, Hadoop MapReduce, that solves a general compute and storage problem. Setting
There are also several specialized databases meant for geo storage.
Database
Building infrastructure to capture the huge amount of data from point cloud and camera images is an important task for localization and mapping. Point cloud data provides an accurate three-dimensional representation of an environment, which is useful for navigation and localization. Camera images provide additional context and detail, and can be used to supplement the point cloud data.
To process and store this data, a distributed and scalable compute / database is necessary. This database should be able to store and retrieve large amounts of data quickly and reliably. It should also be able to efficiently query and analyze data in order to locate objects within an environment and to build maps. To ensure accuracy and reliability, the database should also be optimized for robustness and scalability. With the right infrastructure in place, localization and mapping tasks can be done more efficiently and accurately.
Compute
Data Structures
The LAS / LAZ File Format
(LAZ compressed format) The most common open file format for the interchange of 3-dimensional point cloud data from aerial imaging.
Developed primarily for lidar point cloud data this format supports the exchange of any 3-dimensional x,y,z tuplet. The columns are quite fixed (well defined) so using it for terrestrial applications can be challenging.
US Naval Academy has detailed write up on this format. Some things to highlight about this format:
- An RGB value field, merged from a camera flown with the laser scanner
- The scan angle is captured, which indicates how far from nadir the scanner was pointed
- Field for overlap points, where two flight lines covered the same area.
- Allows distribution of the full waveform data, but that option is rarely used in practice. The resulting file sizes greatly increase the data storage, and few ultimate end users want or need that data.
PCD File Format
The format used in Point Cloud Library (or PCL) is a large scale 2D/3D image and point cloud processing, free for commercial and research use. Supports viewpoint information.
# .PCD v.7 - Point Cloud Data file format
VERSION .7
FIELDS x y z rgb
SIZE 4 4 4 4
TYPE F F F F
COUNT 1 1 1 1
WIDTH 213
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 213
DATA ascii
0.93773 0.33763 0 4.2108e+06
0.90805 0.35641 0 4.2108e+06
0.81915 0.32 0 4.2108e+06
0.97192 0.278 0 4.2108e+06
0.944 0.29474 0 4.2108e+06
Difference between E57 format and LAS?
The LAS format was developed by the American Society for Photogrammetry and Remote Sensing (ASPRS) for the purpose of storing LIDAR point data. It is specifically geared toward the needs of the aerial sensing community, though the format can be utilized for terrestrial laser scanner data by ignoring the inapplicable fields.
The E57 format is intended to be a more general format that is well-suited for storing data across a variety of application domains. There are a number of differences in the capabilities of the E57 format as compared to the LAS format. The E57 format allows users to flexibly choose the information associated with each 3D point as well as the number of bits used to represent the information.
In contrast, LAS uses a pre-defined set of fixed-size record types that are specialized for aerial data collection.
The E57 format supports gridded data (i.e., data aligned in regular arrays), multiple coordinate systems (including Cartesian and spherical), embedded images from cameras, built in error detection, and groupings of points into rows, columns, or user-defined groups. The E57 format also defines an extension mechanism that allows users to develop custom capabilities that were not envisioned in the initial design of the standard. These extensions could be integrated into future versions of the standard. Finally, the E57 format has an essentially unlimited file size and number of records (1.8E19 bytes in length), whereas the LAS format is limited to 4.2E9 records.
PLY File Format
Easily visualize in Meshlab, can export to text or binary file. Can also save the camera pose. We export to PLY as an intermediate format so we can visualize the lidar data in Meshlab. We have a program that consumes dumped data dir and converts lidar into PLY format for visualization/debugging. It’s not used elsewhere
ply
format ascii 1.0 { ascii/binary, format version number }
comment made by Greg Turk { comments keyword specified, like all lines }
comment this file is a cube
element vertex 8 { define "vertex" element, 8 of them in file }
property float x { vertex contains float "x" coordinate }
property float y { y coordinate is also a vertex property }
property float z { z coordinate, too }
element face 6 { there are 6 "face" elements in the file }
property list uchar int vertex_index { "vertex_indices" is a list of ints }
end_header { delimits the end of the header }
0 0 0 { start of vertex list }
0 0 1
0 1 1
0 1 0
1 0 0
1 0 1
1 1 1
1 1 0
4 0 1 2 3 { start of face list }
4 7 6 5 4
4 0 4 5 1
4 1 5 6 2
4 2 6 7 3
4 3 7 4 0
Description website
ROS Message Format
In robotics, ROS format is common and point clouds messages are defined in sensor_msgs/PointCloud2.
Most likely they will convert it to .pcd because PCL is the de facto library for working with point clouds in ROS.
XYZ File Format
This is a popular chemical format for describing atomic coordinates, used in computational chemistry, but it can be used for point cloud.
WKT File Format
Well-known text (WKT) is a text markup language for representing vector geometry objects on a map. A binary equivalent, known as well-known binary(WKB), is used to transfer and store the same information on databases. The formats were originally defined by the Open Geospatial Consortium (OGC) and described in their Simple Feature Access.[1] The current standard definition is in the ISO/IEC 13249–3:2016 standard.[2]
WKT can represent the following distinct geometric objects:
- Point, MultiPoint
- LineString, MultiLineString
- Polygon, MultiPolygon, Triangle
- PolyhedralSurface
- TIN (Triangulated irregular network)
- GeometryCollection
EPT File Format
Entwine Point Tile (EPT) is an innovative storage format for point cloud data. Developed to be highly organized in an octo-tree structure, EPT is a simple and flexible tool that allows for efficient visualization of point cloud data. It is the perfect solution for anyone looking for an efficient way to store and view large-scale point cloud data.
Algorithms
The mapping/localization process is a critical part of many applications, such as autonomous robots and self-driving vehicles. In this section, we will discuss the various algorithms involved in this process and how they enable robots and vehicles to accurately map their surroundings and localize themselves.
Quadtrees
Octree
An octree is a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three-dimensional space by recursively subdividing it into eight octants. Octrees are the three-dimensional analog of quadtrees.
Octrees can be useful for many tasks in game design. For example:
- Deciding which game characters are close enough to each other for interaction
- Deciding which portions of a large game world need to be processed at a given time
Geohashes
(not to be confused with geohashing)
Space Filling curves
Data is stored with a key that represents the time/space location of the record, specifically the longitude, latitude, and time with a three-dimensional space filling curve is used. This red line is known as a space-filling curve, or to be more specific, a Z-Curve. This line visits each cell exactly once, establishing a unique ordering of cells. Space-filling curves can be adapted for n dimensions, letting GeoMesa linearize n dimensions of data in a single dimension.
The actual key structure is more complex than a simple key-value pair. Below is a more detailed representation of GeoMesa’s index in Accumulo.
A space filling curve maps higher n-dimensions into 1D line. Conversely, since the mapping is generalized and a one-to-one function, you can you fill infinite dimensions with a 1D line.
You may have seen space filling curves being used on the Map of the Internet, and blog post. There’s an excellent YouTube explanation of what a space filling curve is.
There’s a good blog post that summarizes the 3 types of algorithms (Damn Cool Algorithms: Spatial indexing with Quadtrees and Hilbert Curves).
The core of space filling curves is a function that would take an integer and return the pair of coordinates. This function is a fractal function, it cannot be extended to the reals, but rather to the binary fractions (a subset of the rationals). This lets you get arbitrarily close to any number you want (and cover all the IEEE floating points).
Various space filling curves techniques.
Visualization of an R*-tree for 3D points using ELKI.
Hilbert R-Tree is a variant on the R-Tree to achieve better performance.
Triangulated Irregular Network (TIN)
A common TIN algorithm is called Delaunay triangulation. It tries to create a surface formed by triangles of nearest neighbor points.
Delaunay triangulation with circumcircles around the red sample data. The resulting interpolated TIN surface created from elevation vector points is shown on the right.
- surfaces are not smooth and may give a jagged appearance due to discontinuous slopes at the triangle edges and sample data points
- triangulation is generally not suitable for extrapolation beyond the area with collected sample data points
CGAL The Computational Geometry Algorithms Library
CGAL is a software project that provides easy access to efficient and reliable geometric algorithms in the form of a C++ library. CGAL is used in various areas needing geometric computation, such as geographic information systems, computer aided design, molecular biology, medical imaging, computer graphics, and robotics.
G2O General Graph Optimization from OpenSLAM community
g2o is an open-source C++ framework for optimizing graph-based nonlinear error functions. g2o has been designed to be easily extensible to a wide range of problems and a new problem typically can be specified in a few lines of code. The current implementation provides solutions to several variants of SLAM and BA.
Potree 3D Point Cloud Visualization Web
Potree is a free open-source WebGL based point cloud renderer for large point clouds. PotreeConverter builds a potree octree from las, laz, binary ply, xyz or ptx files.
Colmap
COLMAP is image-based 3D reconstruction tool with a graphical and command-line interface. It offers a wide range of features for reconstruction of ordered and unordered image collections.
CMU uses this COLMAP as part of their reconstruction workflow from image capturing. Creating 3D Game Models from Video using Photogrammetry.
- QGIS has a great tool called DB Manager. It offers a similar GUI for your database, but in a much more compressed way and inside QGIS. You can modify and add tables, add indexes and do a lot of the basic operations in a right-clickable manner.
- GraphHopper Directions & Route Planning API
Pose Optimization
Simultaneous Localization and Mapping (SLAM) problems can be posed as a pose graph optimization problem. We have developed a nonlinear optimization algorithm that solves this problem quickly, even when the initial estimate (e.g., robot odometry) is very poor. MIT’s Graph Based SLAM
Localization can be viewed as a point set registration (or point matching).
There are different methods used to align the point sets in 2D or 3D space — which is to estimate a single transformation between the sets.
- Orthogonal Procrustes Problem given two matrices A and B and asked to find an orthogonal matrix R which most closely maps A to B.
- Kabsch Algorithm a method for calculating the optimal rotation matrix that minimizes the RMSD (root mean squared deviation) between two paired set of points.
- Iterative Closest Point to minimize the difference between two clouds of points.
- Coherent Point Drift (wiki) which run Coherent Point Drift on two datasets. We introduce a probabilistic method, called the Coherent Point Drift (CPD) algorithm, for both rigid and non-rigid point set registration
The Orthogonal Procrustes Problem and Kabsch Algorithm requires correspondence between point sets as an input whereas Iterative Closest Point treats correspondence as a variable to be estimated.
However, all of these differ from pose graph optimization, which is used to optimize a set of poses, given the relative pose between those poses. The difference is that point set registration problems only move the “model” point set such that the difference between the static “scene” set is minimized, without taking into consideration that the points in the model set may have been captured while moving. During a single lidar scan rotation, the lidar may be in motion (as it is mounted on a vehicle), meaning the points captured first and last will differ with respect to their point of origin. In other words, when working with point clouds from a rotating laser scanner, motion recovery and motion distortion correction of the lidar cloud must be done.
Databases
CMU Databases group maintains a Database of Databases and it’s mostly well populated. It’s missing the geospatial databases like GeoMesa/GeoWave, but you can find other info.
Central to a spatial data warehouse system is the effective handling of large-scale geospatial data. Spatial data warehouse stores large amount of information about the coordinates of individual spatial objects in space. Spatial Data Warehouse is so crucial to the enablement of an enterprise system that its effective usage will be the technical centerpiece of this work.
Types of Database Queries
First we must know what kind of queries and its respective volume in order to sensibly pick a database and schema.
Query Importance Ranking
In selecting a database we need to think and consider what type of queries that will happen on this data.
Optimize the data schema based on the following priority ranking.
- All the lidar data we have for a given bounding box (spacial and/or temporal).
- Only the routes that we’ve driven that crossed this intersection.
- Only the lidar, camera, and pose frames that contain a pedestrian or vehicle.
- All the routes we’ve driven that started and ended in San Francisco.
Column vs. Row Databases
- Column (columnar) databases enable easier parallelization of queries. By storing data in columns rather than rows, the database can more precisely access the data it needs to answer a query rather than scanning and discarding unwanted data in rows. Query performance is increased for certain workloads.
- Row databases fast transactional processing. Row-based systems are designed to efficiently return data for an entire row, or record, in as few operations as possible.
Row-based systems are not efficient at performing set-wide operations on the whole table, as opposed to a small number of specific records. For instance, in order to find all records in the example table with salaries between 40,000 and 50,000, the DBMS would have to fully scan through the entire table looking for matching records. While the example table shown above will likely fit in a single disk block, a table with even a few hundred rows would not, and multiple disk operations would be needed to retrieve the data and examine it.
LocationTech
LocationTech is a working group developing advanced location aware technologies with support from the non-profit Eclipse Foundation.
Several geo-projects are managed by LocationTech.
Geomesa and GeoWave two technologies based on Hadoop will be compared which are specialized in the efficient storage and retrieval of geotemporal data. Both technologies use Apache Accumulo as backend and GeoTools for handling geodata.
GeoMesa/GeoWave are quite similar in functionality, according to this review. We just decided to pick one and move to next problem.
GeoMesa
From the introduction website:
- Store gigabytes to petabytes of spatial data (tens of billions of points or more)
- Serve up tens of millions of points in seconds
- Ingest data faster than 10,000 records per second per node
- Scale horizontally easily (add more servers to add more capacity)
- Support Spark analytics
- Drive a map through GeoServer or other OGC (Open Geospatial Consortium) Clients
- GeoMesa implements GeoTools interfaces to provide programmatic access, and HTTP access to the Open Geospatial Consortium standards.
- Web Feature Service (WFS), allows querying and retrieval of features.
- Web Mapping Service (WMS), a HTTP interface for requesting geo-registered map images from geospatial databases.
- Web Processing Service (WPS) defines interprocess communications, IO, begin execution of a process, and discovery/binding to those processes.
- Web Coverage Service (WCS) multi-dimensional coverage data for access over the Internet
- Large data can be partitioned
Data Ingression
Multiple frameworks may be used for streaming and batch ingestion of data. These include the GeoMesa command line tools, map-reduce jobs with Apache Hadoop, and real-time topologies running on Apache Storm. The following diagram shows one possible ingest architecture:
The following shows one possible query architecture, in which the GeoTools and GeoMesa APIs mediate the use of Accumulo iterators for external query clients:
It should be possible to use GeoMesa for point cloud data. The traditional way to get data into GeoMesa is via the GeoTools datastore API; if you use that, each point would end up being a SimpleFeature. (Alternatively, you could store a bundle of clustered points together and do some fiddling in your application.)
As an alternative, one could use the newer GeoMesa native API to store collections of LIDAR points.
GeoWave
GeoWave was developed at the National Geospatial-Intelligence Agency (NGA), which is both a combat support agency under DoD and an intelligence agency.
GeoWave may be used by Facebook, because there is a RocksDB integration tutorial.
You can try it on Jupyter. Quick start guide available.
Read the User Guide for good overview.
Boundless Server Stack
Boundless Server is a complete web-based geospatial software stack. The applications contained are:
- PostGIS — Spatially enabled object-relational database.
- GeoServer — Software server for loading and sharing geospatial data
- GeoWebCache — Tile cache server that accelerates the serving of maps
- Composer — Web-based map configuration and styling utility
- WPS Builder — Web-based graphical constructor for generating server-side processes
- QuickView — Web application for composing and styling web maps
Other Databases
Worth a mention but not geospatial.
- Greenplum Massively Parallel Postgres for Analytics
You are always willing to learn.