Gal Shitrit, M.Sc Thesis

Automatic Calculation of Physical Dimensions of Objects
from an RGBD Image Using Deep Learning Methods

In classical computer vision and photogrammetry, the physical dimensions of an object are measured with a calibrated stereo pair that outputs a depth image or a point cloud. Such a pair enables full recovery of the physical dimensions of features and objects that are fully visible to both cameras. Recently, cheap depth sensors, such as Microsoft’s Kinect, have become widely available.
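
As a minimal illustration of how a depth image yields physical dimensions, the sketch below back-projects two pixels into 3D with the standard pinhole camera model and measures the distance between them. The intrinsics (FX, FY, CX, CY), the pixel coordinates, and the function names are assumptions chosen for illustration; they do not describe any particular sensor or the method of this thesis.

```python
import math

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with depth in metres to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def physical_distance(p, q):
    """Euclidean distance in metres between two 3D points."""
    return math.dist(p, q)

# Hypothetical intrinsics: focal lengths and principal point, in pixels.
FX = FY = 500.0
CX, CY = 320.0, 240.0

# Two pixels on opposite edges of an object, both measured at 1 m depth.
left = backproject(220, 240, 1.0, FX, FY, CX, CY)
right = backproject(420, 240, 1.0, FX, FY, CX, CY)
width_m = physical_distance(left, right)
print(f"estimated width: {width_m:.3f} m")
```

The measurement is only valid when both end points are visible and have reliable depth values, which is the limitation noted above for stereo pairs.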

In recent years, computer vision has improved markedly with the introduction of deep learning methods. These methods allow a computer to learn complex tasks by processing a large number of examples. Using deep learning, it may be possible to learn how to extract the physical dimensions of complex parts from RGBD images.

The suggested system architecture is as follows. First, an RGBD image is fed into the system. Second, the objects in the image are detected using a convolutional neural network (CNN). For these objects, another CNN will identify and detect several known geometric features, for example, the distance between two faces of a box. These features will then be dimensioned by the CNN.
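
The two-stage flow described above can be sketched as follows. This is only a structural outline under stated assumptions: the two stages are passed in as plain callables, the stub functions stand in for the trained CNNs, and the names Dimension, dimension_objects, stub_detector, and stub_dimensioner, as well as the crop format, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical crop format: (row0, col0, row1, col1), end-exclusive.
Box = Tuple[int, int, int, int]
# Stand-in for an RGBD array: a 2D grid of depth values only.
Image = List[List[float]]

@dataclass
class Dimension:
    feature: str    # name of the geometric feature, e.g. "box_width"
    value_m: float  # estimated size in metres

def dimension_objects(
    image: Image,
    detect_objects: Callable[[Image], List[Box]],
    dimension_features: Callable[[Image], List[Dimension]],
) -> List[List[Dimension]]:
    """Stage 1 finds objects; stage 2 dimensions the features of each crop."""
    results = []
    for r0, c0, r1, c1 in detect_objects(image):
        crop = [row[c0:c1] for row in image[r0:r1]]
        results.append(dimension_features(crop))
    return results

def stub_detector(image: Image) -> List[Box]:
    # Placeholder for the detection CNN: reports the whole image as one object.
    return [(0, 0, len(image), len(image[0]))]

def stub_dimensioner(crop: Image) -> List[Dimension]:
    # Placeholder for the dimensioning CNN: pretends each pixel column spans 1 cm.
    return [Dimension("box_width", len(crop[0]) * 0.01)]

image = [[1.0] * 40 for _ in range(30)]
results = dimension_objects(image, stub_detector, stub_dimensioner)
print(results)
```

Passing the stages as callables keeps the pipeline independent of any specific network, so each stage could be replaced by a trained model without changing the control flow.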

Since most mechanical parts are based on combinations of boxes and cylinders, combinations of these shapes, along with their dimensioning schemes, can be used to fully dimension a part. Synthetic RGBD images of these features will be created in order to train a CNN for the dimensioning task. The research will focus on this task.
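
One possible way to produce labelled synthetic training data is sketched below for the simplest case: the depth channel of a box whose front face is parallel to the image plane and centred in the view. The function names, the camera parameters, the image size, and the sampling ranges are all assumptions for illustration; a real generator would also render colour, cylinders, and arbitrary poses.

```python
import random

def synthetic_box_depth(width_m, height_m, dist_m, fx=500.0, fy=500.0,
                        size=(240, 320), background_m=3.0):
    """Render a depth map (metres) of a centred, front-facing box face."""
    rows, cols = size
    cx, cy = cols / 2.0, rows / 2.0
    # Pinhole projection: a length L at distance Z spans L * f / Z pixels.
    half_w_px = 0.5 * width_m * fx / dist_m
    half_h_px = 0.5 * height_m * fy / dist_m
    depth = [[background_m] * cols for _ in range(rows)]
    for v in range(rows):
        for u in range(cols):
            # Test the pixel centre against the projected rectangle.
            if abs(u + 0.5 - cx) <= half_w_px and abs(v + 0.5 - cy) <= half_h_px:
                depth[v][u] = dist_m
    return depth

def make_sample(rng):
    """Return one (depth image, label) pair; the label is (width, height) in metres."""
    width_m = rng.uniform(0.1, 0.5)
    height_m = rng.uniform(0.1, 0.3)
    dist_m = rng.uniform(1.0, 2.0)
    return synthetic_box_depth(width_m, height_m, dist_m), (width_m, height_m)

depth = synthetic_box_depth(0.4, 0.2, 1.0)
box_cols = sum(1 for d in depth[120] if d == 1.0)
image, label = make_sample(random.Random(0))
```

Because the geometry is generated, the true dimensions are known exactly for every image, which is what makes synthetic data attractive for training the dimensioning network.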

Figure: Solution approach diagram