Object recognition.

Curve matching, NPR, Deep learning.

Object recognition in a 3D complex model

3D object recognition in 3D scenes is a challenge for many researchers in computer vision, engineering and robotics. Occlusion is one of the main problems encountered. One possible solution in this situation is to find a part of an object in the scene that can be identified. For this reason, we are mainly interested in partial shape retrieval methods. In this paper, we present a new approach for 3D partial object retrieval based on level curve matching. Our approach can be used as an alternative to classification-based methods. In an off-line step, we generate a dataset by using a viewing sphere to extract level curves from different points of view. The level curves are a set of 2D planar contours obtained by projecting points onto several planes perpendicular to the viewing direction. The level curves of each partial query object are compared with the set of level curves that defines one 3D object from the dataset. The number of matched curves between a partial object and a complete object represents the weight of that class. The class with the highest weight is identified as the class of the query object.

3D model of the Xlendi wreck computed by photogrammetry, before sectioning.

3D shape retrieval is still a research field in its exploratory phase: it remains difficult to find an automatic method that is efficient enough to identify a part of a 3D model under occlusion. We present a novel approach that deals with partial objects. The purpose is to find a partial object in a dataset that contains full 3D objects (point clouds). Our algorithm consists of two phases: creating the dataset, and finding the correspondence between two curves. The first contribution of this work is to use a viewing sphere to store the contours of the object from several viewpoints, in order to create a database with complete information about the 3D objects. The second contribution is to extract level curves that represent the contours of the object at various levels, in order to reduce the problem of matching two 3D point clouds to the alignment of 2D planar curves, which is widely discussed in the literature.

Curve extraction

As noted above, our approach is based on curve matching. We believe that the best descriptions of 3D objects are their shapes, which led us to consider level curves. These curves can be extracted by slicing point clouds (3D models) with several planes at a regular step. Two ways of slicing are possible: using a single point of view, i.e. one 'cutting' plane shifted along the model, typically like the level curves obtained with horizontal planes in cartography; or choosing the cutting direction from several points of view set on a sphere defined around the studied object. We are currently developing the second approach, even though it means more curves to match and a longer computation time.
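
The two slicing schemes can be sketched as follows. This is only an illustration: the text does not say how the viewpoints are distributed on the sphere, so the Fibonacci lattice used here is an assumption, and the names `sphere_directions` and `slice_offsets` are hypothetical.

```python
import math

def sphere_directions(n):
    """Return n roughly evenly spread unit vectors (Fibonacci lattice, an assumed sampling)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # z runs from near +1 to near -1
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the circle at height z
        phi = i * golden_angle
        directions.append((r * math.cos(phi), r * math.sin(phi), z))
    return directions

def slice_offsets(points, direction, step):
    """Offsets along `direction` of regularly spaced cutting planes covering the cloud."""
    heights = [sum(p[k] * direction[k] for k in range(3)) for p in points]
    lo, hi = min(heights), max(heights)
    count = int(math.floor((hi - lo) / step)) + 1
    return [lo + i * step for i in range(count)]
```

With a single point of view, `slice_offsets` is called once with a fixed direction such as `(0, 0, 1)`; with the viewing sphere, it is called once per direction returned by `sphere_directions`, which is why the number of curves and the computation time grow.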

On the left, the amphora model computed from the archaeological design; on the right, level curve extraction from several planes.

The curve at a given level is computed by projecting onto the plane (level) the points whose distance to that plane is less than a threshold. At this stage, the projected points form an unordered set. The steps to obtain an ordered set of 2D points representing one curve are listed below.

1) Compute the 2D coordinates of the points relative to the plane.

2) Select one point randomly and find its nearest neighbor to get the first segment which will be added to the new list of points.

3) For each remaining point, compute its distance to the first point and to the last point of the list. If the first distance is less than the second, the new point is added to the top of the list; otherwise it is added to the end.

Recall that our approach uses the curvature and the arc-length of the curve; we define later how to better parameterize the curve for this use.
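
The projection and ordering steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the plane is given by a hypothetical `origin` and orthonormal axes `u`, `v`, `normal`; the seed point is chosen by a `start` index rather than randomly so that the result is reproducible; and the remaining points are processed closest-first, an order the text does not specify.

```python
import math

def project_slice(points, origin, u, v, normal, threshold):
    """Keep points within `threshold` of the plane and return their 2D (u, v) coordinates."""
    slice_2d = []
    for p in points:
        d = [p[k] - origin[k] for k in range(3)]
        dist = abs(sum(d[k] * normal[k] for k in range(3)))
        if dist < threshold:
            slice_2d.append((sum(d[k] * u[k] for k in range(3)),
                             sum(d[k] * v[k] for k in range(3))))
    return slice_2d

def order_curve(points_2d, start=0):
    """Chain unordered 2D points into an open polyline by growing both ends."""
    if len(points_2d) < 2:
        return list(points_2d)
    remaining = list(points_2d)
    first = remaining.pop(start)
    # Step 2: the first segment joins the seed point and its nearest neighbour.
    nearest = min(range(len(remaining)), key=lambda i: math.dist(first, remaining[i]))
    chain = [first, remaining.pop(nearest)]
    while remaining:
        # Assumption: the next point processed is the one closest to either end.
        i = min(range(len(remaining)),
                key=lambda j: min(math.dist(remaining[j], chain[0]),
                                  math.dist(remaining[j], chain[-1])))
        p = remaining.pop(i)
        # Step 3: attach to the top if closer to the first point, else to the end.
        if math.dist(p, chain[0]) < math.dist(p, chain[-1]):
            chain.insert(0, p)
        else:
            chain.append(p)
    return chain
```

For example, `order_curve([(3, 0), (0, 0), (2, 0), (1, 0), (4, 0)])` returns the five points in order along the line, starting from one of its two ends.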

3D model of the Xlendi wreck after curve extraction.

Curve matching

Slicing a 3D object (point cloud) with planes perpendicular to the direction of view produces several planar level curves. In our approach, we reduce the problem of matching two point clouds using descriptors or signatures to a problem of matching curves. Finding the best fit between two curves is of central interest in several fields, such as computer graphics and computer vision. The set of points extracted from each level forms a 2D planar curve, which is parameterized in Cartesian coordinates as (x(u), y(u)),

where u ranges from 1 to the number of points at one level, and x and y are the Cartesian coordinates of the points. The purpose of curve matching is to find the longest common subcurve of two curves, and to compute the rotation angle and translation vector that fit these curves along their common portion. We are inspired by the work of Wolfson: On curve matching, Pattern Analysis and Machine Intelligence, 1990. The author used the curvature with respect to the arc-length as a signature for a curve. Cui et al. improved this signature by using the integral of unsigned curvatures. We used the original method proposed by Wolfson for ease of implementation. To compute the curvature of the curve, the first and second derivatives of both x and y must exist and be continuous. Cubic spline approximation is used to represent the curves as piecewise polynomials of order 4 (degree 3). Then, these curves are parameterized by arc-length as presented in Wang et al: Arc-length parameterized spline curves for real-time simulation, 2002.
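
To make the curvature versus arc-length signature concrete, here is a minimal sketch. It deliberately swaps the cubic spline fit for a simpler discrete estimate, the signed curvature of the circle through three consecutive points, so it only approximates what a spline-based implementation would produce. The function name `curvature_signature` is hypothetical.

```python
import math

def curvature_signature(curve):
    """Return [(arc_length, signed_curvature)] at the interior points of a 2D polyline."""
    signature = []
    s = 0.0
    for i in range(1, len(curve) - 1):
        (x0, y0), (x1, y1), (x2, y2) = curve[i - 1], curve[i], curve[i + 1]
        a = math.hypot(x1 - x0, y1 - y0)
        b = math.hypot(x2 - x1, y2 - y1)
        c = math.hypot(x2 - x0, y2 - y0)
        s += a  # arc length accumulated up to the current point
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        # Signed curvature of the circle through the three points (0 if degenerate).
        kappa = 2.0 * cross / (a * b * c) if a * b * c > 0.0 else 0.0
        signature.append((s, kappa))
    return signature
```

On points sampled counter-clockwise on a circle of radius 2, every curvature value is 0.5; on a straight line, every value is 0. Because the signature depends only on arc-length and curvature, it is unchanged by rotation and translation of the curve.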

The matching process aims to extract the best part of one curve that can be matched with the whole or just a part of the second curve. To solve this problem, we need to find the position where the query curve best aligns with a curve from the dataset. To measure the similarity between the query curve and each curve from the dataset, we slide the signature of the query curve along the current curve of the dataset. The smallest Euclidean distance indicates the position of the best fit. To remove false matches, we added a step that computes the Euclidean transformation between the points of the matched parts. The alignment error is computed as the RMS (root mean square) error of the distances between the points of the matched parts. This error and the similarity measure can be used as indicators of the quality of the curve matching. The figure at the top shows an example of curve extraction from the complete 3D model of the amphora Ramon 2111-73. This curve is matched with the curves illustrated below to show the problem of the direction of parameterization (blue arrows), as highlighted in Cui et al: Curve matching for open 2d curves, 2009. To solve this problem, each curve from the dataset is matched using both the first and the second direction of the query curve parameterization, and the best match is kept. We first performed 3D partial object retrieval experiments using a publicly available database; we are now working on the real case of the Xlendi wreck. The figure shows the first step of this work in progress.
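
The sliding comparison and the two-direction test can be sketched as below. The sketch assumes that each signature is a list of signed curvature values sampled at a uniform arc-length step, which the text does not state explicitly. The function names are hypothetical, and `rms_error` expects point pairs that have already been aligned by the estimated Euclidean transformation.

```python
import math

def best_offset(query, target):
    """Slide `query` along `target`; return (distance, offset) of the best window."""
    best = (math.inf, -1)
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        dist = math.sqrt(sum((q - t) ** 2 for q, t in zip(query, window)))
        best = min(best, (dist, offset))
    return best

def match_both_directions(query, target):
    """Match the query signature in both parameterization directions; keep the best."""
    forward = best_offset(query, target)
    # Traversing a curve backwards reverses the samples and flips the curvature sign.
    backward = best_offset([-k for k in reversed(query)], target)
    if forward[0] <= backward[0]:
        return forward + ("forward",)
    return backward + ("backward",)

def rms_error(points_a, points_b):
    """Root mean square distance between paired, already aligned 2D points."""
    squared = [math.dist(a, b) ** 2 for a, b in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))
```

A query cut from the target signature matches with distance 0 in the forward direction; the same portion traversed in the opposite direction matches with distance 0 only in the backward test, which is the case the blue arrows illustrate.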

On the left, the level curve extraction from several planes; on the right, a selection of curves with partial traces of the amphorae and other artefacts.

3D point matching

We are currently working on using our approach for registration between two 3D models, and we focus on testing the signature proposed by Cui et al: Curve matching for open 2d curves, 2009. The authors demonstrated that it is invariant to scale change and Euclidean transformation (rotation and translation). The goal on the Xlendi wreck is to provide a tool for object detection, typology determination and final matching of the theoretical object on the observed ground. The final validation will be done with an ontological approach.

The NPR method

The goal of this project is the extraction of known artefacts present on the site. Our target is for our automatic matching algorithm to reach the accuracy of manual matching (figure 1), which is an effort-intensive and time-consuming task. The proposed approach, based on matching curves extracted by slicing, is able to correctly detect the position of amphorae, but the rotation alignment is not accurate enough in some cases due to the small overlap.

First matching using manual recognition of the typology.

A possible solution is to extend the matching by considering other aspects such as colour, texture information or Non-Photorealistic Rendering (NPR). As the figure shows, in a 3D model made by photogrammetry the objects are usually embedded in a continuous surface made of triangles. Boundaries, geometry and features are more or less undetectable. To recognize objects in this kind of surface, one has to use pattern, texture, colour information and light effects, or use a different kind of rendering: NPR. NPR is an automatic rendering of geometric models using a small number of stylized feature lines. This rendering method directly accesses the geometry of the 3D objects and selects a controlled number of feature edges such as silhouette, boundary, crease, cap and pit edges.
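
Two of the feature edge types named above, boundary and silhouette edges, can be found from the mesh connectivity alone. The sketch below assumes an indexed triangle mesh with consistently oriented faces and a single viewing direction (orthographic view). The function names are hypothetical, and crease, cap and pit edges are not covered.

```python
from collections import defaultdict

def face_normal(vertices, face):
    """Unnormalized normal of a triangle given by three vertex indices."""
    a, b, c = (vertices[i] for i in face)
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def feature_edges(vertices, faces, view_dir):
    """Return (boundary, silhouette) edge lists for a triangle mesh and a view direction."""
    edge_faces = defaultdict(list)
    for f, face in enumerate(faces):
        for i in range(3):
            edge = tuple(sorted((face[i], face[(i + 1) % 3])))
            edge_faces[edge].append(f)
    # A face is front-facing when its normal points against the viewing direction.
    front = [sum(n * d for n, d in zip(face_normal(vertices, face), view_dir)) < 0.0
             for face in faces]
    # Boundary edges belong to exactly one triangle.
    boundary = sorted(e for e, fs in edge_faces.items() if len(fs) == 1)
    # Silhouette edges separate a front-facing triangle from a back-facing one.
    silhouette = sorted(e for e, fs in edge_faces.items()
                        if len(fs) == 2 and front[fs[0]] != front[fs[1]])
    return boundary, silhouette
```

On a closed mesh the boundary list is empty and only silhouette edges remain, which is how a small set of lines can stand in for a dense triangulated surface.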

Visualizing the mesh model: faceted mesh, with texture, and with light and shadow.

Virtual representation is an open field of computer graphics that, since its emergence, has concentrated on making images photorealistic and indistinguishable from reality. In the field of cultural heritage, this kind of representation can be useful for many applications, but it is not the only one.
One of the primary objectives of the GROPLAN project is to provide archaeologists with a set of measurement tools that do not require the presence of a specialist. The goal is to obtain a 3D model of a site that already integrates some archaeological knowledge. In many cases, archaeologists and historians are accustomed to using drawings, sketches and more expressive forms of illustration. This is due both to tradition and to the expressive power of traditional illustration methods. NPR can meet this objective because sketches and drawings can be produced very quickly and with high accuracy, directly from the 3D model. Moreover, if a few "good" lines faithfully represent the three-dimensional model, all the recognition operations can be performed on less data, which speeds up the matching. In the case of Xlendi, it is easier to find amphorae on the sandy seabed using an NPR image. With NPR, boundaries are much more pronounced and, for example, objects can be isolated from the background by searching the image for closed loops. Therefore, the NPR representation can be used at the beginning to identify an area where an amphora is probably located, or during the process to improve object recognition and matching, as an alternative to the currently proposed matching scheme.

Left: 3D models using photographic rendering and NPR. Right: the area of archaeological excavation, comparing representation methods.

Deep Learning

We propose to use a deep learning approach, which has proven its worth in many research fields and achieves the best performance in various competitions, to learn the shapes of various amphorae and the context of the ground. We then use a transfer learning process to fine-tune our model on the Xlendi shipwreck amphorae. This approach allows us to train the model using only a small part of the Xlendi database. Underwater objects are rarely in a perfect state: they can be covered by sand or by another object, and they can be broken. When an amphora has a neck, this part is commonly separated from the amphora's body. We want to detect all the amphora pieces by performing a pixel segmentation, which consists of a pixel-wise classification of the orthophoto. To improve the model, we define three classes: the underground, the body of the amphora, and the head of the amphora, which comprises the rim, the neck and the handles. After the pixel segmentation, we group pixels with similar probabilities together to obtain an object segmentation.
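
The step from pixel segmentation to object segmentation can be illustrated with a simple grouping rule. The text does not specify how pixels with similar probabilities are grouped, so this sketch assumes one plausible reading: threshold the probability map of one class and collect 4-connected regions. The function name `group_pixels` and the default threshold of 0.5 are hypothetical.

```python
from collections import deque

def group_pixels(prob_map, threshold=0.5):
    """Group 4-connected pixels whose probability exceeds `threshold` into objects."""
    rows, cols = len(prob_map), len(prob_map[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob_map[r][c] <= threshold:
                continue
            # Breadth-first flood fill from an unvisited high-probability pixel.
            queue, component = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and prob_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(component)
    return objects
```

Each returned component is a list of `(row, column)` pixels. Discarding very small components would be one way to remove isolated noise before counting detections.
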
The CNN is composed of a series of layers in which each layer takes as input the output of the previous layer. The first layer, called the input layer, takes the training or testing image as input. The last layer is the output of the network and gives a prediction map. The output of a layer l in the network is called a feature map and is denoted f_l. In this work, we use 4 different types of layers: convolution layers, pooling layers, normalization layers and deconvolution layers. Our CNN architecture is composed of 7 convolution layers, 3 pooling layers and 3 deconvolution layers.

Representation of the proposed architecture, using an example to activate the feature maps.

We train our CNN on images from another site and then use a small part of the Xlendi image to fine-tune the weights of the CNN. On the Xlendi image, we used only 20 amphorae as training examples. As the picture below shows, all the amphorae in the testing image are detected. The false positives are mainly located on the grinding stones. This error is due to the small size of the training database: the images used during the pre-training step contain no grinding stone examples, and only a few grinding stone examples are represented during the tuning step. On the pixel segmentation image, the recall is around 57% and the precision around 71%. The recall is low because the edges of the amphorae are rarely detected: the probability is highest at the middle of each amphora and decreases rapidly toward the edges. For the object detection map, the noise is removed, so the recall is close to 100% and the precision is around 80%.
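
For reference, pixel-wise precision and recall can be computed from a predicted mask and a ground-truth mask as sketched below. The function name `precision_recall` is hypothetical and the masks in the usage note are toy values, not data from the Xlendi orthophoto.

```python
def precision_recall(predicted, truth):
    """Pixel-wise precision and recall for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # amphora pixel correctly detected
            elif p:
                fp += 1      # detection on a non-amphora pixel
            elif t:
                fn += 1      # amphora pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With the toy masks `predicted = [[1, 1, 0], [0, 1, 0]]` and `truth = [[1, 0, 0], [1, 1, 1]]`, there are 2 true positives, 1 false positive and 2 missed pixels, giving a precision of 2/3 and a recall of 1/2. Missed edge pixels count as false negatives, which is why undetected amphora edges lower the recall but not the precision.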

Pixel segmentation on the testing part of the Xlendi orthophoto. On the probability map, the whiter the pixel, the higher the probability that it belongs to an amphora. On the object detection map, the green circles represent the correctly detected amphorae and the red circles the false positive detections.