3D object recognition in 3D scenes is a long-standing challenge for researchers in computer vision, engineering and robotics. Occlusion is one of the problems encountered, and a possible solution is to identify a visible part of an object in the scene. For this reason, we are mainly interested in partial shape retrieval methods. In this paper, we present a new approach to 3D partial object retrieval based on level-curve matching. Our approach can be used as an alternative to classification-based methods. In an off-line step, we build a dataset by using a viewing sphere to extract level curves from different points of view. The level curves are a set of 2D planar contours obtained by projecting points onto several perpendicular planes. The level curves of each partial query object are compared with the sets of level curves that define the 3D objects in the dataset. The number of curves matched between the partial object and a complete object gives the weight of that object's class, and the class with the highest weight is returned as the class of the query object.
3D shape retrieval is still an exploratory research field: it remains difficult to find an automatic method efficient enough to identify a part of a 3D model under occlusion. We present a novel approach that deals with partial objects. The goal is to retrieve a partial object from a dataset that contains full 3D objects (point clouds). Our algorithm consists of two phases: creating the dataset, and finding the correspondence between two curves. The first contribution of this work is to use a viewing sphere to store the contours of an object from several viewpoints, so as to build a database with complete information about the 3D objects. The second contribution is to extract level curves, the contours of the object at various levels, which reduces the problem of matching two 3D point clouds to the alignment of 2D planar curves, a problem widely discussed in the literature.
As noted above, our approach is based on curve matching. Since we believe that the best description of a 3D object is its form, we turned to level curves. These curves can be extracted by slicing point clouds (3D models) with several planes placed at a regular step. Two ways of slicing are allowed: using one point of view, i.e. one cutting plane shifted along the model, exactly as level curves are drawn with horizontal planes in cartography; or choosing the cutting direction from several points of view placed on a sphere defined around the studied object. We are currently developing the latter approach, even though it means more curves to match and a larger computation time.
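The slicing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the slab half-width `thickness` and the plane spacing `step` are our assumptions.

```python
import numpy as np

def slice_point_cloud(points, normal, step, thickness):
    """Slice a point cloud into level sets along a cutting direction.

    points: (N, 3) array; normal: cutting-plane normal; step: spacing
    between consecutive planes; thickness: half-width of the slab of
    points assigned to each plane. (Names are illustrative only.)
    """
    normal = np.asarray(normal, dtype=float)
    normal /= np.linalg.norm(normal)
    heights = points @ normal                   # signed distance along the normal
    levels = np.arange(heights.min(), heights.max() + step, step)
    slices = []
    for h in levels:
        mask = np.abs(heights - h) < thickness  # points close to this plane
        if mask.sum() < 3:                      # skip nearly empty levels
            continue
        # project the selected points onto the plane at height h
        proj = points[mask] - np.outer(heights[mask] - h, normal)
        slices.append(proj)
    return slices

# toy example: a vertical cylinder sliced by horizontal planes
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
z = np.random.default_rng(0).uniform(0, 1, 200)
cloud = np.c_[np.cos(theta), np.sin(theta), z]
curves = slice_point_cloud(cloud, normal=[0, 0, 1], step=0.25, thickness=0.05)
```

Each returned slice is still an unordered set of projected points; ordering them into a curve is the subject of the next paragraph.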
The curve at a given level is computed by projecting onto the plane those points whose distance to the plane is less than a threshold. At this stage, the projected points form an unordered set. The steps to obtain an ordered set of 2D points representing one curve are listed below.
1) Compute the 2D coordinates of the points relative to the plane.
2) Select one point randomly and find its nearest neighbor to get the first segment which will be added to the new list of points.
3) For each remaining point, compute its distance to the first and to the last point of the list; if the first distance is smaller, the new point is added to the head of the list, otherwise to the tail. Recall that our approach uses the curvature and the arc length of the curve; we explain later how the curve is parameterized for this purpose.
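Steps 2 and 3 above amount to growing a chain from both ends. A minimal sketch of this ordering, under our own choice of seed point and data layout:

```python
import numpy as np

def order_curve_points(pts):
    """Order an unordered set of 2D points into a polyline by growing
    a chain from both endpoints (a sketch of steps 2-3 above)."""
    pts = np.asarray(pts, dtype=float)
    remaining = list(range(len(pts)))
    # step 2: seed the chain with a point and its nearest neighbour
    start = remaining.pop(0)                  # "random" seed; first index here
    d = np.linalg.norm(pts[remaining] - pts[start], axis=1)
    chain = [start, remaining.pop(int(np.argmin(d)))]
    # step 3: attach each remaining point to the closer chain endpoint
    while remaining:
        head, tail = pts[chain[0]], pts[chain[-1]]
        d_head = np.linalg.norm(pts[remaining] - head, axis=1)
        d_tail = np.linalg.norm(pts[remaining] - tail, axis=1)
        if d_head.min() < d_tail.min():
            chain.insert(0, remaining.pop(int(np.argmin(d_head))))
        else:
            chain.append(remaining.pop(int(np.argmin(d_tail))))
    return pts[chain]

# toy example: shuffled samples of an arc come back in traversal order
t = np.linspace(0, np.pi, 50)
arc = np.c_[np.cos(t), np.sin(t)]
shuffled = arc[np.random.default_rng(0).permutation(50)]
ordered = order_curve_points(shuffled)
```

After ordering, consecutive points are near neighbours along the contour, which is what the spline fitting in the next section requires.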
Dividing the 3D object (point cloud) by planes perpendicular to the viewing direction produces several planar level curves. Our approach thus reduces the problem of matching two point clouds via descriptors or signatures to a problem of matching curves. Finding the best fit between two curves is a central problem in several fields, such as computer graphics and computer vision. The set of points extracted at each level forms a 2D planar curve parameterized in Cartesian coordinates as

C(u) = (x(u), y(u)),

where u runs from 1 to the number of points at that level, and x and y are the Cartesian coordinates of the points. The purpose of curve matching is to find the longest common subcurve of two curves and to compute the rotation angle and translation vector that fit the curves along their common portion. We are inspired by the work of Wolfson (On curve matching, IEEE Trans. Pattern Analysis and Machine Intelligence, 1990), which uses the curvature as a function of arc length as the signature of a curve. Cui et al. improved this signature by using the integral of unsigned curvatures; we use the original method of Wolfson for ease of implementation. To compute the curvature of a curve, the first and second derivatives of x and y must both exist and be continuous. A cubic spline approximation is therefore used to represent each curve as a piecewise polynomial of order 4 (degree 3), and the curves are then parameterized by arc length as presented by Wang et al. (Arc-length parameterized spline curves for real-time simulation, 2002).
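The curvature-versus-arc-length signature can be sketched as below. This is our own minimal version of the idea (cubic splines plus signed planar curvature); the resampling density and the chord-length approximation of arc length are assumptions, not the paper's exact scheme.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def curvature_signature(curve, n_samples=200):
    """Curvature as a function of arc length for a 2D polyline
    (a sketch of a Wolfson-style curve signature)."""
    curve = np.asarray(curve, dtype=float)
    # approximate arc length by cumulative chord length
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # cubic splines x(s), y(s) give continuous first and second derivatives
    sx, sy = CubicSpline(s, curve[:, 0]), CubicSpline(s, curve[:, 1])
    u = np.linspace(0, s[-1], n_samples)        # uniform arc-length samples
    dx, dy = sx(u, 1), sy(u, 1)
    ddx, ddy = sx(u, 2), sy(u, 2)
    # signed planar curvature kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# sanity check: a circle of radius 2 has constant curvature of magnitude 1/2
t = np.linspace(0, 2 * np.pi, 400)
kappa = curvature_signature(np.c_[2 * np.cos(t), 2 * np.sin(t)])
```

Because the signature depends only on the intrinsic shape, it is invariant to rotation and translation of the curve, which is precisely why it is usable for matching.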
The matching process aims to extract from one curve the best part that can be matched with the whole, or just a part, of the second curve. To solve this problem, we need to find the position at which the query curve best aligns with a curve from the dataset. To measure the similarity between the query curve and each dataset curve, we slide the signature of the query curve along the signature of the current dataset curve; the smallest Euclidean distance gives the position of the best fit. To remove false matches, we add a step that computes the Euclidean transformation between the points of the matched parts. The alignment error is measured by the RMS (root mean square) error, which sums the distances between corresponding points of the matched parts. This error and the similarity measure serve as indicators of the quality of the curve matching. The figure on top shows an example of curve extraction from the complete 3D model of the amphora Ramon 2111-73. This curve is matched with the curves illustrated below to show the problem of the direction of parameterization (blue arrows), as highlighted by Cui et al. (Curve matching for open 2D curves, 2009). To solve this problem, each dataset curve is matched against both directions of parameterization of the query curve, and the best match is kept. We first performed 3D partial object retrieval experiments on a publicly available database; we are now working on the real case of the Xlendi wreck. The figure shows the first step of this work in progress.
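The sliding step can be sketched as follows; the function name and the discrete signature inputs are our own assumptions, and the direction ambiguity discussed above is handled by also matching the reversed query signature.

```python
import numpy as np

def best_alignment(query_sig, model_sig):
    """Slide the query signature along a (longer) model signature and
    return (best offset, smallest Euclidean distance). To cover both
    directions of parameterization, also call it with query_sig[::-1]."""
    n, m = len(query_sig), len(model_sig)
    best_off, best_dist = 0, np.inf
    for off in range(m - n + 1):
        # Euclidean distance between the query and the current window
        d = np.linalg.norm(model_sig[off:off + n] - query_sig)
        if d < best_dist:
            best_off, best_dist = off, d
    return best_off, best_dist

# toy example: the query is an exact sub-signature of the model,
# so the best offset is where the slice was taken and the distance is 0
model = np.sin(np.linspace(0, 6, 120))
query = model[40:70]
off, dist = best_alignment(query, model)
```

In the full pipeline, the RMS error of the Euclidean transformation estimated on the matched parts would then be used to reject false positives among these candidate alignments.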
We are currently working on using our approach for registration between two 3D models; in particular, we plan to test the signature proposed by Cui et al. (Curve matching for open 2D curves, 2009), which the authors demonstrated to be invariant to scale changes and Euclidean transformations (rotation and translation). The goal on the Xlendi wreck is to provide a tool able to perform object detection, typology determination and, finally, theoretical object matching on the observed ground. The final validation will be done by an ontological approach that validates the results.
The goal of this project is to extract known artefacts present on the site; our target is for the automatic matching algorithm to reach the accuracy of manual matching (figure 1), which is an effort- and time-consuming task. The proposed approach, based on matching curves extracted by slicing, correctly detects the position of the amphorae, but the rotational alignment is not accurate enough in some cases because of the small overlap.
We propose to use a deep learning approach, which has proved its worth in many research fields and shows the best performance in various competitions, to learn the shapes of different amphorae and the context of the ground. We then use a transfer learning process to fine-tune our model on the Xlendi shipwreck amphorae, which allows the model to be trained with only a small part of the Xlendi database. Underwater objects are rarely in a perfect state: they can be covered by sand or by another object, and they can be broken. When an amphora has a neck, this part is commonly separated from the amphora's body. We want to detect all the amphora pieces by performing pixel segmentation, i.e. a pixel-wise classification of the orthophoto. To improve the model, we define three classes: the ground, the body of the amphora, and the head of the amphora, which comprises the rim, the neck and the handles. After the pixel segmentation, we group pixels with similar probabilities together to obtain an object segmentation.
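The grouping of classified pixels into objects can be done with connected components, as in the following sketch; the threshold value and the use of `scipy.ndimage.label` are our assumptions, not necessarily the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def objects_from_probabilities(prob, threshold=0.5):
    """Group pixels of a per-class probability map into connected objects
    (a sketch of the pixel-to-object grouping step)."""
    mask = prob > threshold                    # pixel-wise decision
    labels, n_objects = ndimage.label(mask)    # 4-connected components
    return labels, n_objects

# toy probability map with two separated high-probability blobs
prob = np.zeros((10, 10))
prob[1:4, 1:4] = 0.9
prob[6:9, 6:9] = 0.8
labels, n = objects_from_probabilities(prob)
```

Working at the object level in this way also suppresses isolated misclassified pixels, which is consistent with the noise removal reported for the object detection map below.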
A CNN is composed of a series of layers, each of which takes as input the output of the previous layer. The first layer, named the input layer, takes the training or testing image as input; the last layer is the output of the network and produces a prediction map. The output of a layer l is called a feature map and is noted f_l. In this work, we use four different types of layers: convolution, pooling, normalization and deconvolution layers. Our CNN architecture is composed of 7 convolution layers, 3 pooling layers and 3 deconvolution layers.
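A sketch of such an architecture in PyTorch is given below. Only the layer counts (7 convolutions, 3 poolings, 3 deconvolutions) come from the text; the channel widths, kernel sizes, class name and the omission of the normalization layers are our assumptions.

```python
import torch
import torch.nn as nn

class AmphoraSegNet(nn.Module):
    """Illustrative network with 7 convolution, 3 pooling and
    3 deconvolution layers (hyperparameters are assumptions)."""
    def __init__(self, n_classes=3):
        super().__init__()
        def conv(i, o):  # 3x3 convolution keeping the spatial size
            return nn.Sequential(nn.Conv2d(i, o, 3, padding=1), nn.ReLU())
        self.encoder = nn.Sequential(
            conv(3, 16), conv(16, 16), nn.MaxPool2d(2),    # convs 1-2, pool 1
            conv(16, 32), conv(32, 32), nn.MaxPool2d(2),   # convs 3-4, pool 2
            conv(32, 64), conv(64, 64), nn.MaxPool2d(2),   # convs 5-6, pool 3
            conv(64, 64),                                  # conv 7
        )
        self.decoder = nn.Sequential(                      # deconvs 1-3
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 2, stride=2),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

# a 64x64 RGB patch maps to a 3-class prediction map of the same size
net = AmphoraSegNet()
out = net(torch.zeros(1, 3, 64, 64))
```

The three deconvolution (transposed convolution) layers undo the three 2x poolings, so the prediction map has the same spatial resolution as the input orthophoto patch, as pixel-wise classification requires.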
We train our CNN on images coming from another site and then use a small part of the Xlendi imagery to fine-tune its weights; only 20 amphorae from the Xlendi image were used as training examples. As the picture below shows, all the amphorae in the testing image are detected. The false positives are mainly located on the grinding stones. This error is due to the small size of the training database: the pre-training images contain no grinding stone examples, and only a few are present during the fine-tuning step. On the pixel segmentation image, the recall is around 57% and the precision around 71%. The recall is low because the edges of the amphorae are rarely detected: the predicted probability is highest at the middle of each amphora and decreases rapidly toward the edges. On the object detection map, the noise is removed, so the recall is close to 100% and the precision is around 80%.