Abstract. The ROV3D project aims to develop innovative tools that link underwater photogrammetry and acoustic measurements from an active underwater sensor. The results will be high-resolution 3D surveys of underwater sites. The new means and methods developed aim to reduce the in situ investigation time and to provide comprehensive, non-intrusive measurement tools for the studied environment.
In this paper, we apply a pre-processing pipeline to increase the quality of SIFT and SURF descriptor extraction in order to survey an underwater archaeological wreck in very high turbidity. We work in the Rhône river, in the south of France, on a Roman wreck with 20 centimeters of visibility. Under these conditions a standard process is not efficient, and water turbidity is a real obstacle to feature extraction. The mission was not dedicated to an exhaustive survey of the wreck, but was only a test to demonstrate and evaluate feasibility. The results are positive, even if the main problem now seems to be processing time: poor visibility drastically increases the number of photographs required.
Keywords—SIFT, Correlation, Underwater photogrammetry, Underwater archaeology, Water turbidity.
The goal of the ROV3D project is to develop automated procedures for 3D surveys of underwater environments, using both acoustic and optical sensors. The acoustic sensor acquires a large amount of low-resolution data, whereas the optical sensor (close-range photogrammetry) acquires a small amount of high-resolution data. In practice, a 3D acoustic scanner produces a wide-range scan of the scene, and an optical system provides a high-resolution restitution (at larger scale) of selected areas of the scene.
The Rhône Archaeological Map excavations carried out by the DRASSM at Arles (Bouches-du-Rhône Department, southern France) over the course of the past 20 years have been seriously hampered by the river’s current and poor visibility. In fact, the river water is opaque below six meters, and the visibility, even with good illumination, hardly ever surpasses 20 centimeters. In addition, pollution, the constant passage of canal boats and attacks by large catfish further hinder divers’ concentration and the quality of the work.
Despite extremely poor conditions, the study of these river wrecks has allowed us to record very important elements of archaeological heritage and offers an exceptional opportunity to develop new recording methods. This is particularly true of our experiments with photogrammetric and acoustic techniques, always performed in very difficult set-up and recording conditions. In 2011, however, combining these techniques proved to be an interesting alternative for rapid evaluation of a new Roman wreck discovered in the river. This collaboration between archaeologists and photogrammetrists in the Rhône’s turbid waters will eventually lead to the development of information-gathering procedures that will speed up measurement, study and interpretation of sites in situ. Even if poor visibility prevents global perception of a site, the acoustic record provides the missing information. Moreover, generating a 3D representation from a high-resolution digital model of complex sites provides archaeologists an overall notion of their configuration. This had never been possible before in the Rhône. Thus, initial trials on the Arles-Rhône 13 wreck serve as a test of these techniques. The encouraging results foreshadow regular collaboration in the Rhône and represent a turning point for fluvial archaeology.
The Arles-Rhône 13 wreck, discovered on the right bank at PK 282.700 in five to six meters of water (between the Arles-Rhône 14 and 17 sites) is of major interest. It represents the first time a seagoing vessel has been found in the river at Arles. All twelve ancient wrecks previously recorded near Arles are flat-bottomed river or fluvio-maritime vessels. However, the hull shape and assembly method of the Arles-Rhône 13 wreck are typical of maritime, rather than fluvial construction.
This vessel likely sank during the third or fourth century CE. River action exposed about 10 m² of the poorly-preserved hull, which is of typical mortise-and-tenon construction. The hull remains lie bottom-up; the keel has been swept away by the current. The typically elegant concave garboard shape of a seagoing vessel is clearly visible on the exposed portion of the hull. Segments of two overturned floors pierced by limber holes are visible within the hull. They are still fastened to the garboard and adjacent strakes. The goals of our short mission were to map the exposed parts of the vessel, record two transverse sections, and take auger sediment samples and wood samples for radiocarbon analysis. However, recording by photogrammetry and acoustic methods represents an important and innovative step. It remains difficult to estimate the vessel’s dimensions. Scantlings of the three meters of exposed hull suggest an overall length not less than 18 or 20 meters. The calibrated 14C date falls between 231 cal CE and 381 cal CE. Thus, this vessel sank during the Late Empire, perhaps at a time when the city of Arles was recovering from third-century invasion damage and reestablishing significant economic activity linked to international trade with the Mediterranean under Emperor Constantine.
In 2009, mixed media artist Daniel Zanca had immersed a ravishing ceramic Venus head framed in industrial steel not far from the Arles-Rhône 13 wreck. It was rediscovered, still in place, at the same time as the Roman wreck. Zanca’s creation was recorded using the same combination of photogrammetry and acoustic sonar techniques used for the wreck. This yielded a stunning image of the piece lying just 20 meters from the vessel, but separated from it by nearly 2000 years.
One of the most important issues in this work is to obtain images of sufficient quality for analysis, measurement and the extraction of points of interest, so as to optimize image processing and image orientation for photogrammetric use.
The underwater image pre-processing can be addressed from two different points of view: image restoration techniques or image enhancement methods.
Image restoration techniques need parameters such as attenuation coefficients, scattering coefficients and a depth estimate of the object in the scene. For this reason, in our work the preprocessing of underwater images is devoted to image enhancement methods, which do not require a priori knowledge of the environment.
Bazeille et alii proposed an algorithm to enhance underwater images. This algorithm is automatic and requires no parameter adjustment to correct defects such as non-uniform illumination, low contrast and muted colors.
In this enhancement-based algorithm, each disturbance is corrected sequentially. The first step, removal of the moiré effect, is not applied here because this effect is not visible in our conditions. Then a homomorphic (frequency-domain) filter is applied to remove non-uniform illumination and to enhance contrast in the image.
To reduce the acquisition noise often present in images, they apply wavelet denoising followed by anisotropic filtering to eliminate unwanted oscillations. To finalize the processing chain, a dynamic expansion is applied to increase contrast, and the average colors in the image are equalized to mitigate the dominant color. Figure 1 shows the result of applying the algorithm of Bazeille et alii.
To optimize computation time, all treatments are applied to the Y component in YCbCr space. However, the homomorphic filter changes the image geometry, which would add errors to measurements after the 3D reconstruction of the scene, so we decided not to use this algorithm.
The method of Iqbal et alii first performs contrast stretching on RGB and then converts the result from RGB to HSI color space. Finally, it applies saturation and intensity stretching. The use of two stretching models helps to equalize the color contrast in the image and also addresses the problem of lighting.
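The two stretching stages can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the original method: HSV (from Python's standard `colorsys` module) stands in for HSI, each stretch is a plain min-max stretch, and the function names `stretch` and `enhance` are hypothetical.

```python
import colorsys

def stretch(values, lo=0.0, hi=1.0):
    """Linearly map values so that their minimum becomes lo and maximum hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:                 # flat channel: nothing to stretch
        return list(values)
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

def enhance(pixels):
    """pixels: list of (r, g, b) floats in [0, 1]; returns the enhanced list."""
    # Stage 1: contrast stretching on each RGB channel independently.
    r, g, b = (stretch(channel) for channel in zip(*pixels))
    # Stage 2: convert to HSV (assumed stand-in for HSI), then stretch
    # saturation and brightness while leaving hue untouched.
    hsv = [colorsys.rgb_to_hsv(*p) for p in zip(r, g, b)]
    h, s, v = zip(*hsv)
    s, v = stretch(s), stretch(v)
    return [colorsys.hsv_to_rgb(*p) for p in zip(h, s, v)]
```

Stretching each RGB channel separately is what reduces the color cast, while the second stage acts on lighting; on a real image the same operations would be applied to whole arrays rather than Python lists.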
Chambah et alii proposed a method of color correction based on the ACE model. ACE, "Automatic Color Equalization", is based on a new calculation approach that combines the Gray World algorithm with the White Patch algorithm, taking into account the spatial distribution of color information. ACE is inspired by the human visual system, which is able to adapt to highly variable lighting conditions and extract visual information from the environment.
This algorithm consists of two parts. The first one adjusts the chromatic data, where the pixels are processed with respect to the content of the image. The second part deals with the restoration and enhancement of colors in the output image. The aim of improving the color is not only to obtain better quality images, but also to see the effects of these methods on SIFT and SURF in terms of feature point detection. Three examples of images before and after restoration with ACE are shown in Figure 3.
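ACE itself is too involved for a short listing. The sketch below only illustrates, in their simplest global form, the two classical assumptions that ACE combines: Gray World (channel means should be equal) and White Patch (the brightest value of each channel should be white). The function names and the list-of-tuples image format are assumptions made for illustration, and the spatial weighting that characterizes ACE is not reproduced.

```python
def gray_world(pixels):
    """Scale each channel so that all channel means equal the global mean."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m > 0 else 1.0 for m in means]
    # Values are clipped to 1.0, so means match exactly only without clipping.
    return [tuple(min(1.0, p[c] * gains[c]) for c in range(3)) for p in pixels]

def white_patch(pixels):
    """Scale each channel so that its brightest value maps to 1.0."""
    maxima = [max(p[c] for p in pixels) for c in range(3)]
    gains = [1.0 / m if m > 0 else 1.0 for m in maxima]
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]
```

Both corrections are global, which is why they are fast; ACE applies comparable reasoning locally around each pixel, which is consistent with its much longer processing time.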
Kalia et alii investigated the effects of different image pre-processing techniques which can affect or improve the performance of the SURF detector. They proposed a new method named IACE, "Image Adaptive Contrast Enhancement", which modifies the contrast enhancement technique by adapting it to the statistics of the image intensity levels.
If Pin is the intensity level of an image, it is possible to calculate the modified intensity level Pout with equation (1).
$$P_{out} = (P_{in} - c)\,\frac{b - a}{d - c} + a \qquad (1)$$
where a is the lowest intensity level, equal to 0, b is its counterpart, the highest intensity level, equal to 255, c is the lower threshold intensity level in the original image, below which the number of pixels is lower than 4%, and d is the upper threshold intensity level, at which the cumulative number of pixels is more than 96%. These thresholds are used to eliminate the effect of outliers and to improve the intrinsic details in the image while keeping the relative contrast. However, Pout values must be in the interval [0, 255]; therefore we used the following rule: if Pout < 0 then Pout = 0; else if Pout > 255 then Pout = 255.
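The thresholding and clamping described above can be sketched as follows. This is a minimal sketch, assuming the usual linear stretch Pout = (Pin - c)(b - a)/(d - c) + a and one particular reading of the thresholds (c and d are the first intensity levels at which the cumulative histogram reaches 4% and 96% of the pixels). The function name `iace` and its parameters are hypothetical.

```python
def iace(intensities, low_pct=0.04, high_pct=0.96, a=0, b=255):
    """Percentile-based contrast stretch of 8-bit intensities (ints in 0..255)."""
    hist = [0] * 256
    for p in intensities:
        hist[p] += 1
    total = len(intensities)
    # c: first level whose cumulative share reaches low_pct (assumed reading);
    # d: first level whose cumulative share reaches high_pct.
    c = d = None
    cumulative = 0
    for level, count in enumerate(hist):
        cumulative += count
        if c is None and cumulative >= low_pct * total:
            c = level
        if d is None and cumulative >= high_pct * total:
            d = level
            break
    if d == c:                        # degenerate histogram: leave unchanged
        return list(intensities)
    out = []
    for p in intensities:
        value = (p - c) * (b - a) / (d - c) + a
        out.append(int(round(min(b, max(a, value)))))  # clamp to [a, b]
    return out
```

Because the thresholds come from a single histogram pass, the cost is linear in the number of pixels, which is consistent with the sub-second timings reported for this method.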
The results of this algorithm are very interesting. One can observe that the IACE method performs better than the method proposed by Iqbal et alii in terms of the time taken for the complete detection and matching process.
The purpose of preprocessing is improving the quality of images to enhance the detection of interest points. Thereafter, these points of interest will be matched and used for 3D reconstruction of the scene.
In our work, we decided to use the two methods that are most robust in terms of invariance to image transformations and distortions: Scale Invariant Feature Transform "SIFT" and Speeded-Up Robust Features "SURF".
Scale Invariant Feature Transform "SIFT", proposed by Lowe, is both a detector and a descriptor. It extracts points of interest that are invariant to changes during image acquisition; these points are local maxima or minima of the difference of Gaussians. Each detected point is assigned a descriptor vector built from the magnitude and direction of the gradient in the region around the point of interest.
Speeded-Up Robust Features "SURF" is a descriptor invariant to changes of scale and rotation. The method is divided into two parts. The first part is devoted to the detection of points of interest: at each scale, the local maxima are calculated using the Hessian matrix. From these local maxima, we keep the candidate points that are above a given threshold, which will subsequently be invariant to scaling.
The purpose of the second part of this algorithm is to find a descriptor that makes the detected points invariant to rotation. The SURF descriptor is much faster but less robust than SIFT and can therefore be used in real-time applications.
After using SIFT and SURF to extract features from images, we implemented three methods for measuring distances: SAD (Sum of Absolute Differences), SSD (Sum of Squared Differences) and its normalized version NSSD (Normalized Sum of Squared Differences). These methods are frequently used to compute the level of dissimilarity between two pixels. In our work we use them to compute the distance between each feature (point) obtained with the SIFT or SURF descriptor in the first photograph and all points in the second; the best match corresponds to the minimum value obtained.
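The three distance measures and the minimum-distance rule can be sketched as follows. The normalization used for NSSD here (division by the square root of the product of the descriptor energies) is an assumption, since several conventions exist, and the function names are hypothetical.

```python
import math

def sad(u, v):
    """Sum of Absolute Differences between two descriptor vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))

def ssd(u, v):
    """Sum of Squared Differences between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def nssd(u, v):
    """SSD divided by the descriptor energies (one common convention)."""
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return ssd(u, v) / norm if norm > 0 else 0.0

def best_match(query, candidates, distance=ssd):
    """Index of the candidate descriptor with minimum distance to query."""
    return min(range(len(candidates)),
               key=lambda i: distance(query, candidates[i]))
```

Calling `best_match` once per feature of the first photograph compares it with every feature of the second, so the cost grows with the product of the two feature counts.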
We also added the method proposed by Lowe. This method is based on the K-Nearest Neighbour (KNN) algorithm, with a modified kd-tree search to reduce computation time, and finds corresponding points using Euclidean distance.
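A minimal sketch of nearest-neighbour matching with Euclidean distance is given below. Two points are assumptions rather than statements from the text: the kd-tree search is replaced by a brute-force search for brevity, and a match is accepted only if the nearest neighbour is clearly closer than the second nearest (the distance-ratio criterion of Lowe's paper, with the threshold 0.8 used there). The function name is hypothetical.

```python
import math

def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is a distinctive nearest
    neighbour of desc_a[i] under Euclidean distance."""
    matches = []
    for i, a in enumerate(desc_a):
        # Brute-force search for the two nearest neighbours; a kd-tree
        # would replace this sort on large descriptor sets.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:           # keep only unambiguous matches
            matches.append((i, j1))
    return matches
```

Rejecting ambiguous matches in this way removes many outliers before the orientation step, at the price of discarding some correct correspondences.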
In our experiments, we used a set of 14 images of VENUS (one of the artworks of international artist Daniel ZANCA) taken by photographer Olivier BIANCHIMANI. We reduced the resolution of these photographs to 639 x 425 pixels to reduce computation time (see Figure 4). This object was chosen in order to work on an underwater scene where the water is very turbid and to test the robustness of SIFT and SURF feature extraction.
The implementation was run on an Intel Core i7 980 CPU at 3.33 GHz with 12 GB of RAM under the Windows 7 operating system. We studied the effects of different methods that can affect or improve the repeatability of a descriptor. Initially, we noticed improvements in color quality, and we also observed that the algorithm proposed by Iqbal et alii gives the best visual results.
Our approach is to detect points of interest on all images using SIFT or SURF descriptors. Subsequently, images are matched two by two with one of the distance measurement methods mentioned previously. For each matched pair of images, the relative orientation is computed using the 5-point algorithm.
From these orientations, approximate values of the orientations and of the object point coordinates are calculated. Then a bundle adjustment is applied for optimal estimation of the orientation parameters and 3D points.
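In its usual form, the bundle adjustment step minimizes the sum of squared reprojection errors over all camera orientations and object points. The notation below is an assumption introduced for illustration: $R_i, t_i$ are the rotation and translation of photograph $i$, $X_j$ is object point $j$, $x_{ij}$ is its measured image position, $V(i)$ is the set of points visible in photograph $i$, $K$ is the camera calibration matrix (taken as known here), and $\pi$ is the perspective division.

```latex
\min_{\{R_i,\, t_i\},\, \{X_j\}} \;
\sum_{i} \sum_{j \in V(i)}
\left\| \, x_{ij} - \pi\!\left( K \left( R_i X_j + t_i \right) \right) \right\|^2,
\qquad
\pi\!\left( (u, v, w)^{\top} \right) = \left( u / w,\; v / w \right)^{\top}
```

The approximate orientations and object coordinates obtained from the pairwise relative orientations serve as the starting point of this non-linear minimization.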
We cannot give the results of all tests because of space limitations. Table 1 presents some of the results obtained after several tests. This table summarizes the tests performed with SURF and SIFT descriptors on the original and preprocessed images. As a first step, the purpose of these tests is to find the best preprocessing algorithm in terms of color correction and preprocessing time, and above all the one that most increases the repeatability of descriptors. In a second step, we seek the distance computation method best suited to the type of images used in our work, that is, the one that yields the most matched points and removes outliers.
We judge the quality of these descriptors according to the number of image pairs oriented, the number of corresponding points and the reprojection error calculated both with the Root Mean Square (RMS) and the average error methods.
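The two error summaries can be computed as in the following minimal sketch, which assumes that each reprojection error is the Euclidean distance in pixels between an observed image point and the reprojection of its 3D point. The function names are hypothetical.

```python
import math

def reprojection_errors(observed, projected):
    """Euclidean pixel distance between observed and reprojected points."""
    return [math.dist(o, p) for o, p in zip(observed, projected)]

def rms_error(errors):
    """Root Mean Square of the errors; penalizes large residuals more."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mean_error(errors):
    """Arithmetic mean of the errors."""
    return sum(errors) / len(errors)
```

The RMS value is never smaller than the mean, and a large gap between the two indicates that a few points carry most of the error.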
The results obtained with images from the three preprocessing methods are better than the results obtained with the original images. However, ACE took an hour and 35 seconds, whereas for the same image IACE took 0.13 seconds and the method proposed by Iqbal et alii took 0.15 seconds, almost the same time as the IACE method.
Before choosing the preprocessing method to be used in our future work, we first selected the distance measurement method. The method used by Lowe, which is based on the KNN algorithm, performed best in terms of matched points and computation time. The SSD method and its normalized version NSSD also produce good results in terms of matched points and number of oriented pairs, but require more computation time.
Finally, after several tests, we found that IACE and the method proposed by Iqbal et alii are both efficient in terms of preprocessing time and number of matched points. However, we cannot choose between these methods because the results depend on image quality and on the nature of the objects in the scene. In the results presented in Table 1, the Iqbal et alii method with SIFT and SURF descriptors gave the best result. However, the IACE method is also effective with other photos of underwater scenes.
The photogrammetric approach used here provides results as 3D points seen on at least three photographs. After the orientation step, we produce a dense cloud of 3D points using the patch-based method proposed by Furukawa and Ponce.
The generated point clouds are scaled and geo-referenced in a reference system and, if necessary, aligned with each other to represent a complete object.
Scale is easy to calculate with a synchronized multi-camera system (stereo or tri-focal) when the baseline between cameras is known. However, in highly turbid water we have to be very close to the measured object, and the use of synchronized cameras is not possible. In this mission we used a Nikon D700 camera with a spherical housing and a 14 mm focal length lens. Therefore, to calculate the scale with a single camera, we need scale bars or a known object on site.
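With a single camera, the scale follows from the ratio between the known length of a scale bar and the distance between its two endpoints in the unscaled model. The sketch below is a minimal illustration under that assumption; the function names are hypothetical.

```python
import math

def scale_factor(p_start, p_end, true_length):
    """Ratio between the real bar length and its length in model units."""
    model_length = math.dist(p_start, p_end)
    if model_length == 0:
        raise ValueError("scale bar endpoints coincide")
    return true_length / model_length

def apply_scale(points, s):
    """Multiply every coordinate of every 3D point by the scale factor."""
    return [tuple(s * c for c in p) for p in points]
```

When several scale bars are visible, averaging their individual factors reduces the influence of endpoint measurement errors.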
Since the points are determined from photographs that are oriented in space, we also obtain color information for each point. These point clouds contain no semantics; their density can only help the user to recognize the measured object and, eventually, to measure the visible parts (see Figure 5, Figure 7, Figure 8).
We are currently working on merging archaeological knowledge (see Figure 6) with these measured points by automatic clustering and learning methods, in the framework of the ROV3D project.
In this paper, we studied three preprocessing methods whose purpose was to improve the color and contrast of underwater images and to increase the repeatability of descriptors compared to the original images. We also presented several methods for measuring distances between descriptors. We found that the IACE method and the method proposed by Iqbal et alii give almost the same results in terms of computation time and repeatability of SIFT and SURF descriptors.
Using one of these methods for preprocessing, together with the KNN method for distance measurement, gives good results in terms of computation time and reprojection error compared to results obtained with images without preprocessing. The ACE method is very slow in terms of preprocessing time; however, we observed an improvement of color contrast and a brightness correction. For this reason, we plan to use the images it produces as texture after the full 3D reconstruction of the underwater scene.
Finally, we applied our developments to a real case during the excavation of a Roman wreck in the Rhône river, in the south of France. The approach is of great interest here because of the very poor visibility (less than 20 cm). In such conditions a standard process is not efficient and cannot give any results.
Our approach allows us to process a large set of photographs with some very encouraging results. The main remaining problem, in order to satisfy archaeologists' needs, is to obtain a complete survey of the wreck, which would imply processing around 10000 photographs. Currently we do not know whether such a quantity of images (taken with 20 cm visibility) can be oriented together.