This tutorial reports a software resurrection exercise on the code released as part of the following research paper in Computer Graphics.
Nehab et al. "Efficiently combining positions and normals for precise 3D geometry." ACM Transactions on Graphics (2005) [doi].
Nehab et al. (2005) describe a method that combines information from two different sources in order to recover a more detailed geometry (i.e. 3D shape) of an object. A stereo triangulation method can capture the 3D geometry of an object and represent it using triangular meshes. A photometric stereo method can capture the surface normal at each point represented by a pixel in an image. These two methods capture different types of detail about the 3D geometry of an object. The stereo triangulation method captures the overall shape of an object but is unable to accurately capture fine (i.e. high frequency) details. The photometric stereo method can capture fine details of the geometry but is unable to capture depth information accurately. The insight of Nehab et al. (2005) was to combine these two sources of information in order to recover a more detailed and accurate 3D geometry of the object. The research paper provides a more accurate and detailed description of the method.
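The combination can be understood, at a high level, as a sparse linear least-squares problem. The sketch below is an illustrative formulation, not the paper's exact notation: $p_i$ denotes measured vertex positions, $n_j$ measured normals, $v$ the unknown optimized vertex positions, $T_j(v)$ surface tangent vectors (linear in $v$), and $\lambda \in [0,1]$ a weight that trades depth accuracy against agreement with the normals.

```latex
% Illustrative objective: fidelity to measured positions, balanced
% against orthogonality of surface tangents to measured normals.
E(v) = \lambda \sum_i \lVert v_i - p_i \rVert^2
     + (1 - \lambda) \sum_j \bigl( T_j(v) \cdot n_j \bigr)^2
```

Because the tangents are linear in the unknown positions, minimizing such an objective reduces to solving a sparse linear system, which is why the implementation depends on a sparse Cholesky solver (CHOLMOD).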
I read this paper during my MSc and wondered how people come up with such wonderful ideas. What amazed me was that Nehab et al. (2005) went a step further by releasing the program code, documentation and sample data required to reproduce the results reported in their research paper. As a research student, these assets allowed me to explore the proposed method in detail and develop an understanding of it in a way that I can never forget. This was in 2009-2010 at the University of York (UK). In 2022, I was invited to give a talk at the Vision, Graphics and Learning (VGL) group at the Department of Computer Science at the University of York. To demonstrate the concept of software resurrection, I chose the program code released with Nehab et al. (2005) because the audience would be able to connect with this software.
The software resurrection exercise for Nehab et al. (2005) is described below. We use the following hardware and software environment for this exercise.
| Attribute | Value |
| --- | --- |
| Model | Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz |
| Architecture | x86_64 |
| Byte Order | Little Endian |
| Address sizes | 39 bits physical, 48 bits virtual |
| CPU(s) | 16 (Family: 6, Model: 158, cpu MHz: 800.072) |
| Threads per core | 2 |
| Cores per socket | 8 |
| Sockets | 1 |
| Memory | Cache: 16 MB, RAM: 32 GB |
| Operating System | Debian GNU/Linux 11.3 (bullseye) released on March 26th 2022 |
| Compiler | gcc-10.2.1 (Debian 10.2.1-6) 20210110 |
The source code is downloaded and extracted. A readme.txt file is missing, but the .cc file extension suggests that this is C++ source code. The mesh_opt.cc source file is compiled with the g++ compiler, which reveals the first compilation error.
## Prepare environment
export ROOT=$HOME/sr/nehab2005combining
mkdir -p $ROOT && cd $ROOT
## Download source
wget https://w3.impa.br/~diego/software/NehEtAl05/mesh_opt.tar.gz
tar -zxvf mesh_opt.tar.gz
g++ -o mesh_opt mesh_opt.cc
mesh_opt.cc:15:10: fatal error: cholmod.h: No such file or directory
   15 | #include "cholmod.h"
      |          ^~~~~~~~~~~
compilation terminated.
The Debian package manager reports that the cholmod library is already installed. This implies that the g++ compiler was unable to locate the installed cholmod library. The location of the cholmod library is found manually, and the compiler is instructed to look for the missing header in this custom location. Once the cholmod library is successfully located, another dependency (the trimesh library) is reported missing.
sudo apt install libcholmod3
libcholmod3 is already the newest version
cd /usr/include
find . -iname 'cholmod.h'
./suitesparse/cholmod.h
g++ -o mesh_opt -I/usr/include/suitesparse/ mesh_opt.cc
mesh_opt.cc:16:10: fatal error: TriMesh.h: No such file or directory
   16 | #include "TriMesh.h"
      |          ^~~~~~~~~~~
compilation terminated.
The missing trimesh library is downloaded, and its successful compilation required the installation of another library, the OpenGL Utility Toolkit (freeglut). The freeglut library was installed using the Debian package manager.
wget https://gfx.cs.princeton.edu/proj/trimesh2/src/trimesh2-2.16.zip
unzip trimesh2-2.16.zip
cd trimesh2/
make
../include/GL/freeglut_std.h:131:13: fatal error: GL/glu.h: No such file or directory
  131 | #    include <GL/glu.h>
      |              ^~~~~~~~~~
compilation terminated.
sudo apt install freeglut3-dev
make
...
Compiling mesh_view.cc
Linking ../bin.Linux64/mesh_view
The mesh_opt.cc source is recompiled after
installing all the required dependencies. The compilation
process breaks with the following error.
g++ -o mesh_opt \
    -I/usr/include/suitesparse/ -I./trimesh2/include \
    mesh_opt.cc
mesh_opt.cc:93:23: error: ‘TriMesh’ was not declared in this scope; did you mean ‘trimesh::TriMesh’?
   93 | static int isneighbor(TriMesh *mesh, int i, int j) {
      |                       ^~~~~~~
      |                       trimesh::TriMesh
The CHANGELOG of trimesh2 library notifies of this breaking change as follows.
Changes version 2.11 → 2.12:
Major user-visible change: the library and headers are now contained
within the trimesh namespace. Existing code must be updated to refer
to the TriMesh class and associated functions with an explicit
namespace qualifier, or include the following declaration
using namespace trimesh;
This breaking change in the trimesh library can be addressed by adding the following statement to the mesh_opt.cc source code (after the include directives), which instructs the compiler to resolve unqualified references such as TriMesh within the trimesh namespace.
using namespace trimesh;
Fixing Issue 2.1 reveals two further issues with the mesh_opt.cc source. These issues are easier to fix, as they relate to compilers enforcing certain rules more strictly than compilers did 17 years ago.
g++ -o mesh_opt \
    -I/usr/include/suitesparse/ -I./trimesh2/include \
    mesh_opt.cc
mesh_opt.cc: In function ‘void optimize_grid(trimesh::TriMesh*, float, float, const t_fc&)’:
mesh_opt.cc:178:23: error: invalid conversion from ‘void (*)(int, char*, int, char*)’ to ‘void (*)(int, const char*, int, const char*)’ [-fpermissive]
  178 |     c.error_handler = handler;
      |                       ^~~~~~~
      |                       |
      |                       void (*)(int, char*, int, char*)
mesh_opt.cc: In function ‘void optimize_mesh(trimesh::TriMesh*, float, float)’:
mesh_opt.cc:348:23: error: invalid conversion from ‘void (*)(int, char*, int, char*)’ to ‘void (*)(int, const char*, int, const char*)’ [-fpermissive]
  348 |     c.error_handler = handler;
      |                       ^~~~~~~
      |                       |
      |                       void (*)(int, char*, int, char*)
mesh_opt.cc:366:15: error: ‘vector’ does not name a type
  366 |     const vector<int> &af = mesh->adjacentfaces[v];
      |           ^~~~~~
mesh_opt.cc:367:18: error: ‘af’ was not declared in this scope; did you mean ‘nf’?
  367 |     int nf = af.size();
      |              ^~
      |              nf
The first issue (invalid conversion ...) is resolved by adding the const qualifier to the function arguments, as required by the CHOLMOD library's error handler signature. The second issue, related to vector, is fixed by adding the std:: namespace prefix to the declaration of the variable af of the STL type vector.
...
// CHOLMOD error handler
//static void handler(int status, char *file, int line, char *message) {
static void handler(int status, const char *file, int line, const char *message) {
...
// Normal constraints
int row = nvars;
for (int v = 0; v < nvars; v++) {
// const vector<int> &af = mesh->adjacentfaces[v];
const std::vector<int> &af = mesh->adjacentfaces[v];
...
The compilation process now completes successfully, but the linking stage breaks with a large number of undefined reference to ... errors. These errors are fixed by manually pointing the compiler to the static libraries that contain the definitions of those symbols. The library names were found manually by trial and error. The following command leads to successful compilation and linking of the mesh_opt.cc source file, resulting in the required mesh_opt executable.
g++ -o mesh_opt \
-I/usr/include/suitesparse/ \
-I./trimesh2/include \
-fopenmp mesh_opt.cc \
./trimesh2/lib.Linux64/libtrimesh.a \
/usr/lib/x86_64-linux-gnu/libcholmod.a \
/usr/lib/x86_64-linux-gnu/libblas.a \
/usr/lib/x86_64-linux-gnu/libsuitesparseconfig.a \
/usr/lib/x86_64-linux-gnu/libcolamd.a \
/usr/lib/x86_64-linux-gnu/libmetis.so.5 \
/usr/lib/x86_64-linux-gnu/libccolamd.a \
/usr/lib/x86_64-linux-gnu/libcamd.a \
/usr/lib/x86_64-linux-gnu/libamd.a
./mesh_opt
Usage: ./mesh_opt infile [options] [outfile]
Options:
-fc file.fc Range grid camera intrinsics (i.e. fx fy cx cy)
-lambda Geometry weight
-blambda Boundary geometry weight
-fixnorm s[:n] Fix normals by smoothing n times with sigma=s*edgelength
-smooth s[:n] Smooth positions n times with sigma=s*edgelength
-opt Run one optimization round
-noopt Do not optimize
-noconf Remove per-vertex confidence
-nogrid Unpack range grid to faces
Nehab et al. (2005) have not included a self-contained and automated test to verify the functionality of the mesh_opt executable. Such a test would have allowed us to verify that the program operates as the developers expected at the time of its release in 2005.
The authors have included detailed documentation about
the mesh_opt program as well as provided data
samples
and examples
(along with the expected output) to guide users in reproducing
the results presented in the research paper. To reproduce the visualisation included in the paper, the
samples
are downloaded and the mesh_opt program is executed
using exactly the same command line arguments provided in
the examples.
wget https://w3.impa.br/~diego/software/NehEtAl05/samples.tar.gz
tar -zxvf samples.tar.gz
./mesh_opt panel/panel-small.ply -fc panel/panel-small.fc -fixnorm 2:3 optimized.ply
Reading panel/panel-small.ply...
Reading 116408 vertices...
Reading range grid... Done.
Fixing normals (2:3)...
Triangulating... 230778 faces.
Removing faces... 479 faces removed... Done.
Computing point areas... Done.
Finding vertex neighbors... Done.
Smoothing normals... Done. Filtering took 0.062695 sec.
Smoothing normals... Done. Filtering took 0.055074 sec.
Smoothing normals... Done. Filtering took 0.051286 sec.
Computing normals... Done.
Smoothing normals... Done. Filtering took 0.053773 sec.
Smoothing normals... Done. Filtering took 0.054504 sec.
Smoothing normals... Done. Filtering took 0.052540 sec.
Done.
Range grid optimization...
Analyzing range grid... Done.
Building system (116408x349115)... Done.
Analyzing matrix... Done.
Factoring matrix... Done.
Back substituting... Done.
Updating range grid... Done.
Writing optimized.ply... Done.
Output of mesh_opt on the provided sample data.

Apart from the execution time differences, the following difference exists between the obtained output and the expected output of the mesh_opt command. The authors' output shows "Removing faces... 480 faces removed" while our output shows "Removing faces... 479 faces removed". This suggests that the mesh_opt program behaves in a slightly different way than what the authors observed in 2005. The generated output mesh looks visually similar, as shown below. However, it is not possible to confirm this, as the processed results (i.e. optimized.ply) are not included in the provided dataset.
The compilation and testing (i.e. manual comparison of output) of the code released by Nehab et al. (2005) provided some valuable insights into the factors that contribute to the intelligibility and maintainability of research software.
The C++ code of Nehab et al. (2005) depends on the following two libraries: (a) trimesh2 and (b) CHOLMOD.
The trimesh2 library is still being maintained (the last release was in 2020), and therefore its compilation was straightforward once its dependency (the freeglut library) was installed using the operating system's package manager. The trimesh2 library introduced a breaking change in 2013 (in release version 2.12) which moved all the library assets into the trimesh namespace in order to avoid naming conflicts with other libraries or user code. This breaking change required a minor update to the source code of Nehab et al. (2005).
The original CHOLMOD library has since been integrated, along with other linear algebra libraries, into the SuiteSparse collection, and therefore resolving this dependency required some significant effort.
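On current Debian releases, one way to resolve this whole dependency in a single step is to install the consolidated SuiteSparse development package; the package name below was checked against Debian 11 and may differ on other distributions.

```shell
# Install the consolidated SuiteSparse development package, which
# provides cholmod.h (and the other SuiteSparse headers) under
# /usr/include/suitesparse/.
sudo apt install libsuitesparse-dev
# Confirm where the header landed, then point g++ at it with -I:
find /usr/include -name 'cholmod.h'
```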
I first tried to compile this code in 2009 during my MSc, and I do not recall facing any major issues during compilation. My colleague, who was doing his PhD at that time, had also compiled this code successfully on his machine and used it for his doctoral research. So I know that this research code compiled without much friction in 2009. However, when I recompiled it in 2022, I faced many issues, most of which were related to changes in the dependency libraries. These issues have also been faced by others. The waps101/MergePositionNormals project created a MATLAB implementation of the method from Nehab et al. (2005) because the original C++ implementation had become "more difficult to compile over the years due to required dependencies".
I do not know if there was a way, in 2005, to ship research code like that of Nehab et al. (2005) so that dependency issues would be prevented 17 years later. I will revisit this question in the future, when I gain a better understanding of the state of software packaging and distribution in 2005.
Compilers tolerate certain programming practices for various reasons. For example, the following statements exist in the source code of Nehab et al. (2005).
...
const std::vector<int> &g = mesh->grid;
...
const vector<int> &af = mesh->adjacentfaces[v];
...
The existence of the statement const vector<int> ... indicates that the authors must have forgotten to include the std:: namespace prefix, because other similar declarations do include the required std:: prefix. This code must have compiled successfully in 2005, possibly with a compiler warning message. In 2022, compilers generate an error instead of a warning, thereby requiring an update to the source code.
Compiler warnings can turn into errors. Therefore, it is wise to heed those warnings and update the code to address the underlying issues, in order to avoid breaking the code in the future.
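The point can be demonstrated with a toy example (the file name demo.cc is illustrative): a snippet that merely warns under -Wall fails outright once -Werror is added.

```shell
# A deliberately sloppy snippet: the unused variable triggers
# -Wunused-variable under -Wall.
cat > demo.cc <<'EOF'
int main() {
    int unused = 42;
    return 0;
}
EOF
# Without -Werror the build succeeds (with a warning)...
g++ -Wall -c demo.cc -o demo.o && echo "warnings tolerated: build succeeded"
# ...with -Werror the same warning becomes a fatal error.
g++ -Wall -Werror -c demo.cc -o demo.o 2>/dev/null || echo "-Werror: build failed"
```

Enabling -Werror during development forces such issues to be fixed immediately, rather than leaving them for a future compiler to reject.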
While we were able to run the executable on a modern machine, it was not possible to verify whether the software was operating in the way the authors expected in 2005. We also noticed a minor difference between the observed output and the expected output (as documented in the reference manual).
## 2022 (output obtained using modern hardware and software libraries)
...
Removing faces... 479 faces removed... Done.
...
## 2005
...
Removing faces... 480 faces removed... Done.
...
This discrepancy in the output makes one wonder whether there is something amiss in the software code or in a dependency library. This situation could have been avoided, to some extent, by including a set of self-contained and automated tests, which would have helped the user narrow down the issue. The discrepancy reveals the contribution a software test suite makes to building trust with its users.
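A minimal self-contained check of this kind need not be elaborate. The sketch below compares one line of the program's log against a stored golden value; for the sake of a runnable example, the actual value is hard-coded to the 2022 output rather than captured from ./mesh_opt (which is how a real test would obtain it).

```shell
#!/bin/sh
# Sketch of a tiny regression check. In a real test suite, `actual`
# would be captured from the program under test, for example:
#   actual=$(./mesh_opt panel/panel-small.ply ... | grep 'faces removed')
# Here it is hard-coded so the sketch is self-contained.
actual='Removing faces... 479 faces removed... Done.'
expected='Removing faces... 480 faces removed... Done.'  # golden value from 2005
if [ "$actual" = "$expected" ]; then
    echo "PASS"
else
    echo "FAIL: expected '$expected' but got '$actual'"
fi
```

A check like this, run after every change to the code or its dependencies, would have pinpointed when the 479/480 discrepancy first appeared.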
Reproducing the results reported by Nehab et al. (2005) revealed that the code ran 45 times faster on a modern hardware and software platform.
## 2022 (output obtained using modern hardware and software libraries)
...
Smoothing normals... Done. Filtering took 0.062695 sec.
...
## 2005
...
Smoothing normals... Done. Filtering took 2.496497 sec.
...
This observation suggests that code execution will become faster on future platforms simply because of improvements in hardware and software libraries. It should also encourage developers to resist the temptation to optimise their code merely to save a few CPU cycles during execution. Such optimised code is often difficult to understand, maintain and fix. Therefore, it is wiser to write code that is easy to understand and runs at an acceptable speed than highly optimised code that runs faster but is difficult to understand and maintain.
The downloaded archive file contains a C++ source code, an
executable file and some sample data. I missed a readme.txt file
in the archive which usually provides details about compiling
the source code, running the executable, expected output,
description of the provided sample data, etc. After some
exploration,
the reference
manual page and examples page were
discovered online. These pages are well written and proved
extremely useful in understanding the process of reproducing
the results reported in the paper. These documents would have been
easier to locate had they been included inside
a doc folder of the downloaded archive file. In
retrospect, the main website of Nehab et al. (2005) clearly
shows a documentation section containing links to the
reference manual and examples. For some reason, I downloaded
the source and started to look inside the archive for the
required documentation.
The code and documentation of Nehab et al. (2005) were easy to understand, interesting to read and easy to fix. It was remarkable to see code written 17 years ago come back to life and run 45 times faster on a modern hardware and software platform. The reference manual and examples pages are clear and easy to understand. The source code is well documented and intelligible.