We have now compiled the sqlite-2002 (i.e. version 2.2.1) version released in Jan. 2022 successfully in a modern platform. The sqlite-2002 software was not designed to run on future platforms; the developers of SQLite could not be foreseen how the hardware and software would change in 20 years. Therefore, although the software has compiled successfully in a modern platform, we do not still know if the software can function in the way that its developers had expected during its released in 2002. How can we ensure that the software is behaving in the way it was desired in 2002?
The sqlite-2002 documentation states that this version of sqlite, "implements a large subset of SQL92." and allows "atomic commit and rollback protect data integrity". We could prepare some SQL query statements based on SQL92 and prepare some SQL tables to test atomic commits. However, will we able to develop trust in this software based on such simple mock tests? What would it take to confirm that the software is truly behaving in the way it was designed to behave before its release in 2002? The developers of sqlite-2002 were best placed to know the desired and expected behaviour of this version of SQLite. We can develop a good level of trust if the software passes all the tests that were prepared by the developers. There is no other simple and quick way to develop such a level of trust in this software.
How do we run the tests included with the sqlite-2002
source distribution? The
README
file contained in the sqlite-2002 source distribution was
sufficient to compile the source code in a Unix-like
platform. However, the README file does not contain
instructions about running the tests. So, we check if the
documentation that gets shipped with the sqlite-2002 source
can be compiled using the common make doc
command.
$ make doc
...
tclsh ../sqlite/www/index.tcl `cat ../sqlite/VERSION` >index.html
tclsh ../sqlite/www/sqlite.tcl >sqlite.html
...
tclsh ../sqlite/www/faq.tcl >faq.html
mkdir -p doc
mv index.html sqlite.html changes.html lang.html opcode.html arch.html
arch.png vdbe.html c_interface.html crosscompile.html mingw.html
tclsqlite.html download.html speed.html faq.html doc
This command generates all the documentation in HTML format
which can be viewed in a web
browser. The doc/index.html is the entry point of
the documentation and a snippet of this file is shown in the
screenshot below.
doc/index.html file from sqlite-2.2.1
source distribution.It is only in the generated document that we see that
regression tests can be executed using the make
test command which is still fairly a standard way to
execute tests. We expect to see a message like "All tests
passed". However, if that were the case, we would have less
learning opportunity. Let us see what learning opportunities
emerge as we engage with the tests included in the sqlite-2002
source distribution.
As advised in the generated documentation, we execute the
regression tests using the make test command and
encounter our first issue with the test suite.
$ make test
../sqlite/src/test1.c:19:10: fatal error: tcl.h: No such file or directory
19 | #include "tcl.h"
| ^~~~~~~
compilation terminated.
make: *** [Makefile:224: testfixture] Error 1
SQLite test suite depends on the Tcl library. At the time of writing of this article, tcl-8.6.12 is the most recent released version of Tcl. We download and compile tcl-8.6.12 with a hope that it will let us run the SQLite test suite.
## Compile tcl-8.6.12 from source
## Note: you can skip this step if your already have Tcl installed
mkdir -p $HOME/sr/sqlite/2.2.1/deps/src
cd $HOME/sr/sqlite/2.2.1/deps/src
wget https://prdownloads.sourceforge.net/tcl/tcl8.6.12-src.tar.gz
tar -zxvf tcl8.6.12-src.tar.gz
cd tcl8.6.12/unix # not standard, had to look into http://www.tcl.tk/doc/howto/compile.html
./configure --prefix=$HOME/sr/sqlite/2.2.1/deps/ && make -j && make install
## Re-compile sqlite-2.2.1 (with Tcl dependencies)
cd $HOME/sr/sqlite/2.2.1/build
rm * # clear existing build files
../sqlite/configure
...
checking for library containing Tcl_Init... no
checking TCL header files... not specified: still searching...
checking for tcl.h... no
...
The configure script cannot find the tcl-8.6.12 library even
after we use the --includedir
and --libdir flag. The LDFLAGS
and CFLAGS environment variables also do not
help. So, we manually edit the generated Makefile
and include the location of our version of compiled tcl
library.
## Update Makefile to get Tcl library from custom location
sed -i '/TCL_FLAGS = -DNO_TCL=1/c\TCL_FLAGS = -I${HOME}/sr/sqlite/2.2.1/deps/include' Makefile
sed -i '/LIBTCL = -ldl -lm/c\LIBTCL = ${HOME}/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so -ldl -lm' Makefile
We try to recompile and rebuild the test as follows.
$ make -j && make test ... ../sqlite/src/tclsqlite.c:622:36: error: 'Tcl_Interp' has no member named 'result' 622 | if( zInfo==0 ) zInfo = interp->result; | ^~ make: *** [Makefile:226: testfixture] Error 1
How can we fix this issue?
The error message indicates that the Tcl library has introduced breaking
changes because of which the result field is not
available in the Tcl_Interp data structure. We
look into the Tcl_Interp
API documentation which describes this breaking change and a
possible solution for legacy programs.
For legacy programs and extensions no longer being maintained, compiles against the Tcl 8.6 header files are only possible with the compiler directives#define USE_INTERP_RESULT #define USE_INTERP_ERRORLINEdepending on which fields of the Tcl_Interp struct are accessed.
—Tcl_Interp, Tcl C API
We update the Makefile to include the required
preprocessor directive to allow access to
the result field.
$ sed -i '/TCL_FLAGS = -I${HOME}\/sr\/sqlite\/2.2.1\/deps\/include/c\TCL_FLAGS = -I${HOME}\/sr\/sqlite\/2.2.1\/deps\/include -DUSE_INTERP_RESULT=1' Makefile $ cat $HOME/sr/sqlite/2.2.1/build/Makefile ... TCL_FLAGS = -I${HOME}/sr/sqlite/2.2.1/deps/include -DUSE_INTERP_RESULT=1 ...
This fix lets the test suite to use the Tcl library. As we will see in the next issue, this fix is not sufficient to compile and run the tests.
extern qualifierWe recompile the sqlite-2002 tests after fixing the issue with the Tcl library and encounter our next issue.
$ cd $HOME/sr/sqlite/2.2.1/build && make -j && make test ... /usr/bin/ld: sqlite/src/pager.h:69: multiple definition of 'pager_refinfo_enable' collect2: error: ld returned 1 exit status make: *** [Makefile:224: testfixture] Error 1 $ grep -rnw 'pager_refinfo_enable' $HOME/sr/sqlite/2.2.1/sqlite/src/ src/pager.c:160: int pager_refinfo_enable = 0; src/pager.c:163: if( !pager_refinfo_enable ) return; src/pager.h:69:int pager_refinfo_enable; src/test3.c:837: Tcl_LinkVar(interp, "pager_refinfo_enable", (char*)&pager_refinfo_enable,
The re-declaration of this
variable in src/pager.h#69 shows that the
developers wanted to say to the compiler that this variable is
being used here but it has already been declared and defined
somewhere else. A compiler directive for such variables
is extern. This issue can be fixed by declaring the
concerned variable as extern page_refinfo_enable
in src/pager.h#69 which helps the compiler resolve
the variable's definition and declaration
from src/pager.c#160 This error did not show up in
the Build stage because
the SQLITE_TEST preprocessor directive did not get
defined because of missing Tcl dependency which is essential for
the sqlite test.
The issue can be fixed by declaring the variable
as extern in src/pager.h#69 which
informs the compiler to search for its original declaration
and definition in some other source file.
cd $HOME/sr/sqlite/2.2.1/sqlite/src/
sed -i '/int pager_refinfo_enable;/c\extern int pager_refinfo_enable;' pager.h
cat pager.h
...
#ifdef SQLITE_TEST
void sqlitepager_refdump(Pager*);
extern int pager_refinfo_enable;
#endif
This fix is contained in a patch and can be applied to the source as follows.
# Assumptions: Issue 2.1, 2.2 are already fixed
cd $HOME/sr/sqlite/2.2.1/sqlite
patch -p1 < $HOME/sr/sqlite/2.2.1/patches/3-extern.patch
The patching process is confirmed by the following message.
patching file src/pager.h
This fix takes allows the tests to compile and run.
We are now ready to compile and run the tests. While many
initial test cases run successfully, the test
case btree-1.5 fails with a Segmentation
fault (or SEGFAULT) error.
$ cd $HOME/sr/sqlite/2.2.1/build && make -j && make test ... creating sqlite ./testfixture ../sqlite/test/quick.test bigrow-1.0... Ok bigrow-1.1... Ok bigrow-1.2... Ok bigrow-1.3... Ok ... bigrow-5.3... Ok bigrow-5.4... Ok btree-1.1... Ok btree-1.1.1... Ok btree-1.2... Ok btree-1.3... Ok btree-1.4... Ok btree-1.4.1... Ok btree-1.5...make: *** [Makefile:232: test] Segmentation fault
The debugging process for the Issue 3.3 was arduous and frustrating. If I had used the GNU gdb debugger, it would have been simpler as shown in the software resurrection paper (pdf). Abhishek Dutta (2022-09-15)
SEGFAULT errors are caused by
programs trying to access a memory location that is not
allowed to access. The program code that is
triggering SEGFAULT can be located using a
debugger's backtrace functionality which shows a detailed
history of the program instructions that led to
the SEGFAULT (e.g. the backtrace or
(bt) command of the gdb
debugger). Inspecting the output of the make test
command, we observe that tests are invoked by the following
command: testfixture
../sqlite/test/quick.test. The testfixture
appears as an executable which takes an argument that
corresponds to a file containing all the test case
description. Let us find more about this file using
the file command.
$ cd $HOME/sr/sqlite/2.2.1/build $ gdb --args testfixture ../sqlite/test/quick.test not in executable format: file format not recognised $ file ./testfixture ./testfixture: POSIX shell script, ASCII text executable, with very long lines
Contrary to our expectation, the testfixtute is
a shell script and therefore it is not possible to use
the gdb debugger to trace the source of
the SEGFAULT error. Since it is a plain text file,
we open it in a text editor which reveals that this script is
"just a wrapper for" another executable program
named lt-testfixture. The build folder does not
contain the lt-testfixture executable. Where could
it be located? The find . -name lt-testfixture
reveals that this file is located inside the
hidden build/.libs folder. After a second look at the
contents of testfixture script, it is clear that
this information could have been gleaned from
the progdir variable. In the first look, for some
reason, this failed to register.
$ cat $HOME/sr/sqlite/2.2.1/build/testfixture
...
program=lt-'testfixture'
progdir="$thisdir/.libs"
...
if test -f "$progdir/$program"; then
...
exec $program ${1+"$@"}
...
else
# The program doesn't exist.
$echo "$0: error: $progdir/$program does not exist" 1>&2
$echo "This script is just a wrapper for $program." 1>&2
echo "See the libtool documentation for more information." 1>&2
exit 1
fi
Now we shift our attention to
the .libs/lt-testfixture. Using a combination
of file and ldd commands, we know that
this is an executable that is dynamically linked to the Tcl and
sqlite library. Therefore, this must be the main executable that
is responsible for running all the test cases defined
in test/quick.test.
$ file .libs/lt-testfixture
.libs/lt-testfixture: ELF 64-bit LSB pie executable, x86-64
...
$ ldd ./libs/lt-testfixture
...
/home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
libsqlite.so.0 => /home/tlm/sr/sqlite/2.2.1/build/.libs/libsqlite.so.0
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
...
There are two threads of investigation at this point. First
is to understand the functionality of lt-testfixture
by studying its source code. Second is to understand the test
cases specifications in test/quick.test.
What source code generates the lt-testfixture
executable file? The source folder does not contain any file
named lt-testfixture.*. The Makefile
also does not provide any clues to how this executable gets
generated. For now, we give up this line of inquiry as it does
not seem to leading anywhere.
$ cd $HOME/sr/sqlite/2.2.1/build/ $ find . -iname '*lt-testfixture*' {output is empty} $ find . -iname '*testfixture*' {output is empty} $ grep -rnw 'lt-testfixture' . {output is empty} $ grep -rnw 'testfixture' . {output is empty}
We pursue the second line of inquiry by exploring the test
case specifications defined in test/quick.test
file. The contents of this file appears to be a program code but
the programming language does not look familiar. We use a
internet search engine with the following query "set rename proc
set foreach" and conclude that this code is written in the Tcl
programming language. Since, I was not familiar with the Tcl
language, it was not initially clear
that test/quick.test is written in the Tcl
language. In spite of unfamiliarity, the remarkable property of
the intelligible keywords chosen by the Tcl language ensured
that I could understand the computations defined in this file:
collect all test specification defined
in test/*.test and execute them using the Tcl
interpreter. Since, our test execution failed
at btree-1.5, we continue our investigation by
looking into the test/btree.test.
# 2001 September 15
# ...
# This file implements regression tests for SQLite library. The
# focus of this script is btree database backend
...
...
...
# Do an insert and verify that the database file grows in size.
#
do_test btree-1.4 {
set rc [catch {btree_begin_transaction $::b1} msg]
lappend rc $msg
} {0 {}}
do_test btree-1.4.1 {
lindex [btree_pager_stats $::b1] 1
} {1}
do_test btree-1.5 {
set rc [catch {btree_cursor $::b1 2 1} ::c1]
if {$rc} {lappend rc $::c1}
set rc
} {0}
...
Let us pause our investigation for a moment to enjoy the
intelligibility of this program code. The comment at the
beginning of the source file identifies the purpose of this
program code in a clear and concise way. It also conveys the
date when this program code was created which helps us
appreciate that this was written more than 20 years ago. I am
not familiar with the Tcl programming language yet I can
understand many aspects of this test code. First, I see a set
of do_test instructions which suggests that this
program is executing a set of tests. Second, we notice a set
of sequential (e.g. btree-1.1, btree-1.2, btree-1.3,
btree-1.4) and hierarchical (e.g. btree-1.4.1)
which implies, without any requiring any additional
qualifications, the following:
Returning to our investigation, we realise that we still do
not know the following two details: (a) what computations are
carried out by when do_test gets invoked? and (b)
what computations are contained in these individual tests
like btree-1.5? First, let us find more details
about the do_test by listing all the occurrences
of this keyword in the source folder.
$ cd $HOME/sr/sqlite/2.2.1/sqlite/ && grep -rnw 'do_test' .
...
./test/tester.tcl:62:# Invoke the do_test procedure to run a single test
./test/tester.tcl:64:proc do_test {name cmd expected} {
...
If we patiently scroll through all the results, we will see
that the do_test is a procedure (or function)
defined
in test/tester.tcl. Because
of the judicious choice of programming language keyword
(e.g. proc) and identifier names
(e.g. name, cmd, expected), we now know
that do_test is a procedure (or function) which
executes the command contained in cmd argument and
compares the expected outcome. If the result does not
match the expectation, the do_test procedure
prints an informative error message.
...
# Invoke the do_test procedure to run a single test
#
proc do_test {name cmd expected} {
global argv nErr nTest skip_test
...
if {[catch {uplevel #0 "$cmd;\n"} result]} {
puts "\nError: $result"
incr nErr
lappend ::failList $name
if {$nErr>10} {puts "*** Giving up..."; finalize_testing}
} elseif {[string compare $result $expected]} {
puts "\nExpected: \[$expected\]\n Got: \[$result\]"
incr nErr
lappend ::failList $name
if {$nErr>10} {puts "*** Giving up..."; finalize_testing}
} else {
puts " Ok"
}
}
...
We now move to the second question of finding out the
computations carried out in individual tests like btree-1.5. We
now realise that the usage of
do_test in test/btree.test not only
invoked the tests but also named the individual tests and
defined the commands that constitutes a test. So,
the btree-1.5 involves executing the following
instructions in the Tcl language,
do_test btree-1.5 {
set rc [catch {btree_cursor $::b1 2 1} ::c1]
if {$rc} {lappend rc $::c1}
set rc
} {0}
where the expected output from this test
is 0. Due to unfamiliarity with the Tcl language,
it is difficult to understand the instructions involved in
this test. So, we start with the following questions and refer
to the Tcl documentation to understand the purpose of these
keywords and find the answer to the following two
questions.
$::b1 variable?To answer the first question, we use the grep command to learn
that btree_cursor is declared as a Tcl
command and defined as a C-language function
src/test3.c
$ cd $HOME/sr/sqlite/2.2.1/sqlite/ && grep -rnw
'btree_cursor' .
./src/test3.c:502:** Usage: btree_cursor ID TABLENUM WRITEABLE
./src/test3.c:506:static int btree_cursor(
./src/test3.c:540:** Close a cursor opened using btree_cursor.
./src/test3.c:827: Tcl_CreateCommand(interp, "btree_cursor", btree_cursor, 0, 0);
...
/*
** Usage: btree_cursor ID TABLENUM WRITEABLE
**
** Create a new cursor. Return the ID for the cursor.
*/
static int btree_cursor(
void *NotUsed,
Tcl_Interp *interp, /* The TCL interpreter that invoked this command */
int argc, /* Number of arguments */
char **argv /* Text of each argument */
){
Btree *pBt;
int iTable;
BtCursor *pCur;
int rc;
int wrFlag;
char zBuf[30];
if( argc!=4 ){
...
return TCL_ERROR;
}
if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
if( Tcl_GetInt(interp, argv[2], &iTable) ) return TCL_ERROR;
if( Tcl_GetBoolean(interp, argv[3], &wrFlag) ) return TCL_ERROR;
rc = sqliteBtreeCursor(pBt, iTable, wrFlag, &pCur);
if( rc ){
Tcl_AppendResult(interp, errorName(rc), 0);
return TCL_ERROR;
}
sprintf(zBuf,"0x%x", (int)pCur);
Tcl_AppendResult(interp, zBuf, 0);
return SQLITE_OK;
}
To answer the second question (i.e. the purpose
of $::b1), we use our prior knowledge which
informs that the :: operator that allows access
to global variable in the C programming language. It is highly
likely that Tcl also has same interpretation of this
keyword. We use a search engine to find out the way to print
value of a variable. From
the Tcl
documentation, we learn that the puts command
can be used to print values of these variables. Therefore, we
update the test/btree.test file as follows to
print the value of these $::b1 and
::c1variables.
...
do_test btree-1.5 {
puts "DEBUG: Showing values of btree-1.5 variables"
puts "Value of b1={$::b1}"
set rc [catch {btree_cursor $::b1 2 1} ::c1]
puts "Value of rc={rc}"
if {$rc} {lappend rc $::c1}
set rc
} {0}
...
Now when we re-run the make test command to
run the regression test, we get the following surprising
result.
$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
btree-1.5...DEBUG: Showing values of btree-1.5 variables
Value of b1={0x90908db0}
make: *** [Makefile:232: test] Segmentation fault
There are two important things to note here. First is
that b1 is indeed a global variable and that
it is storing some hexadecimal value as a string. Second,
we do not see the output of the form Value of
rc=... which implies that the SEGFAULT is being
caused by the btree_cursor command. What is
the purpose of b1 global variable?
The test/btree.test shows that
the b1 global variable is set by the btree-1.1
test case which involves opening a btree.
...
do_test btree-1.1 {
file delete -force test1.bt
file delete -force test1.bt-journal
set rc [catch {btree_open test1.bt} ::b1]
} {0}
...
Let us see what is returned by the btree_open
method defined in src/test3.c.
/*
** Usage: btree_open FILENAME
**
** Open a new database
*/
static int btree_open(
void *NotUsed,
Tcl_Interp *interp, /* The TCL interpreter that invoked this command */
int argc, /* Number of arguments */
char **argv /* Text of each argument */
){
Btree *pBt;
int rc;
char zBuf[100];
if( argc!=2 ){
...
return TCL_ERROR;
}
rc = sqliteBtreeOpen(argv[1], 0666, 10, &pBt);
if( rc!=SQLITE_OK ){
Tcl_AppendResult(interp, errorName(rc), 0);
return TCL_ERROR;
}
sprintf(zBuf,"0x%x",(int)pBt);
Tcl_AppendResult(interp, zBuf, 0);
return TCL_OK;
}
This reveals that the btree_open command creates
a new database and returns the pointer address of the database
as a string. The string representation of the pointer address is
created using the %x format specifier of
the sprintf function. We refer to the manual page
of sprintf and find out the description of %x
specifier.
$ man sprintf
PRINTF(3) Linux Programmer's Manual
NAME
printf, fprintf, dprintf, sprintf, snprintf, vprintf, vfprintf,
vdprintf, vsprintf, vsnprintf - formatted output conversion
...
Conversion specifiers
A character that specifies the type of conversion to be
applied. The conversion specifiers and their meanings are:
...
x, X The unsigned int argument is converted to unsigned
hexadecimal (x and X) notation. ...
...
p The void * pointer argument is printed in hexadecimal
(as if by %#x or %#lx).
This is quite revealing. The pointer address is being cast
to 4 byte integer using (int)pBt because
the %x specifier expects an unsigned
int argument. Doesn't a pointer address in 64 bit
systems require 8 bytes of storage? It is possible that
the sprintf() based code is only storing the
lower 32 bits of the pointer address and discarding the higher
32 bits of the memory address thereby resulting in
segmentation fault. Let us verify this hypothesis by
using the %p format specifier which is
specifically meant for converting a pointer address to string
representation. We update the src/test3.c to
return the string representation of pointer address using
both %x and %p specifiers.
static int btree_open(
...
//sprintf(zBuf,"0x%x",(int)pBt);
sprintf(zBuf,"\nOld=0x%x \nNew=%p",(int)pBt, pBt);
...
}
Let us build the test code again and see if our hypothesis is correct.
$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
btree-1.5...DEBUG: Showing values of btree-1.5 variables
Value of b1={
Old= 0x85b31db0
New=0x558c85b31db0}
...
Should we should jump like a scientist who has suddenly had a
Revelation? Amongst all the noisy output created by our debug
statements, we can see what we had been anticipating. Our
hypothesis was correct and we have found the reason for the
segmentation fault. The %x format specifier
in sprintf() function was indeed dropping the
higher 32 bits of the pointer address and only passing the
lower 32 bits of the pointer address corresponding to the newly
created btree database. All the subsequent tests which tried to
access the btree database using this pointer address caused the
segmentation fault error. We now work to fix the code such
that all the tests run successfully.
We observe that there are two occurrences of 0x%x
in src/test3.c. We replace them
with %p as follows.
$ cat $HOME/sr/sqlite/2.2.1/sqlite/src/test3.c
static int btree_open(...)
{
...
//sprintf(zBuf,"0x%x",(int)pBt);
sprintf(zBuf,"%p",pBt);
...
}
static int btree_cursor(...)
{
...
//sprintf(zBuf,"0x%x", (int)pCur);
sprintf(zBuf,"%p", pCur);
...
}
We compile the test again.
$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
bigrow-5.3... Ok
bigrow-5.4... Ok
btree-1.1... Ok
btree-1.1.1...
Error: integer value too large to represent as non-long integer
btree-1.2... Ok
btree-1.3...
Expected: [0 {}]
Got: [1 {integer value too large to represent as non-long integer}]
...
The error message is produced by the do_test
procedure defined in test/tester.tcl. This implies
that the test btree-1.1.1 is executing some
instruction that is causing overflow of integer conversion
operation. From test/btree.test, we note
that btree-1.1.1 test case triggers
the btree_pager_stats.
...
do_test btree-1.1.1 {
lindex [btree_pager_stats $::b1] 1
} {0}
...
What are the instructions being executed by
the btree_pager_stats function? Let us at this
function's definition in the src/test3.c
file.
...
static int btree_pager_stats(
void *NotUsed,
Tcl_Interp *interp, /* The TCL interpreter that invoked this command */
int argc, /* Number of arguments */
char **argv /* Text of each argument */
){
Btree *pBt;
int i;
int *a;
if( argc!=2 ){
...
return TCL_ERROR;
}
if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
a = sqlitepager_stats(sqliteBtreePager(pBt));
for(i=0; i<9; i++){
static char *zName[] = {
"ref", "page", "max", "size", "state", "err",
"hit", "miss", "ovfl",
};
char zBuf[100];
Tcl_AppendElement(interp, zName[i]);
sprintf(zBuf,"%d",a[i]);
Tcl_AppendElement(interp, zBuf);
}
return TCL_OK;
}
...
The Tcl_GetInt() function is trying to convert
the 64 bit pointer address passed as a string argument to 32 bit
integer representation and therefore causing the overflow
error. We can fix this using the standard
library's strtol() function.
...
static int btree_pager_stats(...)
{
...
//if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
pBt = strtol(argv[1], NULL, 16);
if(!pBt) return TCL_ERROR;
a = sqlitepager_stats(sqliteBtreePager(pBt));
...
}
...
We run the tests again.
$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
bigrow-5.3... Ok
bigrow-5.4... Ok
btree-1.1... Ok
btree-1.1.1... Ok
btree-1.2... Ok
btree-1.3...
Expected: [0 {}]
Got: [1 {integer value too large to represent as non-long integer}]
btree-1.4...
Expected: [0 {}]
Got: [1 {integer value too large to represent as non-long integer}]
...
The btree-1.1.1 test has passed, so
had btree-1.2. This shows that we are on the right
track. Let us fix the issue with btree-1.3 in the
same way as we fixed the btree-1.1.1 test. For
this, we have to fix a similar integer overflow error
in btree_close() function defined in src/test3.c.
...
static int btree_close(...)
{
...
//if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
pBt = strtol(argv[1], NULL, 16);
if(!pBt) return TCL_ERROR;
rc = sqliteBtreeClose(pBt);
...
}
...
We run the tests again and note that btree-1.3
passes while the test fails with btree-1.4. We now
know that this issue needs to be fixed in the following
places.
btree_begin_transaction() in src/test3.cbtree_cursor()btree_insert()btree_key()btree_data()btree_close_cursor()btree_commit()btree_move_to()btree_next()btree_delete()btree_get_meta()btree_update_meta()btree_rollback()btree_create_table()btree_drop_table()btree_cursor_dump()btree_sanity_check()btree_close()We fix the pBt and pCur issues
in the functions shown above using the strtol()
function as shown below.
...
static int btree_insert()
{
...
//if( Tcl_GetInt(interp, argv[1], (int*)&pCur) ) return TCL_ERROR;
pCur = strtol(argv[1], NULL, 16);
if(!pCur) return TCL_ERROR;
rc = sqliteBtreeInsert(pCur, ...)
...
}
...
We run the tests again and encounter another segmentation fault error again.
$ cd $HOME/sr/sqlite/2.2.1/build && make
...
btree-8.19... Ok
btree-8.20... Ok
btree-8.21... Ok
btree-8.22... Ok
btree-8.23... Ok
btree-8.24... Ok
btree-9.1... Ok
btree-9.2...make: *** [Makefile:232: test] Segmentation fault
Let us inspect btree-9.2 test and find out the cause of
segmentation fault. The btree-9.2 test case
invokes the following functions: btree_insert(),
select_keys(). We have already fixed
the btree_insert(). The select_keys()
procedure invokes the following
functions: btree_move_to(), btree_key(), btree_next(). We have
already fixed these two functions. So what is causing the
segmentation fault? The btree_insert is causing
segmentation fault. For the btree-9.1 test, why does
the btree_insert() function get invoked with
garbage data (e.g. "This is a long prefix ...")? This results
in the loop only iterating through 001 to 018. The 019
iteration gets to run in btree-9.2 test. Why such a mix up is
happening? It seems that the test case of 8.14 is spilling over
to 9.1 test case.
It appears that the overflow starts in btree-8.5. In the
test case btree-8.5, long sequence of characters (e.g. "*** This
is an even longer key *** This is an even ....") is inserted
against the key "020". This insertion does not seem to be
successful as it invokes the btree_insert()
function two times. There appears to be a lag in the value
printed inside the src/test3.c::btree_insert() C
function and the messages printed inside
the test/btree.test::do_test invocations.
It is important to do these in steps. It may be tempting to
use grep and sed tools to
automatically replace each occurrences. This does not help as it
also replaces other parts which are not essential for the test
and therefore may result in unintended results. After fixing
btree-2.2, you will notice that more tests start to pass with
just one fix as some common functions are used in most
tests.The mix up of error message seems to have been
caused by a missing fflush(stdout).
I suddenly realised that I can use gdb for debugging using
cd $HOME/sr/sqlite/2.2.1/build
cat Makefile > Makefile-release # create a backup
sed -i 's/-g -O2/-ggdb -O0/' Makefile
make clean && make && make test
gdb .libs/lt-testfixture
(gdb) r ../sqlite/test/quick.test
...
btree-8.22... Ok
btree-8.23... Ok
btree-8.24... Ok
btree-9.1... Ok
btree-9.2...
Program received signal SIGSEGV, Segmentation fault.
0x0000555555557711 in cellSize (pCell=0x5555ffffbbc8) at ../sqlite/src/btree.c:334
334 int n = NKEY(pCell->h) + NDATA(pCell->h);
(gdb) bt
#0 0x0000555555557711 in cellSize (pCell=0x5555ffffbbc8) at ../sqlite/src/btree.c:334
#1 0x000055555555acc0 in balance (pBt=0x555555639a70, pPage=0x55555563a740, pCur=0x5555556399f0) at ../sqlite/src/btree.c:1958
#2 0x000055555555b7ec in sqliteBtreeInsert (pCur=0x5555556399f0, pKey=0x5555555dd730, nKey=3, pData=0x555555601f30, nData=35) at ../sqlite/src/btree.c:2144
#3 0x00005555555628cb in btree_insert (NotUsed=0x0, interp=0x5555555870c0, argc=4, argv=0x55555558b9e0) at ../sqlite/src/test3.c:680
#4 0x00007ffff7e50695 in TclInvokeStringCommand () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#5 0x00007ffff7e553f2 in TclNRRunCallbacks () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#6 0x00007ffff7e570d3 in TclEvalEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#7 0x00007ffff7f0b9c9 in Tcl_FSEvalFileEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#8 0x00007ffff7f0bb48 in Tcl_EvalFile () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#9 0x0000555555565563 in main (argc=2, argv=0x7fffffffe008) at ../sqlite/src/tclsqlite.c:620
The gdb log shows that the SEGFAULT is triggered
by cellSize() which in turn is triggered
by balance(). We notice that before
the cellSize() method is invoked
in balance() method, a copy of pages is created
using the copyPage() function which is defined as
follows.
...
static void copyPage(MemPage *pTo, MemPage *pFrom){
uptr from, to;
...
to = Addr(pTo);
from = Addr(pFrom);
for(i=0; i<pTo->nCell; i++){
uptr x = Addr(pFrom->apCell[i]);
...
}
}
...
The Addr() method is being used to get the
pointer address of source and destination pages. This raises a
suspicion that the Addr() method may only be using
the lower 32 bits of the pointer address. We look into
the src/sqliteInt.h file where this method is
defined.
...
/*
** Integers of known sizes. These typedefs might change for architectures
** where the sizes very. Preprocessor macros are available so that the
** types can be conveniently redefined at compile-type. Like this:
**
** cc '-DUINTPTR_TYPE=long long int' ...
*/
#ifndef UINT32_TYPE
# define UINT32_TYPE unsigned int
#endif
#ifndef UINT16_TYPE
# define UINT16_TYPE unsigned short int
#endif
#ifndef UINT8_TYPE
# define UINT8_TYPE unsigned char
#endif
#ifndef INTPTR_TYPE
# define INTPTR_TYPE int
#endif
typedef UINT32_TYPE u32; /* 4-byte unsigned integer */
typedef UINT16_TYPE u16; /* 2-byte unsigned integer */
typedef UINT8_TYPE u8; /* 1-byte unsigned integer */
typedef INTPTR_TYPE ptr; /* Big enough to hold a pointer */
typedef unsigned INTPTR_TYPE uptr; /* Big enough to hold a pointer */
/*
** This macro casts a pointer to an integer. Useful for doing
** pointer arithmetic.
*/
#define Addr(X) ((uptr)X)
...
This provides a lot of relief as we know that we are near to
the possible fix of this issue. The Addr() is a
macro that casts the value to an uptr type which
in turn is of unsigned int type. Therefore, we
now know the cause of the SEGFAULT issue. The 64 bit pointer addresses
corresponding to different pages were wrongly converted to 32
bit unsigned intstorage type. We correct this issue
as follows.
...
#ifndef INTPTR_TYPE
//# define INTPTR_TYPE int
# define INTPTR_TYPE long
#endif
...
After applying this fix, we recompile the tests.
$ cd $HOME/sr/sqlite/2.2.1/build && make
...
btree-12.11... Ok
btree-12.12... Ok
btree-13.1...
Error: integer value too large to represent as non-long integer
btree-15.98... Ok
btree-15.99... Ok
btree-99.1... Ok
lt-testfixture: integer value too large to represent as non-long integer
...
make: *** [Makefile:232: test] Error 1
We apply similar fixes.
...
static int btree_sanity_check(...)
{
...
//if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
pBt = strtol(argv[1], NULL, 16);
if(!pBt) return TCL_ERROR;
...
for(i=0; i<argc-2; i++){
//if( Tcl_GetInt(interp, argv[i+2], &aRoot[i]) ) return TCL_ERROR;
aRoot[i] = strtol(argv[i+2], NULL, 16);
if(!aRoot[i]) return TCL_ERROR;
}
...
}
...
static int btree_clear_table(...)
{
...
//if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
pBt = strtol(argv[1], NULL, 16);
if(!pBt) return TCL_ERROR;
}
...
All the btree tests now pass. Furthermore, other subsequent tests (e.g. btree3, copy, delete, expr, func, in, index, insert, ...) also pass except the pager tests.
...
misc1-2.1... Ok
misc1-2.2... Ok
pager-1.0... Ok
pager-1.1...make: *** [Makefile:232: test] Segmentation fault
We are now pretty confident that this issue is also related to pointer address conversion. Let us load the tests again in gdb and use the backtrace command to find the source of SEGFAULT.
cd $HOME/sr/sqlite/2.2.1/build make clean && make && make test gdb .libs/lt-testfixture (gdb) r ../sqlite/test/quick.test ... misc1-1.10... Ok misc1-1.11... Ok misc1-2.1... Ok misc1-2.2... Ok pager-1.0... Ok pager-1.1... Program received signal SIGSEGV, Segmentation fault. 0x000055555555f907 in sqlitepager_stats (pPager=0x7fff55624440) at ../sqlite/src/pager.c:1096 1096 a[0] = pPager->nRef; (gdb) bt #0 0x000055555555f907 in sqlitepager_stats (pPager=0x7fff55624440) at ../sqlite/src/pager.c:1096 #1 0x0000555555560a17 in pager_stats (NotUsed=0x0, interp=0x5555555870c0, argc=2, argv=0x55555558b9c0) at ../sqlite/src/test2.c:181 #2 0x00007ffff7e50695 in TclInvokeStringCommand () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so #3 0x00007ffff7e553f2 in TclNRRunCallbacks () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so #4 0x00007ffff7e570d3 in TclEvalEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so #5 0x00007ffff7f0b9c9 in Tcl_FSEvalFileEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so #6 0x00007ffff7f0bb48 in Tcl_EvalFile () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so #7 0x00005555555655a4 in main (argc=2, argv=0x7fffffffdff8) at ../sqlite/src/tclsqlite.c:620 (gdb)
This leads us to the src/test2.c source which
also includes the usage of Tcl_GetInt() to convert
string representation of pointer address to integer. We fix
this also in the same way as before.
...
static int pager_open(...)
{
Pager *pPager;
int nPage;
...
//sprintf(zBuf,"0x%x",(int)pPager);
sprintf(zBuf,"%p",pPager);
Tcl_AppendResult(interp, zBuf, 0);
return TCL_OK;
}
...
static int pager_stats(...)
{
Pager *pPager;
int i, *a;
if( argc!=2 ){
...
return TCL_ERROR;
}
//if( Tcl_GetInt(interp, argv[1], (int*)&pPager) ) return TCL_ERROR;
pPager = strtol(argv[1], NULL, 16);
if(!pPager) return TCL_ERROR;
a = sqlitepager_stats(pPager);
...
}
...
The following updates are required
in src/test2.c.
The fixes are similar in nature to the fixes applied for
btree tests in src/test3.c.
...
static int pager_pagecount()
{
...
//if( Tcl_GetInt(interp, argv[1], (int*)&pPager) ) return TCL_ERROR;
pPager = strtol(argv[1], NULL, 16);
if(!pPager) return TCL_ERROR;
...
}
...
After fixing the pager SEGFAULT, we rerun the test and now the SEGFAULT appears in tableapi tests.
...
table-7.1... Ok
table-7.2... Ok
table-7.3... Ok
tableapi-1.0...make: *** [Makefile:232: test] Segmentation fault
Using gdb backtrace
feature and upon inspection of code in src/test1.c, we know
that the SEGFAULT is caused, again, by conversion of pointer
addresses to and from string representation. This time, it is
more remarkable as the pointer corresponding to sqlite
database is represented as a 32 bit integer
using %d format specifier which results in
overflow of the variable and therefore has a negative
value. The string representation of pointer address is
converted back to pointer using the atoi()
function. We fix these issues by using the %p
format specifier and strtol() function to
recover the pointer address from its string representation.
static int sqlite_test_open(...)
{
sqlite *db;
char *zErr = 0;
char zBuf[100];
if( argc!=2 ){
...
return TCL_ERROR;
}
db = sqlite_open(argv[1], 0666, &zErr);
if( db==0 ){
...
return TCL_ERROR;
}
//sprintf(zBuf,"%d",(int)db);
sprintf(zBuf,"%p",db);
Tcl_AppendResult(interp, zBuf, 0);
return TCL_OK;
}
...
static int test_exec_printf(
void *NotUsed,
Tcl_Interp *interp, /* The TCL interpreter that invoked this command */
int argc, /* Number of arguments */
char **argv /* Text of each argument */
){
sqlite *db;
...
//db = (sqlite*)atoi(argv[1]);
db = strtol(argv[1], NULL, 16);
...
}
...
After fixing the SEGFAULT issue with tableapi, we rerun the
test. All the tests pass except the tcl-1.6. We
ignore this test because it the error message format differs
between what it expects and what it got.
...
tcl-1.1... Ok
tcl-1.2... Ok
tcl-1.3... Ok
tcl-1.4... Ok
tcl-1.5... Ok
tcl-1.6...
Expected: [1 {syntax error in expression "x*"}]
Got: [1 {invalid bareword "x" in expression "x*";
should be "$x" or "{x}" or "x(...)" or ...}]
tcl-2.1... Ok
tcl-2.2... Ok
temptable-1.0... Ok
temptable-1.1... Ok
...
All the tests now run successfully. This assures us that the software able to deliver the functionality as expected by the developers.
In the next section, we critique the program code and documentation of sqlite version 2.2.1 based on our experience of compiling and testing it on a modern platform.