Software Resurrection of SQLite-2.2.1
(released in Jan. 2002)

by
  1. Introduction
  2. Compile
  3. Test
    1. Issue 1: Tcl library not found
    2. Issue 2: Undefined variable resolved by extern qualifier
    3. Issue 3: What does a failed test case imply?
  4. Critique

3. Test

We have now compiled the sqlite-2002 (i.e. version 2.2.1) version released in Jan. 2022 successfully in a modern platform. The sqlite-2002 software was not designed to run on future platforms; the developers of SQLite could not be foreseen how the hardware and software would change in 20 years. Therefore, although the software has compiled successfully in a modern platform, we do not still know if the software can function in the way that its developers had expected during its released in 2002. How can we ensure that the software is behaving in the way it was desired in 2002?

The sqlite-2002 documentation states that this version of sqlite, "implements a large subset of SQL92." and allows "atomic commit and rollback protect data integrity". We could prepare some SQL query statements based on SQL92 and prepare some SQL tables to test atomic commits. However, will we able to develop trust in this software based on such simple mock tests? What would it take to confirm that the software is truly behaving in the way it was designed to behave before its release in 2002? The developers of sqlite-2002 were best placed to know the desired and expected behaviour of this version of SQLite. We can develop a good level of trust if the software passes all the tests that were prepared by the developers. There is no other simple and quick way to develop such a level of trust in this software.

How do we run the tests included with the sqlite-2002 source distribution? The README file contained in the sqlite-2002 source distribution was sufficient to compile the source code in a Unix-like platform. However, the README file does not contain instructions about running the tests. So, we check if the documentation that gets shipped with the sqlite-2002 source can be compiled using the common make doc command.

$ make doc
...
tclsh ../sqlite/www/index.tcl `cat ../sqlite/VERSION` >index.html
tclsh ../sqlite/www/sqlite.tcl >sqlite.html
...
tclsh ../sqlite/www/faq.tcl >faq.html
mkdir -p doc
mv index.html sqlite.html changes.html lang.html opcode.html arch.html
arch.png vdbe.html c_interface.html crosscompile.html mingw.html
tclsqlite.html download.html speed.html faq.html doc
      

This command generates all the documentation in HTML format which can be viewed in a web browser. The doc/index.html is the entry point of the documentation and a snippet of this file is shown in the screenshot below.

Screenshot of the generated doc/index.html file from sqlite-2.2.1 source distribution.

It is only in the generated document that we see that regression tests can be executed using the make test command which is still fairly a standard way to execute tests. We expect to see a message like "All tests passed". However, if that were the case, we would have less learning opportunity. Let us see what learning opportunities emerge as we engage with the tests included in the sqlite-2002 source distribution.

3.1 Tcl library not found

As advised in the generated documentation, we execute the regression tests using the make test command and encounter our first issue with the test suite.

$ make test
../sqlite/src/test1.c:19:10: fatal error: tcl.h: No such file or directory
   19 | #include "tcl.h"
      |          ^~~~~~~
compilation terminated.
make: *** [Makefile:224: testfixture] Error 1
      

SQLite test suite depends on the Tcl library. At the time of writing of this article, tcl-8.6.12 is the most recent released version of Tcl. We download and compile tcl-8.6.12 with a hope that it will let us run the SQLite test suite.

## Compile tcl-8.6.12 from source
## Note: you can skip this step if your already have Tcl installed
mkdir -p $HOME/sr/sqlite/2.2.1/deps/src
cd $HOME/sr/sqlite/2.2.1/deps/src
wget https://prdownloads.sourceforge.net/tcl/tcl8.6.12-src.tar.gz
tar -zxvf tcl8.6.12-src.tar.gz
cd tcl8.6.12/unix  # not standard, had to look into http://www.tcl.tk/doc/howto/compile.html
./configure --prefix=$HOME/sr/sqlite/2.2.1/deps/ && make -j && make install

## Re-compile sqlite-2.2.1 (with Tcl dependencies)
cd $HOME/sr/sqlite/2.2.1/build
rm *  # clear existing build files
../sqlite/configure
...
checking for library containing Tcl_Init... no
checking TCL header files... not specified: still searching...
checking for tcl.h... no
...
      

The configure script cannot find the tcl-8.6.12 library even after we use the --includedir and --libdir flag. The LDFLAGS and CFLAGS environment variables also do not help. So, we manually edit the generated Makefile and include the location of our version of compiled tcl library.

## Update Makefile to get Tcl library from custom location
sed -i '/TCL_FLAGS = -DNO_TCL=1/c\TCL_FLAGS = -I${HOME}/sr/sqlite/2.2.1/deps/include' Makefile
sed -i '/LIBTCL =  -ldl -lm/c\LIBTCL = ${HOME}/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so -ldl -lm' Makefile
      

We try to recompile and rebuild the test as follows.

$ make -j && make test
...
../sqlite/src/tclsqlite.c:622:36: error: 'Tcl_Interp' has no member named 'result'
  622 |       if( zInfo==0 ) zInfo = interp->result;
      |                                    ^~
make: *** [Makefile:226: testfixture] Error 1
      

How can we fix this issue?

Issue 3.1: missing Tcl dependency
Solution to Issue 3.1

The error message indicates that the Tcl library has introduced breaking changes because of which the result field is not available in the Tcl_Interp data structure. We look into the Tcl_Interp API documentation which describes this breaking change and a possible solution for legacy programs.

For legacy programs and extensions no longer being maintained, compiles against the Tcl 8.6 header files are only possible with the compiler directives
#define USE_INTERP_RESULT
#define USE_INTERP_ERRORLINE
        
depending on which fields of the Tcl_Interp struct are accessed.
Tcl_Interp, Tcl C API

We update the Makefile to include the required preprocessor directive to allow access to the result field.

$ sed -i '/TCL_FLAGS = -I${HOME}\/sr\/sqlite\/2.2.1\/deps\/include/c\TCL_FLAGS = -I${HOME}\/sr\/sqlite\/2.2.1\/deps\/include -DUSE_INTERP_RESULT=1' Makefile
$ cat $HOME/sr/sqlite/2.2.1/build/Makefile
...
TCL_FLAGS = -I${HOME}/sr/sqlite/2.2.1/deps/include -DUSE_INTERP_RESULT=1
...
          

This fix lets the test suite to use the Tcl library. As we will see in the next issue, this fix is not sufficient to compile and run the tests.

3.2 Undefined variable resolved byextern qualifier

We recompile the sqlite-2002 tests after fixing the issue with the Tcl library and encounter our next issue.

$ cd $HOME/sr/sqlite/2.2.1/build && make -j && make test

...
/usr/bin/ld: sqlite/src/pager.h:69: multiple definition of 'pager_refinfo_enable'
collect2: error: ld returned 1 exit status
make: *** [Makefile:224: testfixture] Error 1


$ grep -rnw 'pager_refinfo_enable' $HOME/sr/sqlite/2.2.1/sqlite/src/

src/pager.c:160:  int pager_refinfo_enable = 0;
src/pager.c:163:    if( !pager_refinfo_enable ) return;
src/pager.h:69:int pager_refinfo_enable;
src/test3.c:837:  Tcl_LinkVar(interp, "pager_refinfo_enable", (char*)&pager_refinfo_enable,

      
Issue 3.2: extern variables
Solution to Issue 3.2

The re-declaration of this variable in src/pager.h#69 shows that the developers wanted to say to the compiler that this variable is being used here but it has already been declared and defined somewhere else. A compiler directive for such variables is extern. This issue can be fixed by declaring the concerned variable as extern page_refinfo_enable in src/pager.h#69 which helps the compiler resolve the variable's definition and declaration from src/pager.c#160 This error did not show up in the Build stage because the SQLITE_TEST preprocessor directive did not get defined because of missing Tcl dependency which is essential for the sqlite test.

The issue can be fixed by declaring the variable as extern in src/pager.h#69 which informs the compiler to search for its original declaration and definition in some other source file.

cd $HOME/sr/sqlite/2.2.1/sqlite/src/
sed -i '/int pager_refinfo_enable;/c\extern int pager_refinfo_enable;' pager.h
cat pager.h
...
#ifdef SQLITE_TEST
void sqlitepager_refdump(Pager*);
extern int pager_refinfo_enable;
#endif

          

This fix is contained in a patch and can be applied to the source as follows.

# Assumptions: Issue 2.1, 2.2 are already fixed
cd $HOME/sr/sqlite/2.2.1/sqlite
patch -p1 < $HOME/sr/sqlite/2.2.1/patches/3-extern.patch
          

The patching process is confirmed by the following message.

patching file src/pager.h
          

This fix takes allows the tests to compile and run.

3.3 What does a failed test case imply?

We are now ready to compile and run the tests. While many initial test cases run successfully, the test case btree-1.5 fails with a Segmentation fault (or SEGFAULT) error.

$ cd $HOME/sr/sqlite/2.2.1/build && make -j && make test
...
creating sqlite
./testfixture ../sqlite/test/quick.test
bigrow-1.0... Ok
bigrow-1.1... Ok
bigrow-1.2... Ok
bigrow-1.3... Ok
...
bigrow-5.3... Ok
bigrow-5.4... Ok
btree-1.1... Ok
btree-1.1.1... Ok
btree-1.2... Ok
btree-1.3... Ok
btree-1.4... Ok
btree-1.4.1... Ok
btree-1.5...make: *** [Makefile:232: test] Segmentation fault
      
The debugging process for the Issue 3.3 was arduous and frustrating. If I had used the GNU gdb debugger, it would have been simpler as shown in the software resurrection paper (pdf). Abhishek Dutta (2022-09-15)

SEGFAULT errors are caused by programs trying to access a memory location that is not allowed to access. The program code that is triggering SEGFAULT can be located using a debugger's backtrace functionality which shows a detailed history of the program instructions that led to the SEGFAULT (e.g. the backtrace or (bt) command of the gdb debugger). Inspecting the output of the make test command, we observe that tests are invoked by the following command: testfixture ../sqlite/test/quick.test. The testfixture appears as an executable which takes an argument that corresponds to a file containing all the test case description. Let us find more about this file using the file command.

$ cd $HOME/sr/sqlite/2.2.1/build
$ gdb --args testfixture ../sqlite/test/quick.test
not in executable format: file format not recognised
$ file ./testfixture
./testfixture: POSIX shell script, ASCII text executable, with very long lines
      

Contrary to our expectation, the testfixtute is a shell script and therefore it is not possible to use the gdb debugger to trace the source of the SEGFAULT error. Since it is a plain text file, we open it in a text editor which reveals that this script is "just a wrapper for" another executable program named lt-testfixture. The build folder does not contain the lt-testfixture executable. Where could it be located? The find . -name lt-testfixture reveals that this file is located inside the hidden build/.libs folder. After a second look at the contents of testfixture script, it is clear that this information could have been gleaned from the progdir variable. In the first look, for some reason, this failed to register.

$ cat $HOME/sr/sqlite/2.2.1/build/testfixture
...
program=lt-'testfixture'
progdir="$thisdir/.libs"
...
if test -f "$progdir/$program"; then
    ...
    exec $program ${1+"$@"}
    ...
else
    # The program doesn't exist.
    $echo "$0: error: $progdir/$program does not exist" 1>&2
    $echo "This script is just a wrapper for $program." 1>&2
    echo "See the libtool documentation for more information." 1>&2
    exit 1
fi
      

Now we shift our attention to the .libs/lt-testfixture. Using a combination of file and ldd commands, we know that this is an executable that is dynamically linked to the Tcl and sqlite library. Therefore, this must be the main executable that is responsible for running all the test cases defined in test/quick.test.

$ file .libs/lt-testfixture
.libs/lt-testfixture: ELF 64-bit LSB pie executable, x86-64
...

$ ldd ./libs/lt-testfixture
...
/home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
libsqlite.so.0 => /home/tlm/sr/sqlite/2.2.1/build/.libs/libsqlite.so.0
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
...
      

There are two threads of investigation at this point. First is to understand the functionality of lt-testfixture by studying its source code. Second is to understand the test cases specifications in test/quick.test.

What source code generates the lt-testfixture executable file? The source folder does not contain any file named lt-testfixture.*. The Makefile also does not provide any clues to how this executable gets generated. For now, we give up this line of inquiry as it does not seem to leading anywhere.

$ cd $HOME/sr/sqlite/2.2.1/build/
$ find . -iname '*lt-testfixture*'
{output is empty}
$ find . -iname '*testfixture*'
{output is empty}
$ grep -rnw 'lt-testfixture' .
{output is empty}
$ grep -rnw 'testfixture' .
{output is empty}
      

We pursue the second line of inquiry by exploring the test case specifications defined in test/quick.test file. The contents of this file appears to be a program code but the programming language does not look familiar. We use a internet search engine with the following query "set rename proc set foreach" and conclude that this code is written in the Tcl programming language. Since, I was not familiar with the Tcl language, it was not initially clear that test/quick.test is written in the Tcl language. In spite of unfamiliarity, the remarkable property of the intelligible keywords chosen by the Tcl language ensured that I could understand the computations defined in this file: collect all test specification defined in test/*.test and execute them using the Tcl interpreter. Since, our test execution failed at btree-1.5, we continue our investigation by looking into the test/btree.test.

File: test/btree.test#47

# 2001 September 15
# ...
# This file implements regression tests for SQLite library.  The
# focus of this script is btree database backend
...
...
...
# Do an insert and verify that the database file grows in size.
#
do_test btree-1.4 {
  set rc [catch {btree_begin_transaction $::b1} msg]
  lappend rc $msg
} {0 {}}
do_test btree-1.4.1 {
  lindex [btree_pager_stats $::b1] 1
} {1}
do_test btree-1.5 {
  set rc [catch {btree_cursor $::b1 2 1} ::c1]
  if {$rc} {lappend rc $::c1}
  set rc
} {0}
...
      

Let us pause our investigation for a moment to enjoy the intelligibility of this program code. The comment at the beginning of the source file identifies the purpose of this program code in a clear and concise way. It also conveys the date when this program code was created which helps us appreciate that this was written more than 20 years ago. I am not familiar with the Tcl programming language yet I can understand many aspects of this test code. First, I see a set of do_test instructions which suggests that this program is executing a set of tests. Second, we notice a set of sequential (e.g. btree-1.1, btree-1.2, btree-1.3, btree-1.4) and hierarchical (e.g. btree-1.4.1) which implies, without any requiring any additional qualifications, the following:

Returning to our investigation, we realise that we still do not know the following two details: (a) what computations are carried out by when do_test gets invoked? and (b) what computations are contained in these individual tests like btree-1.5? First, let us find more details about the do_test by listing all the occurrences of this keyword in the source folder.

$ cd $HOME/sr/sqlite/2.2.1/sqlite/ && grep -rnw 'do_test' .
...
./test/tester.tcl:62:# Invoke the do_test procedure to run a single test
./test/tester.tcl:64:proc do_test {name cmd expected} {
...
      

If we patiently scroll through all the results, we will see that the do_test is a procedure (or function) defined in test/tester.tcl. Because of the judicious choice of programming language keyword (e.g. proc) and identifier names (e.g. name, cmd, expected), we now know that do_test is a procedure (or function) which executes the command contained in cmd argument and compares the result of running this command with the expected outcome. If the result does not match the expectation, the do_test procedure prints an informative error message.

File: test/tester.tcl#62

...
# Invoke the do_test procedure to run a single test
#
proc do_test {name cmd expected} {
  global argv nErr nTest skip_test
  ...
  if {[catch {uplevel #0 "$cmd;\n"} result]} {
    puts "\nError: $result"
    incr nErr
    lappend ::failList $name
    if {$nErr>10} {puts "*** Giving up..."; finalize_testing}
  } elseif {[string compare $result $expected]} {
    puts "\nExpected: \[$expected\]\n     Got: \[$result\]"
    incr nErr
    lappend ::failList $name
    if {$nErr>10} {puts "*** Giving up..."; finalize_testing}
  } else {
    puts " Ok"
  }
}
...
      

We now move to the second question of finding out the computations carried out in individual tests like btree-1.5. We now realise that the usage of do_test in test/btree.test not only invoked the tests but also named the individual tests and defined the commands that constitutes a test. So, the btree-1.5 involves executing the following instructions in the Tcl language,

do_test btree-1.5 {
  set rc [catch {btree_cursor $::b1 2 1} ::c1]
  if {$rc} {lappend rc $::c1}
  set rc
} {0}
      

where the expected output from this test is 0. Due to unfamiliarity with the Tcl language, it is difficult to understand the instructions involved in this test. So, we start with the following questions and refer to the Tcl documentation to understand the purpose of these keywords and find the answer to the following two questions.

  1. What is btree_cursor?
  2. What is the purpose of $::b1 variable?

To answer the first question, we use the grep command to learn that btree_cursor is declared as a Tcl command and defined as a C-language function src/test3.c

$ cd $HOME/sr/sqlite/2.2.1/sqlite/ && grep -rnw
'btree_cursor' .
./src/test3.c:502:** Usage:   btree_cursor ID TABLENUM WRITEABLE
./src/test3.c:506:static int btree_cursor(
./src/test3.c:540:** Close a cursor opened using btree_cursor.
./src/test3.c:827:  Tcl_CreateCommand(interp, "btree_cursor", btree_cursor, 0, 0);
...
          

File: sqlite-2.2.1/src/test3.c

/*
** Usage:   btree_cursor ID TABLENUM WRITEABLE
**
** Create a new cursor.  Return the ID for the cursor.
*/
static int btree_cursor(
  void *NotUsed,
  Tcl_Interp *interp,    /* The TCL interpreter that invoked this command */
  int argc,              /* Number of arguments */
  char **argv            /* Text of each argument */
){
  Btree *pBt;
  int iTable;
  BtCursor *pCur;
  int rc;
  int wrFlag;
  char zBuf[30];

  if( argc!=4 ){
    ...
    return TCL_ERROR;
  }
  if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
  if( Tcl_GetInt(interp, argv[2], &iTable) ) return TCL_ERROR;
  if( Tcl_GetBoolean(interp, argv[3], &wrFlag) ) return TCL_ERROR;
  rc = sqliteBtreeCursor(pBt, iTable, wrFlag, &pCur);
  if( rc ){
    Tcl_AppendResult(interp, errorName(rc), 0);
    return TCL_ERROR;
  }
  sprintf(zBuf,"0x%x", (int)pCur);
  Tcl_AppendResult(interp, zBuf, 0);
  return SQLITE_OK;
}
          

To answer the second question (i.e. the purpose of $::b1), we use our prior knowledge which informs that the :: operator that allows access to global variable in the C programming language. It is highly likely that Tcl also has same interpretation of this keyword. We use a search engine to find out the way to print value of a variable. From the Tcl documentation, we learn that the puts command can be used to print values of these variables. Therefore, we update the test/btree.test file as follows to print the value of these $::b1 and ::c1variables.

File: test/btree.test

...
do_test btree-1.5 {
  puts "DEBUG: Showing values of btree-1.5 variables"
  puts "Value of b1={$::b1}"
  set rc [catch {btree_cursor $::b1 2 1} ::c1]
  puts "Value of rc={rc}"
  if {$rc} {lappend rc $::c1}
  set rc
} {0}
...
      

Now when we re-run the make test command to run the regression test, we get the following surprising result.

$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
btree-1.5...DEBUG: Showing values of btree-1.5 variables
Value of b1={0x90908db0}
make: *** [Makefile:232: test] Segmentation fault
          

There are two important things to note here. First is that b1 is indeed a global variable and that it is storing some hexadecimal value as a string. Second, we do not see the output of the form Value of rc=... which implies that the SEGFAULT is being caused by the btree_cursor command. What is the purpose of b1 global variable? The test/btree.test shows that the b1 global variable is set by the btree-1.1 test case which involves opening a btree.

File: test/btree.test#24

...
do_test btree-1.1 {
  file delete -force test1.bt
  file delete -force test1.bt-journal
  set rc [catch {btree_open test1.bt} ::b1]
} {0}
...
      

Let us see what is returned by the btree_open method defined in src/test3.c.

File: src/test3.c#51

/*
** Usage:   btree_open FILENAME
**
** Open a new database
*/
static int btree_open(
  void *NotUsed,
  Tcl_Interp *interp,    /* The TCL interpreter that invoked this command */
  int argc,              /* Number of arguments */
  char **argv            /* Text of each argument */
){
  Btree *pBt;
  int rc;
  char zBuf[100];
  if( argc!=2 ){
    ...
    return TCL_ERROR;
  }
  rc = sqliteBtreeOpen(argv[1], 0666, 10, &pBt);
  if( rc!=SQLITE_OK ){
    Tcl_AppendResult(interp, errorName(rc), 0);
    return TCL_ERROR;
  }
  sprintf(zBuf,"0x%x",(int)pBt);
  Tcl_AppendResult(interp, zBuf, 0);
  return TCL_OK;
}
      

This reveals that the btree_open command creates a new database and returns the pointer address of the database as a string. The string representation of the pointer address is created using the %x format specifier of the sprintf function. We refer to the manual page of sprintf and find out the description of %x specifier.

$ man sprintf

PRINTF(3)               Linux Programmer's Manual

NAME
       printf, fprintf, dprintf, sprintf, snprintf, vprintf, vfprintf,
       vdprintf, vsprintf, vsnprintf - formatted output conversion
...

Conversion specifiers

       A character that specifies the type of conversion to be
       applied.  The conversion specifiers and their meanings are:

       ...

       x, X   The unsigned int argument is converted to unsigned
              hexadecimal (x and X) notation. ...

       ...

       p      The void * pointer argument is printed in hexadecimal
              (as if by %#x or %#lx).
      

This is quite revealing. The pointer address is being cast to 4 byte integer using (int)pBt because the %x specifier expects an unsigned int argument. Doesn't a pointer address in 64 bit systems require 8 bytes of storage? It is possible that the sprintf() based code is only storing the lower 32 bits of the pointer address and discarding the higher 32 bits of the memory address thereby resulting in segmentation fault. Let us verify this hypothesis by using the %p format specifier which is specifically meant for converting a pointer address to string representation. We update the src/test3.c to return the string representation of pointer address using both %x and %p specifiers.

File: src/test3.c#56

static int btree_open(
  ...
  //sprintf(zBuf,"0x%x",(int)pBt);
  sprintf(zBuf,"\nOld=0x%x \nNew=%p",(int)pBt, pBt);
  ...
}
      

Let us build the test code again and see if our hypothesis is correct.

$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
btree-1.5...DEBUG: Showing values of btree-1.5 variables
Value of b1={
Old=    0x85b31db0
New=0x558c85b31db0}
...
      

Should we should jump like a scientist who has suddenly had a Revelation? Amongst all the noisy output created by our debug statements, we can see what we had been anticipating. Our hypothesis was correct and we have found the reason for the segmentation fault. The %x format specifier in sprintf() function was indeed dropping the higher 32 bits of the pointer address and only passing the lower 32 bits of the pointer address corresponding to the newly created btree database. All the subsequent tests which tried to access the btree database using this pointer address caused the segmentation fault error. We now work to fix the code such that all the tests run successfully.

We observe that there are two occurrences of 0x%x in src/test3.c. We replace them with %p as follows.

$ cat $HOME/sr/sqlite/2.2.1/sqlite/src/test3.c
static int btree_open(...)
{
  ...
  //sprintf(zBuf,"0x%x",(int)pBt);
  sprintf(zBuf,"%p",pBt);
  ...
}

static int btree_cursor(...)
{
  ...
  //sprintf(zBuf,"0x%x", (int)pCur);
  sprintf(zBuf,"%p", pCur);
  ...
}
      

We compile the test again.

$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
bigrow-5.3... Ok
bigrow-5.4... Ok
btree-1.1... Ok
btree-1.1.1...
Error: integer value too large to represent as non-long integer
btree-1.2... Ok
btree-1.3...
Expected: [0 {}]
     Got: [1 {integer value too large to represent as non-long integer}]
...
      

The error message is produced by the do_test procedure defined in test/tester.tcl. This implies that the test btree-1.1.1 is executing some instruction that is causing overflow of integer conversion operation. From test/btree.test, we note that btree-1.1.1 test case triggers the btree_pager_stats.

File: test/btree.test

...
do_test btree-1.1.1 {
  lindex [btree_pager_stats $::b1] 1
} {0}
...
      

What are the instructions being executed by the btree_pager_stats function? Let us at this function's definition in the src/test3.c file.

File: src/test3.c

...
static int btree_pager_stats(
  void *NotUsed,
  Tcl_Interp *interp,    /* The TCL interpreter that invoked this command */
  int argc,              /* Number of arguments */
  char **argv            /* Text of each argument */
){
  Btree *pBt;
  int i;
  int *a;

  if( argc!=2 ){
    ...
    return TCL_ERROR;
  }
  if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
  a = sqlitepager_stats(sqliteBtreePager(pBt));
  for(i=0; i<9; i++){
    static char *zName[] = {
      "ref", "page", "max", "size", "state", "err",
      "hit", "miss", "ovfl",
    };
    char zBuf[100];
    Tcl_AppendElement(interp, zName[i]);
    sprintf(zBuf,"%d",a[i]);
    Tcl_AppendElement(interp, zBuf);
  }
  return TCL_OK;
}
...
      

The Tcl_GetInt() function is trying to convert the 64 bit pointer address passed as a string argument to 32 bit integer representation and therefore causing the overflow error. We can fix this using the standard library's strtol() function.

File: src/test3.c

...
static int btree_pager_stats(...)
{
  ...
  //if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
  pBt = strtol(argv[1], NULL, 16);
  if(!pBt) return TCL_ERROR;
  a = sqlitepager_stats(sqliteBtreePager(pBt));
  ...
}
...
      

We run the tests again.

$ cd $HOME/sr/sqlite/2.2.1/build && make test
...
bigrow-5.3... Ok
bigrow-5.4... Ok
btree-1.1... Ok
btree-1.1.1... Ok
btree-1.2... Ok
btree-1.3...
Expected: [0 {}]
     Got: [1 {integer value too large to represent as non-long integer}]
btree-1.4...
Expected: [0 {}]
     Got: [1 {integer value too large to represent as non-long integer}]
...
      

The btree-1.1.1 test has passed, so had btree-1.2. This shows that we are on the right track. Let us fix the issue with btree-1.3 in the same way as we fixed the btree-1.1.1 test. For this, we have to fix a similar integer overflow error in btree_close() function defined in src/test3.c.

File: src/test3.c

...
static int btree_close(...)
{
  ...
  //if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
  pBt = strtol(argv[1], NULL, 16);
  if(!pBt) return TCL_ERROR;
  rc = sqliteBtreeClose(pBt);
  ...
}
...
      

We run the tests again and note that btree-1.3 passes while the test fails with btree-1.4. We now know that this issue needs to be fixed in the following places.

We fix the pBt and pCur issues in the functions shown above using the strtol() function as shown below.

File: src/test3.c

...
static int btree_insert()
{
  ...
  //if( Tcl_GetInt(interp, argv[1], (int*)&pCur) ) return TCL_ERROR;
  pCur = strtol(argv[1], NULL, 16);
  if(!pCur) return TCL_ERROR;
  rc = sqliteBtreeInsert(pCur, ...)
  ...
}
...
      

We run the tests again and encounter another segmentation fault error again.

$ cd $HOME/sr/sqlite/2.2.1/build && make
...
btree-8.19... Ok
btree-8.20... Ok
btree-8.21... Ok
btree-8.22... Ok
btree-8.23... Ok
btree-8.24... Ok
btree-9.1... Ok
btree-9.2...make: *** [Makefile:232: test] Segmentation fault
      

Let us inspect btree-9.2 test and find out the cause of segmentation fault. The btree-9.2 test case invokes the following functions: btree_insert(), select_keys(). We have already fixed the btree_insert(). The select_keys() procedure invokes the following functions: btree_move_to(), btree_key(), btree_next(). We have already fixed these two functions. So what is causing the segmentation fault? The btree_insert is causing segmentation fault. For the btree-9.1 test, why does the btree_insert() function get invoked with garbage data (e.g. "This is a long prefix ...")? This results in the loop only iterating through 001 to 018. The 019 iteration gets to run in btree-9.2 test. Why such a mix up is happening? It seems that the test case of 8.14 is spilling over to 9.1 test case.

It appears that the overflow starts in btree-8.5. In the test case btree-8.5, long sequence of characters (e.g. "*** This is an even longer key *** This is an even ....") is inserted against the key "020". This insertion does not seem to be successful as it invokes the btree_insert() function two times. There appears to be a lag in the value printed inside the src/test3.c::btree_insert() C function and the messages printed inside the test/btree.test::do_test invocations.

It is important to do these in steps. It may be tempting to use grep and sed tools to automatically replace each occurrences. This does not help as it also replaces other parts which are not essential for the test and therefore may result in unintended results. After fixing btree-2.2, you will notice that more tests start to pass with just one fix as some common functions are used in most tests.The mix up of error message seems to have been caused by a missing fflush(stdout).

I suddenly realised that I can use gdb for debugging using

cd $HOME/sr/sqlite/2.2.1/build
cat Makefile > Makefile-release # create a backup
sed -i 's/-g -O2/-ggdb -O0/' Makefile 
make clean && make && make test
gdb .libs/lt-testfixture
(gdb) r ../sqlite/test/quick.test
...
btree-8.22... Ok
btree-8.23... Ok
btree-8.24... Ok
btree-9.1... Ok
btree-9.2...
Program received signal SIGSEGV, Segmentation fault.
0x0000555555557711 in cellSize (pCell=0x5555ffffbbc8) at ../sqlite/src/btree.c:334
334	  int n = NKEY(pCell->h) + NDATA(pCell->h);
(gdb) bt
#0  0x0000555555557711 in cellSize (pCell=0x5555ffffbbc8) at ../sqlite/src/btree.c:334
#1  0x000055555555acc0 in balance (pBt=0x555555639a70, pPage=0x55555563a740, pCur=0x5555556399f0) at ../sqlite/src/btree.c:1958
#2  0x000055555555b7ec in sqliteBtreeInsert (pCur=0x5555556399f0, pKey=0x5555555dd730, nKey=3, pData=0x555555601f30, nData=35) at ../sqlite/src/btree.c:2144
#3  0x00005555555628cb in btree_insert (NotUsed=0x0, interp=0x5555555870c0, argc=4, argv=0x55555558b9e0) at ../sqlite/src/test3.c:680
#4  0x00007ffff7e50695 in TclInvokeStringCommand () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#5  0x00007ffff7e553f2 in TclNRRunCallbacks () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#6  0x00007ffff7e570d3 in TclEvalEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#7  0x00007ffff7f0b9c9 in Tcl_FSEvalFileEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#8  0x00007ffff7f0bb48 in Tcl_EvalFile () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#9  0x0000555555565563 in main (argc=2, argv=0x7fffffffe008) at ../sqlite/src/tclsqlite.c:620

      

The gdb log shows that the SEGFAULT is triggered by cellSize() which in turn is triggered by balance(). We notice that before the cellSize() method is invoked in balance() method, a copy of pages is created using the copyPage() function which is defined as follows.

File: src/btree.c

...
static void copyPage(MemPage *pTo, MemPage *pFrom){
  uptr from, to;
  ...
  to = Addr(pTo);
  from = Addr(pFrom);
  for(i=0; i<pTo->nCell; i++){
    uptr x = Addr(pFrom->apCell[i]);
    ...
  }
}
...
      

The Addr() method is being used to get the pointer address of source and destination pages. This raises a suspicion that the Addr() method may only be using the lower 32 bits of the pointer address. We look into the src/sqliteInt.h file where this method is defined.

File: src/sqliteInt.h

...
/*
** Integers of known sizes.  These typedefs might change for architectures
** where the sizes very.  Preprocessor macros are available so that the
** types can be conveniently redefined at compile-type.  Like this:
**
**         cc '-DUINTPTR_TYPE=long long int' ...
*/
#ifndef UINT32_TYPE
# define UINT32_TYPE unsigned int
#endif
#ifndef UINT16_TYPE
# define UINT16_TYPE unsigned short int
#endif
#ifndef UINT8_TYPE
# define UINT8_TYPE unsigned char
#endif
#ifndef INTPTR_TYPE
# define INTPTR_TYPE int
#endif
typedef UINT32_TYPE u32;           /* 4-byte unsigned integer */
typedef UINT16_TYPE u16;           /* 2-byte unsigned integer */
typedef UINT8_TYPE u8;             /* 1-byte unsigned integer */
typedef INTPTR_TYPE ptr;           /* Big enough to hold a pointer */
typedef unsigned INTPTR_TYPE uptr; /* Big enough to hold a pointer */

/*
** This macro casts a pointer to an integer.  Useful for doing
** pointer arithmetic.
*/
#define Addr(X)  ((uptr)X)
...
      

This provides a lot of relief as we know that we are near to the possible fix of this issue. The Addr() is a macro that casts the value to an uptr type which in turn is of unsigned int type. Therefore, we now know the cause of the SEGFAULT issue. The 64 bit pointer addresses corresponding to different pages were wrongly converted to 32 bit unsigned intstorage type. We correct this issue as follows.

File: src/sqliteInt.h

...
#ifndef INTPTR_TYPE
//# define INTPTR_TYPE int
# define INTPTR_TYPE long
#endif
...
      

After applying this fix, we recompile the tests.

$ cd $HOME/sr/sqlite/2.2.1/build && make
...
btree-12.11... Ok
btree-12.12... Ok
btree-13.1...
Error: integer value too large to represent as non-long integer
btree-15.98... Ok
btree-15.99... Ok
btree-99.1... Ok
lt-testfixture: integer value too large to represent as non-long integer
...
make: *** [Makefile:232: test] Error 1
      

We apply similar fixes.

File: src/test3.c

...
static int btree_sanity_check(...)
{
  ...
  //if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
  pBt = strtol(argv[1], NULL, 16);
  if(!pBt) return TCL_ERROR;
  ...
  for(i=0; i<argc-2; i++){
    //if( Tcl_GetInt(interp, argv[i+2], &aRoot[i]) ) return TCL_ERROR;
    aRoot[i] = strtol(argv[i+2], NULL, 16);
    if(!aRoot[i]) return TCL_ERROR;
  }
  ...
}
...
static int btree_clear_table(...)
{
  ...
  //if( Tcl_GetInt(interp, argv[1], (int*)&pBt) ) return TCL_ERROR;
  pBt = strtol(argv[1], NULL, 16);
  if(!pBt) return TCL_ERROR;
}
...
      

All the btree tests now pass. Furthermore, other subsequent tests (e.g. btree3, copy, delete, expr, func, in, index, insert, ...) also pass except the pager tests.

...
misc1-2.1... Ok
misc1-2.2... Ok
pager-1.0... Ok
pager-1.1...make: *** [Makefile:232: test] Segmentation fault
      

We are now pretty confident that this issue is also related to pointer address conversion. Let us load the tests again in gdb and use the backtrace command to find the source of SEGFAULT.

cd $HOME/sr/sqlite/2.2.1/build
make clean && make && make test
gdb .libs/lt-testfixture
(gdb) r ../sqlite/test/quick.test
...
misc1-1.10... Ok
misc1-1.11... Ok
misc1-2.1... Ok
misc1-2.2... Ok
pager-1.0... Ok
pager-1.1...
Program received signal SIGSEGV, Segmentation fault.
0x000055555555f907 in sqlitepager_stats (pPager=0x7fff55624440) at ../sqlite/src/pager.c:1096
1096	  a[0] = pPager->nRef;
(gdb) bt
#0  0x000055555555f907 in sqlitepager_stats (pPager=0x7fff55624440) at ../sqlite/src/pager.c:1096
#1  0x0000555555560a17 in pager_stats (NotUsed=0x0, interp=0x5555555870c0, argc=2, 
    argv=0x55555558b9c0) at ../sqlite/src/test2.c:181
#2  0x00007ffff7e50695 in TclInvokeStringCommand ()
   from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#3  0x00007ffff7e553f2 in TclNRRunCallbacks () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#4  0x00007ffff7e570d3 in TclEvalEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#5  0x00007ffff7f0b9c9 in Tcl_FSEvalFileEx () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#6  0x00007ffff7f0bb48 in Tcl_EvalFile () from /home/tlm/sr/sqlite/2.2.1/deps/lib/libtcl8.6.so
#7  0x00005555555655a4 in main (argc=2, argv=0x7fffffffdff8) at ../sqlite/src/tclsqlite.c:620
(gdb)

      

This leads us to the src/test2.c source which also includes the usage of Tcl_GetInt() to convert string representation of pointer address to integer. We fix this also in the same way as before.

File: src/test2.c

...
static int pager_open(...)
{
  Pager *pPager;
  int nPage;
  ...
  //sprintf(zBuf,"0x%x",(int)pPager);
  sprintf(zBuf,"%p",pPager);
  Tcl_AppendResult(interp, zBuf, 0);
  return TCL_OK;
}
...
static int pager_stats(...)
{
  Pager *pPager;
  int i, *a;
  if( argc!=2 ){
    ...
    return TCL_ERROR;
  }
  //if( Tcl_GetInt(interp, argv[1], (int*)&pPager) ) return TCL_ERROR;
  pPager = strtol(argv[1], NULL, 16);
  if(!pPager) return TCL_ERROR;
  a = sqlitepager_stats(pPager);
  ...
}
...
      

The following updates are required in src/test2.c.

The fixes are similar in nature to the fixes applied for btree tests in src/test3.c.

File: src/test2.c

...
static int pager_pagecount()
{
  ...
  //if( Tcl_GetInt(interp, argv[1], (int*)&pPager) ) return TCL_ERROR;
  pPager = strtol(argv[1], NULL, 16);
  if(!pPager) return TCL_ERROR;
  ...
}
...
      

After fixing the pager SEGFAULT, we rerun the test and now the SEGFAULT appears in tableapi tests.

...
table-7.1... Ok
table-7.2... Ok
table-7.3... Ok
tableapi-1.0...make: *** [Makefile:232: test] Segmentation fault
      

Using gdb backtrace feature and upon inspection of code in src/test1.c, we know that the SEGFAULT is caused, again, by conversion of pointer addresses to and from string representation. This time, it is more remarkable as the pointer corresponding to sqlite database is represented as a 32 bit integer using %d format specifier which results in overflow of the variable and therefore has a negative value. The string representation of pointer address is converted back to pointer using the atoi() function. We fix these issues by using the %p format specifier and strtol() function to recover the pointer address from its string representation.

File: src/test1.c

static int sqlite_test_open(...)
{
  sqlite *db;
  char *zErr = 0;
  char zBuf[100];
  if( argc!=2 ){
    ...
    return TCL_ERROR;
  }
  db = sqlite_open(argv[1], 0666, &zErr);
  if( db==0 ){
    ...
    return TCL_ERROR;
  }
  //sprintf(zBuf,"%d",(int)db);
  sprintf(zBuf,"%p",db);
  Tcl_AppendResult(interp, zBuf, 0);
  return TCL_OK;
}
...
static int test_exec_printf(
  void *NotUsed,
  Tcl_Interp *interp,    /* The TCL interpreter that invoked this command */
  int argc,              /* Number of arguments */
  char **argv            /* Text of each argument */
){
  sqlite *db;
  ...
  //db = (sqlite*)atoi(argv[1]);
  db = strtol(argv[1], NULL, 16);
  ...
}
...
      

After fixing the SEGFAULT issue with tableapi, we rerun the test. All the tests pass except the tcl-1.6. We ignore this test because it the error message format differs between what it expects and what it got.

...
tcl-1.1... Ok
tcl-1.2... Ok
tcl-1.3... Ok
tcl-1.4... Ok
tcl-1.5... Ok
tcl-1.6...
Expected: [1 {syntax error in expression "x*"}]
     Got: [1 {invalid bareword "x" in expression "x*";
           should be "$x" or "{x}" or "x(...)" or ...}]
           tcl-2.1... Ok
tcl-2.2... Ok
temptable-1.0... Ok
temptable-1.1... Ok
...

All the tests now run successfully. This assures us that the software able to deliver the functionality as expected by the developers.

In the next section, we critique the program code and documentation of sqlite version 2.2.1 based on our experience of compiling and testing it on a modern platform.