The TechFell protocol


Introduction

The TechFell protocol is a very simple lightweight protocol for making connections to SQL databases. It has primarily been aimed at SQLite (www.sqlite.org), allthouth nothing much excludes it's application with other database servers. It's primary advantages are:




How it works

The principle is very simple, the client sends a string with a query, and the server with data, or an error message, or just a simple 'OK' for a query that does not produce a result set.

The (optional) HUMAN mode sends reply data as readable text:

select * from tagvals where Idx < 4
====( 1 )====
Idx = 1
Val = 0
Err = -1
Updated = NULL
WriteVal = NULL
WriteErr = NULL
====( 2 )====
Idx = 2
Val = 0
Err = -1
Updated = NULL
WriteVal = NULL
WriteErr = NULL
====( 3 )====
Idx = 3
Val = 0
Err = -1
Updated = NULL
WriteVal = NULL
WriteErr = NULL
:OK

The more conventional (and default) mode is MACHINE mode. In this mode it is assumed that the data is to be intercepted by code, so it is made a bit more machine readable and compact:

select * from tagvals where Idx < 4
:H1:3 Idx
:H2:3 Val
:H3:3 Err
:H4:7 Update
:H5:8 WriteVal
:H6:8 WriteErr
:R
1
0
-1
!
!
!
2
0
-1
!
!
!
3
0
-1
!
!
!
:OK

As can be seen, in this mode the column headers are sent at the start of  the result which is followed by the fields. Of course the query is not limited to data lookups, you may carry out other SQL operations or get database schema:

update tagvals set Val=4 where Idx=2
:OK

select A,B from tagvals
:Err : SQL error : no such column: A
:OK

pragma table_info(tst)
:H1:3 cid
:H2:4 name
:H3:4 type
:H4:7 notnull
:H5:10 dflt_value
:H6:2 pk
:R
0
foo
numeric
0
!
0
1
baa
numeric
0
!
0
:OK

You have probably grasped the idea by now. All that is needed is a few rules.


Access control

SQLite is designed as an embedded database and does not implement permissions (GRANT and REVOKE), or any other kind of  security, allthougth hooks are present which could allow an eventual implementation. Standard SQL access control tends to be quite complex and fine grained, and whilst being essential for many traditional SQL database applications it is unnecessarily complicated for the type of applications that the TechFell protocol is designed for.

On the other hand, some level of protection is essential if a database is to be used on open networks or the internet, so the TechFell protocol implements a simple protection scheme that mimics access control on filesystems. It is supplied as a complimentary technique to normal SQL security, not as a substitute. Each connection to the server is assigned an access level, or uselevel, which determines the level of access which is allowed. It is made up of a bitfield with the following weights:

       Select                    1
       Update                2
       Insert                  4
       Delete                  8
       Everything else   16

       So a user with level 1 can only do selects.
       A user with level 7 can Select, Update & Insert
       A user with level 12 can only Insert and Delete
       A user with level 31 is a power user



It is up to the server implementation to decide how userlevels are defined, but they would typically be dependent on both the User name and the network address from which they are connecting.


:PPRAGMA commands

:PPRAGMA commands are pragma commands intended for the database wrapper rather than the database proper ( PrePragma). The wrapper handles the command and generates a :PPRAGMA reply. These commands are specific to the wrapper, they control access and change the configuration.

NOTE: Case is important in :PPRAGMA commands and the server need not respect any line editing commands (backspace etc.) Frustrating, but :PPRAGMA's are little used and the TechFell protocol is designed to make servers as easy as possible, not just the clients :-)

:PPRAGMA USER and :PPRAGMA PASS / :PPRAGMA NEWPASS

Logs in a user:
 
:PPRAGMA USER Level5
:PPRAGMA USER Level5
:OK

:PPRAGMA PASS foo

:PPRAGMA USELEVEL 0
:OK

:PPRAGMA PASS
:PPRAGMA USELEVEL 4
:OK

:PPRAGMA NEWPASS newp
:PPRAGMA NEWPASS Level5
:OK

:PPRAGMA PASS newp
:PPRAGMA USELEVEL 4
:OK

:PPRAGMA NEWPASS newp2 newp
:PPRAGMA NEWPASS Level5
:OK

:PPRAGMA PASS newp2
:PPRAGMA USELEVEL 4
:OK

 
In this example the user logs in as 'Level5' (an arbitrary name), and the server confirms. The user then gives the password 'foo'. The server replies with the access level obtained. The user may carry on issuing new user and pass commands until a desired level is achieved, but a secure server would of course close the socket after a few failed attempts, or slow down the response. In this case the user has re-attempted with a blank password. In order to give a blank password it is necessary to leave a single <spc> after the word PASS. After entering with a blank password the user uses NEWPASS to set a password to 'newp', and then logs in again to check that newp works. The users then changes the password again to 'newp2'. As the password is already set it is necessary to give the old password as the second argument.

:PPRAGMA ETX

 The ETX character (ASCII 3) is normally used to terminate 'lines', this means that text fields may contain text formatting control characters such as CR, LF and TAB as they have no significance to the protocol. However, as some environments do not allow the insertion of any control characters in a data stream, the server will always start in 'newline' mode, that is it will accept any combination of ETX, CR and LF as a terminator, and will issue CR as a terminator on the lines it sends back. The :PPRAGMA ETX command requests the server to switch to ETX. This would normally be done straight after connecting. Once a client has received a line terminating with an ETX rather than a CR or LF, it will switch to ETX mode itself and will only use ETX both on reception and transmission. ETX is the normal mode but cannot be set as default, so applications should issue the :PPRAGMA ETX command as sson as they connect to the database.

:PPRAGMA HUMAN and :PPRAGMA MACHINE

The role of these commands is explained in the 'How ir works' section. The server will always start in MACHINE MODE by default, you may switch back and forth between HUMAN and MACHINE  modes as often as you like. Note that whilst MACHINE mode is canonical, HUMAN mode is simply defined as being 'easy to read', the format may vary.

:PPRAGMA ID

:PPRAGMA ID returns a unique ID number for the connection. This should be unique for the entire server lifetime, not just the client. It is useful for creating exclusive names in common namespaces.


Recordsets and replies

If the query does not require any data sets to be returned it will simply reply ':OK' to acknowledge the request.

If data is to be returned then the general format is as follows:    (see also the examples in the 'How it works' section)

    Header section (column header information)
    ':R' seperator
    Field data (sequential field contents sent in column/row order)

Header records are sent in the following format:

    :H5:8 WriteVal

the first value after the :H is the column number. The second value is the size of the column name (that is the text that follows it, not the column width).


The list of column headers is terminated with an ':R'. Following the seperator the field values are sent in column x row order, generally with no prefix. There is an entry for each field, NULL fields are sent as an exclamation '!'.

The exception to this format occurs when the field value starts with a colon ':' or an exclamation '!' or when the field contents are more than 32 characters. In these cases a ':Fn ' prefix is used:

    :F42 !This non null field starts with an ! mark
    :F1 !
    :F3 :::

The value following the F indicates the number of characters in the data field. A single <spc> character seperates the prefix from the start of the data field.


Error Reporting

Errors are reported in line as they occur by means of :Err lines:

select Fullnames,Idx from tagvals
:Err : SQL error : no such column: Fullnames
:OK

:PPRAGMA BLOWUP
:Err : PPRAGMA : Unknown command
:OK

The ':Err :' prefix is followed by a text string describing the error which may have any format. It is important to note that an :Err line may occur anywhere between the query and the final :OK

In particular, errors may occur in the field data section, for example:

..............
............
........
:H5:10 dflt_value
:H6:2 pk
:R
0
foo
numeric
:Err :Memory allocation error
:OK

Needless to say, in the case of grave errors or anomalies the server will simply close the connection.


Data Representation

SQLite treats all data as ASCII strings (more recent version also have Unicode or wide string support, the TechFell protocol does not support this). Column types in SQLite are used to indicate what type of data the column holds, but it does not enforce this. The TechFell protocol embraces this approach and also treats all field values as strings, it does not communicate the field type, but this information is available by means of pragma commands to get table schema. This is OK for embedded applications as the apps should know what they are dealing with. For the TechFell protocol the following conventions are recommended:

Numbers

  1. Integers should be represented 'as-is' in decimal. The 0x prefix is recommended for hex numbers but decimal forms are preffered.
  2. Real values should always include a decimal point and 0, even if there is no fraction.
  3. Real values may be expressed in exponential form, 3.0E8 (3E8 is not acceptable)
  4. Booleans should be expressed as 0 or 1 where 0 is false.
  5. Decimal point is always a '.' locales that use a ',' should be converted prior to insertion.
  6. Plus sign '+' is optional. Minus '-' prefixes numbers and mantissas and is optionally placed immediately after the E for exponents (3.0E-8)

Text

  1. Text may contain text formatting control characters such as CR, LF, TAB, HT & VT and BS. ESC characters are allowed.
  2. CR and LF may not be used in any column that is to be read by a non ETX device (these are rare, in the case of HUMAN mode on text terminals, CR and LF are OK as they just show up 'as is')
  3. Text should avoid using other ctrl characters, in particular NULL, ETX, DC1 (XON) ,DC2 (XOFF),STX,EOT, ETB and SOH as these conflict with TechFell control characters or underlying transports.
  4. Extended ASCII character sets (128->255) may be used.

Dates and Times

  1. TIME(8) = YYYYMMDD
  2. TIME(14) = YYYYMMDDHHmmSS
  3. TIME(14+) = YYYYMMDDHHmmSStt[hh][kk]
  4. TIME(6) = HHmmSS
  5. TIME(10) =  HHmmSStthh
  6. TIME(12) = HHmmSStthhkk
  7. TIME(3) = WWw
  8. TIME(5) = YYWWw
tt=tenths of seconds, hh=hundreths, kk=thousandths
WW = weeknumber, w=Day of week where Monday = 1, Sunday = 7.
      
Note that by using these reccomendations it is possible to automagically determine the format by establishing the length. Types of length 6, 8 and 14 are the prefferred type, in    particular TIMESTAMP(14) is a widely used standard.

Binary data

    Binary data should be base64 encoded and be prefixed base64<spc>. A mime type or other meta data may be expressed before the data using the following forms:

base64 Data
base64 mime Data
base64 meta_no_space Data

    Needless to say, meta data that looks like a mime type (jpeg, gif etc) may have spectacular results!


Data Transports (Multi drop RS485 etc.)

The TechFell protocol was primarily developed for TCP/IP networks, but may be used on other data streams. On Asyncronous serial point to point links (RS232, modems etc) it may be used as-is, but it is prefereable to use PPP if available as this adds error detection and control.

Failing that, serial links may use the following packeting convention, which is also appropriate for RS485 multi-drop networks and radio links:

    <SOH><ADD><ADD><STX><TechFell protocol data><EOT><CheckSum><ETB>

An SOH is followed by an address byte composed of 3 ASCII HEX digits which is then repeated. An <STX> marks the start of  TechFell protocol data. In the case of a recordset reply this may contain more than one line seperated by the normal <ETX> delimeter. The end of the blockis signalled by the EOT character. The CheckSum is 4 ASCII HEX digits terminated by an <ETB>, it is the modulus 16 sum of the ASCII values of all the bytes in the <TechFell protocol data>.  Obviously severe limits should be imposed on the size of the recordsets in multi-drop scenarios.

Any node on a multi-drop network may be a client, a server, or both. However one node must serve the role of 'Master'. The master fingers each slave node in turn with the following packet:

    <SOH><ADD><ADD><ETB>

Upon reception, the slave may reply to the master with a similar packet to indicate that it has nothing to do, or emit 1 packet. The packet may contain a query or a reply to any other node on the network, but the recipient may not reply until it is fingered by the master. If a slave recieves a packet with a framing error (for example the 2 <ADD> values don't match or there was no <EOT>, or a CheckSum error, it does nothing.

A single node may perform the role of  master, client slave, server slave, or even implement several clients and servers, but each one must have it's own network address, and the 'master' must finger it's own clients 'over the network'.

Timeouts for slave responses may be configured on the master on a node by node basis. The master may implement non sequential polling in order to give priority to certain nodes or implement other strategies (for example, as clients and servers have different adresses, the master can ensure that servers get fingered following a request emitted from a client ). The master may also 'degrade' slaves who do not reply.

Slave Clients should be configured to expect a reply from a server within 'n' fingers from the master, and not on the basis of timeouts. (hence masters must track the number of requests that a server has received and finger them accordingly). A server that cannot respond to the query immediately may send a reply to the client with a single <ACK> character in place of the <TechFell protocol data> in order to signal to the client that the request is "pending", and on reception the client resets it's 'n' counter. It may do this indefinitly but if it receives a successive request from the same client it must discard the pending request (requests may not be queued up, and a client should make requests on a 1 by 1 basis).


Legal stuff.

This document is copyright Roger Irwin 2005. You may read it, laugh at it, and maybe even deploy it. It is up to the user to establish the suitability of the protocol for any particular purpose. No liability whatsoever is accepted.

You may use this protocol as you wish. I consider it to be in the public domain. The technology and techniques described here have been in use for at least 20 years and represent industry standard practices. AFAIK There is nothing patentable herin, unless somebody has slapped a patent on "using exclamative symbols for representing  lack of data in a field", which would be an interesting patent as it would also cover the expression "someone has pinched the f******* instruction manual from the tractor".

Have fun and be careful, there are some pretty nasty bytes out there.