IEN 141
     INDRA Note 897
     11th April 1980










                      Message System Issues



                          C. J. Bennett





          ABSTRACT:  This  INDRA  Note  discusses   the
          design  choices for the message server system
          to  be  built  at  UCL.   Particular   issues
          considered include: the nature of the UK user
          community; the nature of the message  service
          to  be  offered  on  the  server; the message
          formats and transfer protocols  to  be  used;
          addressing;  interworking  with  the  ARPANET
          community; and  the  design  of  the  message
          management system on the message server.




                        Table of Contents




  1. Introduction


  2. The User Community


  3. Message Movement

     3.1 Message Format
        3.1.1 Message Format Staging
     3.2 Message Protocol
     3.3 Message Transport
        3.3.1 FTP Staging
     3.4 Addressing
     3.5 Status Reporting

  4. Message Server Design

     4.1 User Interface
     4.2 Message Management

  5. Conclusions




     1. Introduction

       Electronic message services  have  historically  been
     one  of  the most successful services to have developed
     from the use  of  packet  switched  computer  networks.
     However,  these  facilities  have not been available to
     users of United Kingdom research data networks  in  the
     past,  and  UK  users who wished to send mail to remote
     sites were  required  to  obtain  mailboxes  on  remote
     machines  in the United States, accessible via ARPANET.
     With the development of public networks, in  particular
     IPSS  and  PSS,  and  in  view  of the UKPO's policy of
     requiring users to move to these  networks,  it  is  no
     longer  economically  feasible to continue this mode of
     usage.

       For these reasons  it  is  proposed  that  University
     College  London  will  develop  a message server system
     based  on  a  PDP-11/35  running  UNIX  and  accessible
     initially to users through the DARPA Catenet, and later
     through PSS. This server would allow users to  exchange
     messages  with  other  users on the same site, users of
     ARPANET mail systems, and eventually users of other  UK
     and  US message servers.  The aim of this INDRA note is
     to identify the design constraints on this  system  and
     to suggest approaches that may be taken to meet them.


     2. The User Community

       Five major groups of users can be identified who  can
     be  expected  to  interact  with  such a service in the
     short term.  These are:

      (i)    Current  users  of  the  ARPANET  mail  system,
             especially  UK  users who have (until recently)
             had dialin access through the TIP. The  message
             server  would  become the prime mail server for
             this group. US users of ARPANET systems must be
             able to send messages to this site.  This group
             will require messages  formatted  according  to
             the  rules specified in RFC 733 (as modified by
             actual practice).

      (ii)   Users of the DARPA Catenet, who will  be  using
             at  least  three  formats  for  intersite mail:
             those of RFC 733; those of  the  Internet  Mail
             Protocol  as defined in IEN 85; and the private
             formats being developed by RSRE.

      (iii)  Users who wish to exchange messages between the
             UCL  server  and other servers which may become
             available  through   PSS.   This   group   will
             initially require only PSS access to the server

Bennett                                                  [Page 1]


INDRA Note 897, IEN 141                     Message System Issues




             and will exchange messages locally, but in the
             longer  term  it  can be anticipated that other
             mail servers will emerge on PSS.

      (iv)   Users who wish to  exchange  messages  with  US
             message  servers  available through Telenet and
             IPSS. In particular,  such  traffic  may  arise
             through the US EDUNET project.

      (v)    UCL users who will  exchange  messages  through
             the  UCL  ring,  and  who will wish to exchange
             messages with users in one or more of the other
             four categories.



     3. Message Movement

       This section is concerned with  the  questions  which
     affect  the  movement  of  messages between the message
     server and other message sites.  Four  major  questions
     must be considered: choice of message format; choice of
     transport mechanism; mail protocol; and addressing.


     3.1 Message Format

       The message  format  may  be  based  on  one  of  the
     following choices:

      (i)    ARPANET Format (RFC 733)

      (ii)   Internet Mail Format

      (iii)  RSRE Mail Format

      (iv)   Other format not currently in use  amongst  the
             user  community,  such  as those that may arise
             through the work  of  IFIP  TC6.5,  or  through
             Telenet and EDUNET.


       Of these choices, only the first is feasible at
     present. It is the most widely used format at the
     moment: it underlies the current ARPANET mail service
     and the internal UCL Unix mail service, and it is
     intended for initial DARPA Catenet mail. The DARPA
     Internet Mail format is very experimental; although it
     is expected to remain stable for the time being, no
     experience has yet been gained with it. Much the same
     comment applies to the RSRE
     system. The fourth choice involves either obtaining  an
     existing commercial system such as COMET, or devising a
     new format from scratch. Both these possibilities would
     result  in  considerable  delay,  and a UCL home-brewed
     format would be unlikely to be any  more  satisfactory,
     and  would  be  much less acceptable to the users, than
     other alternatives.

       As it may be anticipated that the server will have to
     interwork  eventually  with other formats, notably that
     of RSRE and whatever emerges amongst the EDUNET  group,
     the  development  of  other  formats  should be closely
     tracked. It is expected that conversion will eventually
     take  place through the use of a common Internet Format
     such as that being  developed  in  the  DARPA  Internet
     scheme.


     3.1.1 Message Format Staging

       One result of this is that users who will  eventually
     require  a  different format for messages for their own
     server - initially, RSRE in particular - will require a
     conversion  between  the  two. It is expected that this
     will take place at the UCL message  server.   As  noted
     above,  it  is  to  be hoped that conversions will take
     place through a common intermediary format.

       An important longterm question in this regard is  how
     widely   the   UCL   message   server  system  will  be
     distributed in the UK. If  other  message  servers  are
     built along the same lines, then the format chosen will
     become a de facto UK standard, at least among the UK
     research community.


     3.2 Message Protocol

       The current ARPANET message protocol is essentially a
     trivial   extension   to  the  ARPANET  file  transfer,
     obtained through the  MAIL  option.  This  causes  each
     message to be sent as a separate file to be appended to
     the message file of an individual user  at  that  site.
     Given  future use of IPSS and PSS this is an uneconomic
     option. There are two reasons for this.

      (i)    Demultiplexing for a message  which  is  to  be
             copied to several users at the same site occurs
             at  the  sender,  not  the  receiver.   Thus  a
             message  for N users at site X is transferred N
             times, even though it is identical. If  mailers
             were capable of  parsing  the  message  headers
             properly, the message need only be sent once.

      (ii)   For each message transferred  a  separate  data
             connection  is  set  up.  Thus  a  queue  of  N
             messages for M sites (M < N) will require N + M
             calls   to   be  made.  If  the  messages  were
             mailbagged by site, only 2M calls need be made.
             (Note  that  if FTP control and data were mixed
             on the same call, as in the NIFTP (see  below),
             these figures reduce to N and M respectively).
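       The call counts in (ii) follow from a simple model:
     one control call per destination site, plus one data
     call per file transferred. The Python sketch below
     reproduces the four figures quoted; the function name
     and its flags are illustrative only and are not part of
     any FTP specification.

```python
def calls_needed(n_messages: int, n_sites: int,
                 mailbagged: bool = False, mixed_control_data: bool = False) -> int:
    """Calls needed to move a queue of n_messages destined for n_sites.

    Model assumed from the text: each site needs one control call, and each
    file sent needs its own data call. A file is one message, or one mailbag
    per site when mailbagging is used. If control and data share a call
    (as in the NI FTP), only the file-carrying calls remain.
    """
    if not 0 < n_sites <= n_messages:
        raise ValueError("expected 0 < M <= N")
    files = n_sites if mailbagged else n_messages
    return files if mixed_control_data else files + n_sites


# N = 10 messages queued for M = 3 sites
print(calls_needed(10, 3))                                            # 13  (N + M)
print(calls_needed(10, 3, mailbagged=True))                           # 6   (2M)
print(calls_needed(10, 3, mixed_control_data=True))                   # 10  (N)
print(calls_needed(10, 3, mailbagged=True, mixed_control_data=True))  # 3   (M)
```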


       Both these changes have some impact on message
     format. The first requires, as a minimum, that all
     recipients of a message at a given site be visible in
     the To: and Cc: fields; receiver-side demultiplexing is
     therefore not possible if the mailing list facility is
     used in its current form, since a list name hides the
     individual recipients. In such cases, the sender must
     provide the expanded list, and the receiver must
     recognise that this list should be suppressed in, or
     separated from, the users' copies. It is
     to be hoped that the Internet group  will  accept  this
     proposal  as a minimum change to be made for use in the
     Catenet, and that similar procedures will be set up  by
     other groups.
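       With recipients visible in the header, the receiving
     site only has to pick its own users out of the To: and
     Cc: fields and deliver one copy to each. A minimal
     Python sketch, assuming simple comma-separated
     'user@host' addresses (RFC 733 comments, groups and
     the 'user at host' form are not handled); the function
     name is hypothetical:

```python
def local_recipients(message: str, local_site: str) -> list[str]:
    """Users at local_site named in the To: and Cc: fields of an RFC 733-style header.

    Simplified: addresses are taken to be comma-separated 'user@host' strings;
    comments, groups and the 'user at host' form are not handled.
    """
    header = message.partition("\n\n")[0]
    fields: list[str] = []
    for line in header.splitlines():
        if line[:1] in (" ", "\t") and fields:
            fields[-1] += " " + line.strip()          # folded continuation line
        else:
            fields.append(line)
    users: list[str] = []
    for field_text in fields:
        name, sep, value = field_text.partition(":")
        if not sep or name.strip().lower() not in ("to", "cc"):
            continue
        for address in value.split(","):
            user, at, host = address.strip().rpartition("@")
            if at and host.lower() == local_site.lower() and user not in users:
                users.append(user)
    return users


msg = ("From: Kirstein@ISI\n"
       "To: Ruth@UCL-TUnix, Bennett@UCL-MUnix\n"
       "Cc: Bryan@UCL-TUnix,\n"
       "    Ruth@UCL-TUnix\n"
       "Subject: Mailbagging\n"
       "\n"
       "Text.\n")
print(local_recipients(msg, "UCL-TUnix"))   # ['Ruth', 'Bryan']
```

     The message crosses the network once; duplicates in the
     header (Ruth appears twice above) still yield a single
     delivery per user.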

       Mailbagging requires that the messages in a
     transferred file be clearly delimited. This
     requires a mailbag structure to be  defined  -  at  the
     very  least,  by defining a standard message separator.
     However,  it  does   not   require   restructuring   of
     individual  messages.  This  is  a  much more important
     change than the first, and as the saving is  likely  to
     be  less,  it is proposed here that it should await the
     results of experiments with the Internet Mail Protocol.
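       A mailbag needs nothing more than an agreed separator
     line. The sketch below assumes a separator of four
     Control-A characters on a line of their own; that
     choice and the function names are illustrative only,
     since the note leaves the separator undefined.
     Messages are carried unchanged, as required above; a
     message that happens to contain the separator is
     rejected rather than escaped.

```python
SEPARATOR = "\x01\x01\x01\x01"   # hypothetical: four Control-A characters on a line


def pack_mailbag(messages: list[str]) -> str:
    """Concatenate messages into one file, ending each with a separator line."""
    parts = []
    for msg in messages:
        if SEPARATOR in msg:
            raise ValueError("message contains the separator and would need escaping")
        if not msg.endswith("\n"):
            msg += "\n"
        parts.append(msg + SEPARATOR + "\n")
    return "".join(parts)


def unpack_mailbag(bag: str) -> list[str]:
    """Split a mailbag back into its messages."""
    messages: list[str] = []
    current: list[str] = []
    for line in bag.splitlines(keepends=True):
        if line.rstrip("\r\n") == SEPARATOR:
            messages.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        raise ValueError("mailbag does not end with a separator")
    return messages


bag = pack_mailbag(["From: Ruth@UCL-TUnix\n\nFirst.\n", "From: Bryan@UCL-TUnix\n\nSecond."])
print(unpack_mailbag(bag))
# ['From: Ruth@UCL-TUnix\n\nFirst.\n', 'From: Bryan@UCL-TUnix\n\nSecond.\n']
```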


     3.3 Message Transport

       There are two  major  choices  to  be  made  for  the
     message  transport service, namely the TCP FTP, derived
     from the ARPANET FTP, and the NI FTP.  It  is  expected
     that  the  first  will  be  used  for  mail  within the
     Catenet, using the same MAIL option as used within  the
     ARPANET. As has been seen above, however, this protocol
     is unsuited to our needs because it is  uneconomic.  It
     may   be   retained   initially,  as  it  gives  direct
     compatibility with other Catenet sites.

       In the slightly longer term, the NI FTP is  the  more
     attractive   option.  The  reasons  for  this  are  its
     independence of specific  transport  services  and  the
     fact  that  it  will  be  widely adopted in the UK. UCL
     already has implementations on its research Unix and at
     ISIE  (though  these will have to be changed to reflect
     the final specification); an implementation at RSRE  is
     planned;  and future mail servers in the UK will prefer
     to use it. The fact that many of these will  run  above
     X25  networks  while  Catenet  sites  will  use  TCP is
     immaterial; the  necessary  transport-level  conversion
     will  be  handled  by  the  UCL Protocol Convertor. The
     existing ARPANET FTP is demonstrably NCP-specific,  and
     the  TCP  version  of  this  will  at  the  minimum  be
     Catenet-specific in its use of Telnet.


     3.3.1 FTP Staging

       An important consequence of this is that FTP  staging
     will be required, for three reasons.

      (i)    It will be necessary to stage messages into and
             out  of the ARPANET. This applies regardless of
             the FTP used, as ARPANET mail is restricted  to
             use of the ARPANET FTP.

      (ii)   It will be necessary to stage messages  between
             mailers  in  the  Catenet using the TCP FTP and
             those using the NI FTP. If UCL does decide to
             use the TCP FTP, the need for this staging is
             merely postponed until a UK community emerges
             based on the NI FTP.

      (iii)  It  may  eventually  be  necessary   to   stage
             messages   between   UCL   and   Telenet/Tymnet
             servers, even if they adopt a common format, if
             a different transport mechanism is used.

     It is proposed here that experiments with the first two
     stagings be performed at ISIE, or some other TOPS-20 host
     on the ARPANET which has all three FTPs. In its final
     form,  the  staging  system  would  consist of a daemon
     which would process the mail file at a special  account
     and  forward  messages  to  the appropriate sites.  The
     structure of such a system is shown in Figure 1.
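       The core of such a daemon is a dispatch step that
     sorts the messages found in its mail file by
     destination host and by the FTP needed to reach that
     host, so that each outgoing batch can be handed to the
     right transfer program. A Python sketch; the routing
     table, host names and default are hypothetical:

```python
from collections import defaultdict

# Hypothetical routing table: destination host -> file transfer protocol.
ROUTES = {
    "ISI": "ARPANET FTP",
    "UCL-TUNIX": "NI FTP",
    "UCL-RUNIX": "NI FTP",
}
DEFAULT_FTP = "TCP FTP"   # assumption: any other host is treated as a Catenet site


def dispatch(queue: list[tuple[str, str]]) -> dict[tuple[str, str], list[tuple[str, str]]]:
    """Group (recipient, message) pairs by (FTP, destination host)."""
    batches: dict[tuple[str, str], list[tuple[str, str]]] = defaultdict(list)
    for recipient, message in queue:
        _user, at, host = recipient.rpartition("@")
        if not at:
            raise ValueError(f"no host in address {recipient!r}")
        host = host.upper()
        batches[(ROUTES.get(host, DEFAULT_FTP), host)].append((recipient, message))
    return dict(batches)


queue = [("Kirstein@ISI", "m1"), ("Ruth@UCL-TUnix", "m2"),
         ("Bryan@UCL-TUnix", "m2"), ("Jones@SITE-X", "m3")]   # hypothetical queue
for key, batch in dispatch(queue).items():
    print(key, batch)
```

     Each batch is a natural unit for mailbagging, should
     that later be adopted.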


                 Figure 1: Staging Daemon System


     3.4 Addressing

       Only four message sites in the UK are initially
     expected  to  be  heavily  involved  in   the   system.
     Initially,  development  will  be  in  the  UCL message
     server itself (UCL-MUnix), while at a later  stage  the
     UCL  teaching and research machines (UCL-TUnix and UCL-
     RUnix), and at least one machine at  RSRE  will  become
     involved.  While  other message servers may emerge at a
     later date, it is not expected that  this  will  happen
     rapidly.  Staging  to Catenet and ARPANET sites will be
     through ISIE; the problem of staging to  Telenet/Tymnet
     sites must be considered if and when it arises.

       The UK sites should be able to exchange mail directly
     through  the  use  of addresses of the form 'user@site'
     (e.g.  Ruth@UCL-TUnix).   This  format  could  be  used
     throughout the mailing address space, although it
     requires message sites not under UCL control to make
     special modifications to their mailers. Thus an
     ARPANET  mailer  presented  with   a   return   address
     'Ruth@UCL-TUnix'  would  have  to  recognise  that this
     should be sent to ISIE; the ISIE mailer would  have  to
     recognise  that  the message should be added to the UCL
     daemon's mailbox and the UCL daemon would then  forward
     the message to UCL-TUnix.
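       The chain of table entries this requires can be
     modelled as a next-hop lookup repeated until the named
     host is reached. In the Python sketch below, the
     tables, the generic 'ARPANET-HOST' and the
     'UCL-DAEMON@ISIE' mailbox name are hypothetical; every
     entry outside the last table stands for a change to a
     mailer that UCL does not control.

```python
UCL_HOSTS = ("UCL-MUNIX", "UCL-TUNIX", "UCL-RUNIX")

# Hypothetical next-hop tables, one per site, keyed by destination host.
NEXT_HOP = {
    "ARPANET-HOST": {h: "ISIE" for h in UCL_HOSTS},            # change to every ARPANET mailer
    "ISIE": {h: "UCL-DAEMON@ISIE" for h in UCL_HOSTS},          # change to the ISIE mailer
    "UCL-DAEMON@ISIE": {h: h for h in UCL_HOSTS},               # UCL's own daemon
}


def route(address: str, origin: str, max_hops: int = 8) -> list[str]:
    """Follow next-hop tables from origin until the host in address is reached."""
    host = address.rpartition("@")[2].upper()
    path = [origin]
    while path[-1] != host:
        table = NEXT_HOP.get(path[-1], {})
        if host not in table or len(path) > max_hops:
            raise LookupError(f"no route from {path[-1]} to {host}")
        path.append(table[host])
    return path


print(route("Ruth@UCL-TUnix", "ARPANET-HOST"))
# ['ARPANET-HOST', 'ISIE', 'UCL-DAEMON@ISIE', 'UCL-TUNIX']
```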

       Two  other  alternatives  are  source   routing   and
     hierarchical  addressing.  A  source routed form of the
     address might be identical in appearance to the ARPANET
     (by  making  'UCL' a synonym for ISIE, in much the same
     way that 'UDel-EE' is a synonym for 'Rand-Unix'),
     although for parsing purposes it would be preferable to
     rearrange  it:  (Ruth-(TUnix@(UCL))).  Local   messages
     would  then  appear  as: Ruth-TUnix. An ARPANET address
     would appear to a message server user in  a  form  such
     as: Kirstein-ISI@ISIE. Staging message servers would be
     required to parse the address into intermediate  forms.
     Further, the terminal staging server for the Catenet
     and  for  ARPANET  would  be   required   to   suppress
     intermediate  fields. Thus the UCL daemon at ISIE would
     have to transform all addresses of the form:  Kirstein-
     ISI@ISIE to Kirstein@ISI and back again for traffic in
     the reverse direction. Source routing is  the  favoured
     solution of the University of Delaware's MMDF group.
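       The rewriting performed by the UCL daemon at ISIE is
     mechanical. A Python sketch using the hyphen separator
     of the examples above; the function names are
     hypothetical, and splitting at the last separator
     assumes that the final host name contains no separator
     itself (the hyphen problem taken up later in this
     section).

```python
def to_arpanet(address: str, staging_host: str = "ISIE", sep: str = "-") -> str:
    """Outgoing: 'Kirstein-ISI@ISIE' -> 'Kirstein@ISI' (intermediate field removed)."""
    local, _, host = address.rpartition("@")
    if host.upper() != staging_host.upper():
        raise ValueError(f"{address!r} is not routed via {staging_host}")
    user, found, final_host = local.rpartition(sep)   # assumes no sep in final_host
    if not found:
        raise ValueError(f"{address!r} names no onward host")
    return f"{user}@{final_host}"


def from_arpanet(address: str, staging_host: str = "ISIE", sep: str = "-") -> str:
    """Reverse direction: 'Kirstein@ISI' -> 'Kirstein-ISI@ISIE', so that
    replies from message server users come back through the daemon."""
    user, at, host = address.rpartition("@")
    if not at:
        raise ValueError(f"{address!r} has no host")
    return f"{user}{sep}{host}@{staging_host}"


print(to_arpanet("Kirstein-ISI@ISIE"))   # Kirstein@ISI
print(from_arpanet("Kirstein@ISI"))      # Kirstein-ISI@ISIE
```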

       Hierarchical  addressing  is  actually  the  official
     ARPANET  standard  as described in RFC 733, although it
     is not implemented. It is also the solution favoured in
     Postel's  Internet  system. Under this scheme UCL would
     refer  to  a  widely-known   addressing   domain,   and
     addresses  would  take  the form: Kirstein-ISI@ARPA and
     Ruth-TUnix@UCL. In practice, since only  two  hops  and
     only  one  staging point are involved the two forms are
     virtually synonymous - which is  a  good  argument  for
     postponing  a  real decision until we see an addressing
     hierarchy actually emerging! The  differences  will  be
     seen  when an RSRE server becomes active. In this case,
     an ARPANET site has the choice of the following forms:

          Bryan@NSide               (global)
          Bryan-NSide@PPSN          (hierarchical)
          Bryan-NSide-MUnix@ISIE    (source routing)

       Note that every one of these forms requires changes
     of the type described above to some ARPANET mailers.
     With global and hierarchical addressing, ARPANET tables
     must be modified to recognise mail servers (global
     address) or mail address spaces (hierarchical address).
     This is not required with source routing. The mailer at
     the staging site must additionally recognise that
     account names taking a certain format should
     automatically be accepted and routed to the UCL mail
     daemon at that site. Both the hierarchical and the
     source-routed solutions therefore require some
     structuring of the address. In the examples above, a
     hyphen ('-') has been used as a component separator. In
     fact, this is probably a bad choice, since host names
     such as UCL-TUnix and UDel-EE themselves contain
     hyphens. Two possibilities are:

      (i)    Use of some other separator, such as %.

      (ii)   Use of the comment fields allowed by  the  mail
             protocol.

     The second choice has the convenient side  effect  that
     the  account  checking procedure need not be changed at
     the staging site, as  addresses  may  then  look  like:
     UCLfor   a   source-routed
     format). However not all message preparation facilities
     will include comment fields (e.g. 'answer' under MSG).
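       Enumerating the possible readings of a structured
     local part shows the separator problem directly. In
     this Python sketch (function name hypothetical),
     'Ruth-UCL-TUnix' has two readings under the hyphen,
     while 'Ruth%UCL-TUnix' has exactly one, provided that %
     never appears in user or host names.

```python
def readings(local_part: str, sep: str) -> list[tuple[str, str]]:
    """Every way of splitting local_part into (user, host) at one separator."""
    pieces = local_part.split(sep)
    return [(sep.join(pieces[:i]), sep.join(pieces[i:]))
            for i in range(1, len(pieces))]


print(readings("Ruth-UCL-TUnix", "-"))   # [('Ruth', 'UCL-TUnix'), ('Ruth-UCL', 'TUnix')]
print(readings("Ruth%UCL-TUnix", "%"))   # [('Ruth', 'UCL-TUnix')]
```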

       Since this note was first drafted, my attention has
     been drawn to RFC 754 ("Out-of-Net Host Addresses for
     Mail", by J. Postel). RFC 754 considers four
     solutions: three are variants on the global solution,
     and the fourth involves name structuring. Postel's note
     favours  a structured name solution. This is compatible
     with  either  a   source   routed   or   hierarchically
     structured solution.


     3.5 Status Reporting

       Finally in this section there is the issue of  status
     reporting.   Currently,  most ARPA-type message systems
     give an  immediate  report,  with  possibly  a  mailer-
     generated  message if there is some subsequent failure.
     For staged or mailbagged messages an  immediate  report
     of  success  can only imply success at the first stage.
     Thus staging daemons which cannot successfully deliver
     a message must be prepared to generate messages
     indicating why the failure occurred. This
     can  be  done  simply  through  the  use of the current
     message generation mechanism.
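       Such a report is simply an ordinary message composed
     by the daemon and addressed to the original sender. A
     minimal Python sketch; the daemon's address and the
     wording of the notice are hypothetical, the sender is
     taken from the From: field only, and the header is cut
     to a minimum (a real notice would also carry a Date:
     field).

```python
def failure_notice(original: str, failed: list[str], reason: str,
                   daemon: str = "UCL-Daemon@ISIE") -> str:
    """Compose a report telling the sender which recipients failed and why."""
    header = original.partition("\n\n")[0]
    sender = next((line.partition(":")[2].strip()
                   for line in header.splitlines()
                   if line.lower().startswith("from:")), None)
    if not sender:
        raise ValueError("original message has no From: field")
    return (f"From: {daemon}\n"
            f"To: {sender}\n"
            "Subject: Undeliverable message\n"
            "\n"
            f"Your message could not be delivered to: {', '.join(failed)}\n"
            f"Reason: {reason}\n"
            "\n"
            "----- Text of original message -----\n"
            f"{original}")


print(failure_notice("From: Ruth@UCL-TUnix\nTo: Kirstein@ISI\n\nhello\n",
                     ["Kirstein@ISI"], "host not responding"))
```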


     4. Message Server Design

     4.1 User Interface

       The primary service  which  must  be  provided  is  a
     reliable,  efficient  and  cheap  method of sending and
     processing text messages  exchanged  amongst  the  user
     community.  It  is not intended to provide a multimedia
     service, although this is an important research goal of
     the  program.  Within  this  constraint,  a user of the
     message server must be able to:

      (i)    Prepare messages.

      (ii)   Send messages to remote users.

      (iii)  Receive messages from remote users.

      (iv)   Read messages.

      (v)    Be assured that messages are safely stored  and
             are recoverable in the event of system failure.

      (vi)   Be able to obtain adequate online help  on  the
             use of the server.

     In addition it is desirable that the user be able to:

      (i)    Prepare message files which  may  not  be  sent
             immediately.

      (ii)   Archive and dearchive messages.

      (iii)  Manipulate messages in file structures  of  his
             own creation.

      (iv)   Answer and forward messages.

      (v)    Obtain hardcopy listings.

      (vi)   Maintain mailing lists.

      (vii)  Annotate messages.


       This list is clearly not exhaustive, and the aims  of
     the user interface should be continually reevaluated in
     the light of user experience,  development  experience,
     and  the  recommendations of other message groups, such
     as IFIP TC6.5.  Nor does it imply any evaluation of the
     difficulty  of implementation: answering and forwarding
     messages should be comparatively trivial, while a
     satisfactory remote hardcopy listing service is a major
     problem.

       Following the general approach taken in this note, it
     is  proposed that MSG be used at least initially as the
     basis of the user interface in the message server.  The
     user  would enter MSG automatically as his login shell.
     It is expected that the repertoire of commands will  be
     changed and extended in order to provide the full range
     of services listed above (e.g. for the  maintenance  of
     mailing  lists).  This  may  require  the single-letter
     command interface to be modified. It is  also  expected
     that  the  character-at-a-time interface and the use of
     TV editors would have to be altered to fit the needs of
     users accessing the system via XXX (X.3/X.28/X.29 PAD)
     terminals, which favour line-oriented commands and
     editors. These issues
     will be reexamined in the light of experience gained.


     4.2 Message Management

       An important issue is  the  internal  design  of  the
     message server. The current system of personal mailbox
     files, each holding its own copy of every message its
     owner receives, is complex and wasteful in a Unix
     system devoted solely to message
     handling. It is proposed here that database  structures
     be examined in which only one copy of a message is kept
     in a central directory, and  that  the  user's  current
     mail  file,  and any other mail files he keeps, consist
     solely of descriptors pointing to the  message  and  to
     other   cross-referencing   descriptors  which  may  be
     needed. The structure of the system is shown in  Figure
     2.
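       The single-copy idea can be sketched with
     reference-counted descriptors: the text of each message
     is held once, users' mail files hold only descriptors,
     and the text is reclaimed when its last descriptor is
     removed. The note leaves the descriptor structure open,
     so here a descriptor is simply a message number; the
     class and method names are hypothetical (Python).

```python
from dataclasses import dataclass, field


@dataclass
class MessageStore:
    """Central directory holding one copy of each message text; users' mail
    files ("folders") hold only descriptors, here plain message numbers."""
    texts: dict[int, str] = field(default_factory=dict)
    refs: dict[int, int] = field(default_factory=dict)      # descriptors per message
    folders: dict[tuple[str, str], list[int]] = field(default_factory=dict)
    next_id: int = 1

    def deliver(self, text: str, users: list[str]) -> int:
        """Store one copy and put a descriptor in each user's current mail file."""
        msg_id, self.next_id = self.next_id, self.next_id + 1
        self.texts[msg_id], self.refs[msg_id] = text, 0
        for user in users:
            self.add_descriptor(user, "mailbox", msg_id)
        return msg_id

    def add_descriptor(self, user: str, folder: str, msg_id: int) -> None:
        """File msg_id in one of the user's mail files."""
        self.folders.setdefault((user, folder), []).append(msg_id)
        self.refs[msg_id] += 1

    def remove_descriptor(self, user: str, folder: str, msg_id: int) -> None:
        """Drop a descriptor; reclaim the text when none remain."""
        self.folders[(user, folder)].remove(msg_id)
        self.refs[msg_id] -= 1
        if self.refs[msg_id] == 0:
            del self.texts[msg_id], self.refs[msg_id]

    def read(self, user: str, folder: str = "mailbox") -> list[str]:
        return [self.texts[i] for i in self.folders.get((user, folder), [])]


store = MessageStore()
m = store.deliver("From: Kirstein@ISI\n\nhello\n", ["Ruth", "Bryan"])  # one copy, two descriptors
store.add_descriptor("Ruth", "archive", m)
store.remove_descriptor("Ruth", "mailbox", m)
store.remove_descriptor("Bryan", "mailbox", m)
print(store.read("Ruth", "archive"))   # text survives: one descriptor remains
```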

       The details  of  the  descriptor  structure  are  not
     considered in this note. However, a number of important
     issues arise. The fundamental question is:  should  all
     messages be kept in a single file, or each message in a
     separate  file?  The  answer   chosen   has   important
     implications  for the limits on the size of the system,
     the method of updating the system, methods of accessing
     messages, and many other issues.

             Figure 2: Message Management Structure

       In the second method, messages may be found rapidly
     by filename, and garbage collection is considerably
     simplified through the use of Unix file management
     facilities, but on average 256 bytes (half a disc
     block) will be wasted per message. Further, at most an
     entire file system of 64K blocks can be allocated to
     message service, although this is not a serious
     restriction. Assuming that most messages will be small,
     of the order of 2K characters, the file system would
     allow something less than 16K messages, wasting some
     4M bytes of space. Thus a more serious limitation is
     the number of inodes (file descriptors) allocated to
     the system, which is currently about 2^13, allowing 8K
     files. Increasing this to 2^14 is not difficult and
     will allow 16K files, of which a significant
     proportion would be for user descriptor information.
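       The storage estimates can be checked with a few lines
     of arithmetic. The sketch assumes 512-byte disc blocks
     (so that 256 bytes is half a block, as stated),
     2K-character messages and one inode per message file;
     the function name is hypothetical.

```python
def capacity(fs_blocks: int = 64 * 1024, block: int = 512,
             msg_size: int = 2 * 1024, inodes: int = 2 ** 13) -> dict[str, int]:
    """Rough capacity of one file system holding one message per file."""
    slack = block // 2                          # average waste: half a block per file
    by_space = fs_blocks * block // (msg_size + slack)
    return {
        "messages_by_space": by_space,          # "something less than 16K"
        "wasted_bytes": by_space * slack,       # roughly 4M bytes
        "messages_by_inodes": inodes,           # ignores inodes used for descriptors
        "limit": min(by_space, inodes),
    }


print(capacity())
# {'messages_by_space': 14563, 'wasted_bytes': 3728128, 'messages_by_inodes': 8192, 'limit': 8192}
print(capacity(inodes=2 ** 14)["limit"])       # 14563
```

     With 2^13 inodes the inode table, not disc space, is
     the binding limit; at 2^14 the two limits are of the
     same order, before any inodes are spent on descriptor
     files.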

       The first method allows more efficient use  of  space
     and  places  a much looser restriction on the number of
     messages that may be retained,  but  requires  building
     searching and garbage collection facilities parallel to
     Unix's. In order  to  use  these,  moreover,  either  a
     complex  file  structure  must  be defined, or a master
     descriptor file retained.

       Pending further investigation, the second  choice  is
     favoured  at this stage. The fact that only one copy of
     a message need be kept  should  help  to  minimise  the
     effects  of  the  restrictions.  Ensuring this may be a
     problem, especially if multiple copies of a message are
     received.  Hence  an important aspect of the system may
     be to examine incoming messages and attempt  to  detect
     duplicates of existing messages.
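       One possible check keys each incoming message on its
     Message-ID field where one is present, falling back to
     a digest of the full text. This Python sketch is an
     assumption about how such a check might work, not a
     design taken from the note; copies that differ by even
     one byte (for example, through lines added in transit)
     are caught only when they share a Message-ID. Class
     and function names are hypothetical.

```python
import hashlib


def duplicate_key(message: str) -> str:
    """Key for duplicate detection: the Message-ID if any, else a text digest."""
    header = message.partition("\n\n")[0]
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "message-id":
            return "id:" + value.strip()
    return "sha256:" + hashlib.sha256(message.encode("utf-8")).hexdigest()


class DuplicateFilter:
    """Remembers which stored message each key belongs to."""

    def __init__(self) -> None:
        self.seen: dict[str, int] = {}

    def admit(self, message: str, new_id: int) -> tuple[int, bool]:
        """Return (id to file under, True if this is a duplicate)."""
        key = duplicate_key(message)
        if key in self.seen:
            return self.seen[key], True
        self.seen[key] = new_id
        return new_id, False


f = DuplicateFilter()
print(f.admit("Message-ID: <1@UCL-MUnix>\nFrom: Ruth\n\nhello\n", 1))             # (1, False)
print(f.admit("Message-ID: <1@UCL-MUnix>\nVia: ISIE\nFrom: Ruth\n\nhello\n", 2))  # (1, True)
```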


     5. Conclusions

       The message system discussed here is  centred  around
     text  messages  based largely on ARPANET-style formats,
     at least  initially.  Nevertheless  there  are  several
     important  issues  which  must  be resolved in order to
     bring up a workable system. These issues include:

      (i)    Economic use of transfer and storage resources.

      (ii)   The structure  of  UCL-style  mail  daemons  at
             staging site(s).

      (iii)  The  modification  of  other  mail  servers  to
             handle UCL mail.

      (iv)   Basic addressing style.

      (v)    Detailed user interface.

      (vi)   Message management issues.

     This note has indicated some lines of approach to these
     problems.  They  will  be  examined  in  more detail in
     future notes, prior to the commencement of actual  work
     on  the  system  later  this  year.  It  is  clear that
     satisfactory   progress   requires   cooperation    and
     discussion   with  other  parties,  notably  the  DARPA
     Catenet group and groups using various  public  carrier
     services.  While  the  projects  of the former are more
     advanced at this point, it is expected that the  latter
     groups  will  become increasingly important in the long
     term.