Newick format tree loader

By Malcolm McLean

This is a simple loader for the Newick file format.

The Newick format is designed for phylogenetic trees, but can be used for any sort of tree. The formal description is here.

Trees are relatively simple to read, but there are complications caused by the rules for labels, which can be quoted, unquoted, or even absent. Therefore I have produced a loader. There is no saver: the save routine is relatively easy to write, and in practice you will want to manipulate trees in your own format and simply write them out. There are also no manipulation functions. You will want to load the tree and then traverse it by hand.

The label can contain arbitrary data, so virtually any tree can be saved. I have included a function for converting strings into valid labels, quoting them where necessary.

  • newick.h
  • newick.c


    The tree format

    typedef struct newicknode
    {
      int Nchildren;              /* number of children (0 for leaves) */
      char *label;                /* node label, can be null */
      double weight;              /* node weight */
      struct newicknode **child;  /* list of children */ 
    } NEWICKNODE;
    
    typedef struct
    {
      NEWICKNODE *root;         
    } NEWICKTREE;
    

    There are no special leaf nodes. Each node can have a label and a weight associated with it. Any number of children are allowed.
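Since traversal is left to the caller, a minimal sketch of walking the structure by hand may help. The helpers `count_leaves` and `tree_depth` are hypothetical names, not part of newick.h, and the node typedef is repeated only so that the block compiles on its own.

```c
#include <stddef.h>

/* Node layout copied from the typedef above so that this sketch
   compiles on its own; in real code, include newick.h instead. */
typedef struct newicknode
{
  int Nchildren;              /* number of children (0 for leaves) */
  char *label;                /* node label, can be null */
  double weight;              /* node weight */
  struct newicknode **child;  /* list of children */
} NEWICKNODE;

/* Hypothetical helper: count the nodes with no children in the
   subtree rooted at node. A null pointer counts as zero leaves. */
int count_leaves(const NEWICKNODE *node)
{
  int i, total = 0;

  if (node == NULL)
    return 0;
  if (node->Nchildren == 0)
    return 1;
  for (i = 0; i < node->Nchildren; i++)
    total += count_leaves(node->child[i]);
  return total;
}

/* Hypothetical helper: number of edges on the longest path from
   node down to a leaf. A leaf (or a null pointer) has depth 0. */
int tree_depth(const NEWICKNODE *node)
{
  int i, d, best = 0;

  if (node == NULL)
    return 0;
  for (i = 0; i < node->Nchildren; i++)
  {
    d = 1 + tree_depth(node->child[i]);
    if (d > best)
      best = d;
  }
  return best;
}
```

With a loaded tree, these would be called on the root, for example `count_leaves(tree->root)`. Because there are no special leaf nodes, the only test for a leaf is `Nchildren == 0`.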


    loadnewicktree

    /*
      load a newick tree from a file
      Params: fname - path to file
              error - return for error code
      Returns: the loaded object on success, 0 on fail
      Error codes: -1 out of memory
                   -2 parse error
                   -3 can't open file
     */
    NEWICKTREE *loadnewicktree(char *fname, int *error)
    

    This is the call to load a tree from a named file.
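When reporting a failed load, it can be convenient to turn the documented error codes into messages. The sketch below does only that. `newick_strerror` is a hypothetical name, not a function in newick.h, and the file name in the comment is a placeholder.

```c
/* Hypothetical helper, not part of newick.h: map the error codes
   documented for loadnewicktree to short messages.

   Intended use, assuming newick.h and stdio.h are included and
   "example.nwk" stands in for a real path:

     int err;
     NEWICKTREE *tree = loadnewicktree("example.nwk", &err);
     if (!tree)
       fprintf(stderr, "load failed: %s\n", newick_strerror(err));
*/
const char *newick_strerror(int error)
{
  switch (error)
  {
    case -1: return "out of memory";
    case -2: return "parse error";
    case -3: return "can't open file";
    default: return "unknown error";
  }
}
```

The sketch tests the returned pointer and does not rely on the value left in `error` after a successful load, since the description above only specifies the codes for failures.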


    floadnewicktree

    /*
      load newick tree from an opened stream
      Params: fp - input stream
              error - return for error
      Returns: pointer to object, 0 on fail
      Error codes: -1 out of memory
                   -2 parse error
      Notes: format allows several trees to be stored in a file 
     */
    NEWICKTREE *floadnewicktree(FILE *fp, int *error)
    

    Load a tree from an open file. Since trees are terminated by semicolons, it is possible to construct files containing several trees.


    killnewicktree

    /*
      newick tree destructor
     */
    void killnewicktree(NEWICKTREE *tree)
    

    Destroys the tree and frees its memory. Each node is individually allocated, which matters if you want to manipulate the tree yourself.
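To illustrate what individual allocation means for code that prunes or rebuilds trees, here is a sketch of freeing a single subtree. It is not the author's destructor. `free_subtree` is a hypothetical helper, and it assumes that every node, its label and its child array each came from a separate malloc-family call, which the text above suggests but does not spell out.

```c
#include <stdlib.h>

/* Node layout copied from the typedef above so that this sketch
   compiles on its own; in real code, include newick.h instead. */
typedef struct newicknode
{
  int Nchildren;
  char *label;
  double weight;
  struct newicknode **child;
} NEWICKNODE;

/* Hypothetical helper: free node and everything beneath it.
   Assumption: node, node->label and node->child were each obtained
   from malloc, calloc or realloc. Returns the number of nodes freed. */
int free_subtree(NEWICKNODE *node)
{
  int i, freed = 1;

  if (node == NULL)
    return 0;
  for (i = 0; i < node->Nchildren; i++)
    freed += free_subtree(node->child[i]);
  free(node->child);  /* free(NULL) is a no-op, so leaves are fine */
  free(node->label);
  free(node);
  return freed;
}
```

After pruning a child this way, the caller still has to remove the stale pointer from the parent's `child` array and decrement `Nchildren`.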


    makenewicklabel

    /*
      turns a string into a label suitable for use in the tree
      Params: str - the string
      Returns: modified string, 0 on out of memory
      Notes: strings containing spaces have them replaced by underscores
             strings containing illegal characters are quoted
             null pointer is returned as the empty string ""
     */
    char *makenewicklabel(const char *str)
    
    Ancillary function to turn a string into a label. Used for writing out trees. The return is a string allocated with malloc. Note that the null pointer is converted to the empty string, so even that result needs freeing.
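Since no saver is supplied, the following is one possible sketch of a save routine. It writes a subtree into a caller-supplied character buffer. `append_str` and `append_newick` are hypothetical names. Labels are copied verbatim, so they are assumed to have been passed through makenewicklabel already, and a weight is written for every node because the struct as documented has no flag for a missing weight.

```c
#include <stdio.h>
#include <string.h>

/* Node layout copied from the typedef above so that this sketch
   compiles on its own; in real code, include newick.h instead. */
typedef struct newicknode
{
  int Nchildren;
  char *label;
  double weight;
  struct newicknode **child;
} NEWICKNODE;

/* Append s to the NUL-terminated string in buf (capacity size).
   Returns 0 on success, -1 if it does not fit. */
static int append_str(char *buf, size_t size, const char *s)
{
  size_t len = strlen(buf);

  if (len + strlen(s) + 1 > size)
    return -1;
  strcpy(buf + len, s);
  return 0;
}

/* Hypothetical writer: append the subtree rooted at node to buf as
   (children)label:weight. Labels are assumed to be valid Newick
   labels already. Returns 0 on success, -1 if buf is too small. */
int append_newick(const NEWICKNODE *node, char *buf, size_t size)
{
  char num[32];
  int i;

  if (node->Nchildren > 0)
  {
    if (append_str(buf, size, "(") != 0)
      return -1;
    for (i = 0; i < node->Nchildren; i++)
    {
      if (i > 0 && append_str(buf, size, ",") != 0)
        return -1;
      if (append_newick(node->child[i], buf, size) != 0)
        return -1;
    }
    if (append_str(buf, size, ")") != 0)
      return -1;
  }
  if (node->label && append_str(buf, size, node->label) != 0)
    return -1;
  snprintf(num, sizeof num, ":%g", node->weight);
  return append_str(buf, size, num);
}
```

Start with an empty string in the buffer, call `append_newick` on the root, then append the terminating `;` yourself. For an unlabelled root of weight 0 with two leaves A (weight 1.5) and B (weight 2), the buffer ends up holding `(A:1.5,B:2):0`.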