This file contains functions to parse FASTA format files.
Please send any enhancements or bug fixes to firstname.lastname@example.org
A FASTA format file consists of one or more DNA, RNA or protein sequences. The format is designed to be simple: each record is a header followed by the raw sequence data.
The header is introduced with the greater-than character ('>'). The convention is to separate header fields with vertical bars ('|').
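As an illustration of the header convention, the following is a minimal sketch of splitting a header line into its vertical-bar-separated fields. The function name split_header and the sample header are hypothetical and are not part of this library; the loader's own header handling may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of this library.
   Splits a FASTA header line in place at each '|' and stores a pointer
   to the start of each field. At most maxfields fields are stored; any
   fields beyond that are ignored. Returns the number of fields stored. */
int split_header(char *header, char **fields, int maxfields)
{
    int n = 0;
    char *p;

    assert(header != NULL && fields != NULL && maxfields > 0);

    /* Skip the leading '>' that introduces the header. */
    if (*header == '>')
        header++;

    p = header;
    while (n < maxfields) {
        fields[n++] = p;
        p = strchr(p, '|');
        if (p == NULL)
            break;
        *p++ = '\0';    /* terminate this field and step to the next */
    }
    return n;
}
```

Called on a writable copy of ">sp|P12345|EXAMPLE_HUMAN Example protein", this sketch yields the three fields "sp", "P12345" and "EXAMPLE_HUMAN Example protein". The input is modified, so pass a copy if the original header must be preserved.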
Comments are allowed and are handled by this loader, but they are rarely used in practice and may break other software.
The sequence data is in one-letter codes, with lower case translated to upper case. Whitespace is ignored, and lines should be shorter than eighty characters.
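The normalisation described above (whitespace dropped, lower case translated to upper case) can be sketched as follows. This is an illustrative stand-alone function under the name normalize_sequence, which is an assumption made for this example; it is not the loader's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of this library.
   Copies src to dst, skipping all whitespace and converting letters to
   upper case. dst must have room for strlen(src) + 1 characters.
   Returns the number of residue codes written (excluding the NUL). */
size_t normalize_sequence(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        /* Cast to unsigned char: the ctype functions require it. */
        unsigned char ch = (unsigned char)*src;

        if (isspace(ch))
            continue;               /* spaces, tabs and line breaks are ignored */
        dst[n++] = (char)toupper(ch);
    }
    dst[n] = '\0';
    return n;
}
```

Because whitespace is discarded, sequence data wrapped over several short lines normalises to the same string as the same data on one long line.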
The FASTA object is intended to be semi-opaque. Most code should access it through the access functions; however, the fields of the structure are exported if you need them.
All functions fail an assertion if passed a bad index. All indices are zero-based.
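The index-checking behaviour can be pictured with a small sketch. The SeqTable type and seqtable_getlength function below are hypothetical stand-ins invented for this example; they are not the library's FASTA structure or one of its access functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a collection of sequences,
   NOT the library's FASTA structure. */
typedef struct {
    int nsequences;       /* number of sequences held */
    const int *lengths;   /* length of each sequence */
} SeqTable;

/* Hypothetical accessor showing the pattern: with zero-based indices the
   valid range is 0 .. nsequences - 1, and anything else trips an assert. */
int seqtable_getlength(const SeqTable *t, int index)
{
    assert(t != NULL);
    assert(index >= 0 && index < t->nsequences);
    return t->lengths[index];
}
```

Note that assert is compiled out when NDEBUG is defined, so in such a build a bad index would go unchecked. Callers should validate indices themselves rather than rely on the assertion.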
FASTA *loadfasta(char *fname, int *err);
The function returns NULL if the file fails to load.
FASTA *floadfasta(FILE *fp, int *err);
void killfasta(FASTA *fa);
int fasta_getNsequences(FASTA *fa);
void fasta_getsequence(FASTA *fa, int index, char *out);
The buffer must be large enough to hold the sequence as well as the terminating NUL.
void fasta_getgappedsequence(FASTA *fa, int index, char *out);
int fasta_getlength(FASTA *fa, int index);
int fasta_getgappedlength(FASTA *fa, int index);
int fasta_gettype(FASTA *fa, int index);
|FASTA_UNKNOWN|can't work out type of data|
|FASTA_PROTEIN|canonical 20 amino acids|
The extended sequence types contain codes for unknown or modified residues. They also cover protein sequences with embedded stop codons.