PDB Loader

By Malcolm Mclean

This is a source file to load Brookhaven PDB files.

The format is somewhat complicated. Although the code is stable enough for use, it is not guaranteed to load every PDB file out there. In particular, attempts to load non-protein files may well fail.

The PDB structure is semi-opaque. It is intended to be accessed only through the functions in pdb.c, but in practice it is hard to support every operation anyone might want to perform, so the members of the structure are exposed.

If you enhance or bug fix, please update me at regniztar@btinternet.com

Source files


pdb.h exposes a simple interface.

PDB *loadpdb(const char *path, int *err);

Loads a PDB file from a path. Returns 0 (a null pointer) on error. err receives the error code:
0 - OK
-1 - can't open file
-2 - out of memory
-3 - error parsing file
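
A caller usually wants to turn the error code into a message. The helper below is a hypothetical addition, not part of pdb.h; it is a minimal sketch that simply maps the four codes listed above to text.

```c
/* Hypothetical helper, not part of pdb.h: converts the error code
   written by loadpdb()/floadpdb() into a human-readable message. */
const char *pdb_errstr(int err)
{
    switch (err) {
    case 0:
        return "OK";
    case -1:
        return "can't open file";
    case -2:
        return "out of memory";
    case -3:
        return "error parsing file";
    default:
        return "unknown error code";   /* not one of the documented codes */
    }
}
```

In use, pass it the err value filled in by loadpdb() when the returned pointer is 0.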

PDB *floadpdb(FILE *fp, int *err);

Same as loadpdb(), but takes an already opened file. Useful if the data is piped in, for example on stdin.

void killpdb(PDB *pdb);

Destructor for the PDB object. Call it to free the memory used by the PDB structure once you have finished with it.

int pdb_getNmodels(PDB *pdb);

Returns the number of models in the PDB file. Returns 1 if there is only one model, as is typically the case with non-NMR deposits.

int pdb_getNchains(PDB *pdb);

Returns the number of peptide chains in the protein.

int pdb_setmodel(PDB *pdb, int model);

Sets the model to use, from 0 to pdb_getNmodels() - 1. Initially the first model in the file is selected.

Assert-fails if passed an out-of-range index.

int pdb_getchainlen(PDB *pdb, int chain);
Gets the number of residues in a chain. The chain index is zero-based.

Assert-fails if passed a bad chain index.

int pdb_chainindex(PDB *pdb, char *id);

PDB files have named chains. Typically, but not always, these are named "A", "B", "C". This function converts a name to a zero-based index.

Returns -1 if the named chain is not present.
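
The name-to-index mapping can be pictured as a linear search over the chain names. The sketch below is not the library's implementation: chain_index_sketch() is a hypothetical function, and it assumes, purely for illustration, that the names are held in a plain array of strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch in the spirit of pdb_chainindex(), not the real code.
   `names` is an assumed array of nchains chain names; the actual PDB
   structure may store chain identifiers differently. */
int chain_index_sketch(const char *const *names, int nchains, const char *id)
{
    assert(names != NULL && id != NULL);
    for (int i = 0; i < nchains; i++) {
        if (strcmp(names[i], id) == 0)
            return i;          /* zero-based index of the first match */
    }
    return -1;                 /* named chain not present */
}
```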

int pdb_getatom(PDB *pdb, int chain, int residue, char *atom, float *x, float *y, float *z);

Retrieves the atom data, writing its coordinates to x, y and z. On success the function returns the atomic number of the atom's element. Atoms are identified by their PDB names, such as "CA" and "CB".

Assert-fails if passed a bad residue or chain index. If the atom is not present, it returns 0; PDB files often have missing atom records.

There is currently no support for retrieving atoms that are badly named in the file.
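
Coordinates obtained this way are typically used for geometric checks. As an illustration only, the hypothetical atoms_within() below (not part of pdb.h) tests whether two atoms lie within a cutoff distance. PDB coordinates are in ångströms, and comparing squared distances avoids a square root.

```c
#include <assert.h>

/* Illustrative helper, not part of pdb.h: returns 1 if the two points
   (e.g. coordinates filled in by pdb_getatom()) are no more than
   `cutoff` apart, otherwise 0. Squared distances are compared, so no
   square root is needed. */
int atoms_within(float x1, float y1, float z1,
                 float x2, float y2, float z2, float cutoff)
{
    assert(cutoff >= 0.0f);
    float dx = x2 - x1;
    float dy = y2 - y1;
    float dz = z2 - z1;
    return dx * dx + dy * dy + dz * dz <= cutoff * cutoff;
}
```

Because pdb_getatom() returns 0 for a missing atom, check its return value before trusting the coordinates passed in here.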

int pdb_getsequence(PDB *pdb, int chain, char *out);

Retrieves the amino acid sequence of the peptide chain. The output uses the standard one-letter code, with modified residues represented by X.

Assert-fails if passed a bad chain index. The buffer must be large enough to hold the chain plus a terminating NUL, that is, at least pdb_getchainlen() + 1 bytes.
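
Since modified residues come back as 'X', a caller may want to know how many there are before, say, aligning the sequence. The function below is a hypothetical helper, not part of pdb.h, that works on the NUL-terminated string written by pdb_getsequence().

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper, not part of pdb.h: counts the residues that
   pdb_getsequence() reports as modified, i.e. written as 'X'. */
size_t count_modified(const char *seq)
{
    assert(seq != NULL);
    size_t n = 0;
    for (; *seq != '\0'; seq++) {
        if (*seq == 'X')
            n++;
    }
    return n;
}
```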

int pdb_getsecondary(PDB *pdb, int chain, char *out);

Gets the secondary structure. This function does not call DSSP; it reads the SHEET and HELIX records to produce a three-state output: 'A' (helix), 'B' (sheet) or '-'. Sometimes the records are impossible to parse, in which case the function returns -1.

Assert-fails if passed a bad chain index. The buffer must be large enough to hold the chain plus a terminating NUL.