n-Dimensional Interpolation

General Principles

Many structure validation tasks boil down to checking an n-dimensional set of values (e.g. the chi dihedral angles for a sidechain) against a map of probability values based on what we know from very high-resolution structures. For a given set of (x1, x2, … xn) coordinates, the resulting P-value must be calculated via linear interpolation from the 2^n corners of the n-orthotope surrounding the point in the map.

Providing rotamer and Ramachandran validation in real time for a 1,000-residue simulation requires on the order of 4-5,000 dihedral angle measurements, 1,000 2D interpolations and 1,000 1-4D interpolations (spread over 25 different maps) for every coordinate update. Maintaining a high graphics framereate in the face of 20+ coordinate updates per second therefore requires this pipeline to be very highly optimised. The RegularGridInterpolator described below may be used as-is as a Python class, taking approximately (20 + 0.13*m) \mu s to perform m 3D interpolations on a single map. While this is good enough for many situations, the Python class is not used by ISOLDE’s core Rama and Rotamer classes. Instead the underlying C++ class is used, allowing all the necessary RegularGridInterpolator instances to be stored in an efficient C++ mapping so that the entire validation task is handled in a single Python call.

n-dimensional Linear Interpolation

RegularGridInterpolator

class chimerax.isolde.interpolation.RegularGridInterpolator(dim, axis_lengths, min_vals, max_vals, grid_data)

A C++ implementation of n-dimensional regular grid interpolation, interfaced to Python using ctypes. Designed as an almost-drop-in replacement for the SciPy RegularGridInterpolator, but significantly faster (particularly for small numbers of interpolations). Whereas the SciPy interpolator has a fixed overhead of about 400 microseconds per call, this implementation reduces that to 20 microseconds. In addition, the speed of each interpolation is increased approximately 5-fold.

Achieving this speed does impose a few limitations, however. Whereas the SciPy interpolator allows uneven steps along each axis, for this implementation the gridded data must be evenly-spaced (although it is not necessary for each axis to have the same spacing). The initialisation function is therefore slightly different to the SciPy implementation to reflect this.

While in theory any number of dimensions is supported, in practice memory limitations make it impractical beyond 4-5 dimensions for most situations. For example, to handle interpolation over a 360 degree range with periodic wrapping on a 10 degree grid requires 39 points per axis. For three dimensions the resulting grid contains about 60k points; four dimensions contains about 2.3 million; five dimensions 90 million and six dimensions 3.5 billion (that is, about 25 GB of RAM in double precision).

Example usage is as follows (also see test_interpolator()):

import numpy
dim = 3
axis_lengths = [50, 50, 50]
min_vals = [0, 0, 0]
max_vals = [1, 1, 1]
grid_data = numpy.random.rand(50,50,50)
interp = RegularGridInterpolator(dim, axis_lengths, min_vals, max_vals,
                                 grid_data)
data = numpy.random.rand(10000, 3)

results = interp(data)
# or, if you prefer:
results = interp.interpolate(data)

Interpolation is fastest if the input is a NumPy double array - anything else is converted internally (with an associated performance penalty) prior to calling the C++ function.

__init__(dim, axis_lengths, min_vals, max_vals, grid_data)

Prepare the interpolator for a given n-dimensional grid. Once created, the grid elements cannot be modified.

Args:
  • dim:
    • An integer specifying the number of dimensions

  • axis_lengths:
    • A numpy int array of length dim, specifying the number of points along each axis

  • min_vals:
    • A numpy double array of length dim, specifying the minimum value of each axis

  • max_vals:

    A numpy double array of length dim, specifying the maximum value of each axis

  • grid_data:

    A n-dimensional numpy double array with dimensions matching axis_lengths, containing all the gridded data.

property axis_lengths

Returns a NumPy array of length dim providing the number of steps on each axis. Read only.

property dim

Returns the number of dimensions of the interpolator. Read only.

property grid

Returns a dim-length tuple of NumPy arrays giving the axis values at each grid step.

interpolate(data)

Returns the interpolated values for a set of (x(1), x(2), … x(dim)) points.

Args:
  • data:
    • a (n, dim) 2D NumPy array providing the coordinates at which to calculate the interpolated values. For fastest performance the data type of the array should be numpy.double.

Returns:
  • a 1D NumPy double array

property max

Returns a NumPy array of length dim providing the maximum limit of each axis. Read only.

property min

Returns a NumPy array of length dim providing the minimum limit of each axis. Read only.

property values

Returns the original gridded data values as a dim-dimensional NumPy array