Put Yourself Out There

Reports from the front on Astrophysics, Statistics, and Data Science

Postgres Table Design


Today I worked with Dan Halperin and Brandon Holt on Postgres table design for our KBMOD project. We are using the PostGIS package to provide support for spatial overlap queries (on the celestial sphere instead of the terrestrial one). The big challenge is how to optimally store (and to query) information on 1e6 images that contain a total of 1e12 pixels. The ultimate goal is to intersect these data with proposed orbits of Solar System objects to understand if there is evidence for them in the time series of images.

We worked towards comparing two table designs. The first on allows us to intersect an orbit trajectory with a given image, which acts as a sort of coarse database index, allowing us to only take a detailed look at the pixels within the images that intersect the orbit:

-- IMAGES table:
CREATE TABLE fields ( 
    fieldId BIGINT PRIMARY KEY,
    run INTEGER,
    camcol SMALLINT,
    field INTEGER,
    filter VARCHAR(1),
    bbox GEOMETRY(POLYGON,3786),
    tmid TIMESTAMP WITH TIME ZONE,
    trange TSTZRANGE
  );


-- PIXELS table:
CREATE TABLE pixels ( 
    pixelId BIGSERIAL PRIMARY KEY,
    fieldId BIGINT REFERENCES fields(fieldId),
    ra DOUBLE PRECISION,
    decl DOUBLE PRECISION,
    fval REAL,
    radec GEOMETRY(POINT,3786),
    mask INTEGER
  );


-- EXAMPLE QUERY:
SELECT p.pixelId, p.ra, p.decl, p.fval, ST_DISTANCE(traj, p.radec) AS dist FROM
  ST_SetSRID(ST_MakePoint(-42.8471955, 0.7336945),3786) as traj,
  pixels as p,
  fields as f
WHERE
  TIMESTAMP WITH TIME ZONE '2006-10-21 03:11:44.69136z' <@ f.trange
AND
  ST_INTERSECTS(traj, f.bbox)
AND
 f.fieldId = p.fieldId
ORDER BY dist
LIMIT 10;

A second way us to use a real database index to tell us which pixels to look at. I.e. intersect each trajectory with the pixels table, as below. We think that first intersecting with the fields table, and then intersecting with the pixels in that field, should provide optimal performance. If the database index is up to the task...

-- PIXELS table:
CREATE TABLE pixels ( 
    pixelId BIGSERIAL PRIMARY KEY,
    fieldId BIGINT REFERENCES fields(fieldId),
    ll_r DOUBLE PRECISION,
    ll_d DOUBLE PRECISION,
    lr_r DOUBLE PRECISION,
    lr_d DOUBLE PRECISION,
    ur_r DOUBLE PRECISION,
    ur_d DOUBLE PRECISION,
    ul_r DOUBLE PRECISION,
    ul_d DOUBLE PRECISION,
    bbox GEOMETRY(POLYGON,3786),
    flux REAL,
    mask INTEGER
  );

-- EXAMPLE QUERY:
SELECT p.pixelId, ST_AsText(ST_Centroid(p.bbox)), p.flux FROM
  ST_SetSRID(ST_MakePoint(-42.8471955, 0.7336945),3786) as traj,
  pixels as p,
  fields as f
WHERE
  TIMESTAMP WITH TIME ZONE '2006-10-28 02:55:13.932192z' <@ f.trange
AND
  ST_INTERSECTS(traj, f.bbox)
AND
  ST_INTERSECTS(traj, p.bbox)
AND
  f.fieldId = p.fieldId;

-- EXAMPLE QUERY THAT USES A SUBQUERY TO FIRST FILTER ON FIELD
SELECT 
  ST_SetSRID(ST_MakePoint(-42.8471955, 0.7336945),3786) as traj,
  (fieldId FROM fields as f
    WHERE
      TIMESTAMP WITH TIME ZONE '2006-10-28 02:55:13.932192z' <@ f.trange
    AND
      ST_INTERSECTS(traj, f.bbox)
  ) as fieldId, 
  p.pixelId, ST_AsText(ST_Centroid(p.bbox)), p.flux
FROM 
  pixels as p
WHERE
  p.fieldId = fieldId
AND
  ST_INTERSECTS(traj, p.bbox);

Note that we use a subquery in the second example, which should force the database to restrict the scan over pixels to those that land in the fields that overlap the trajcetory.

My goal is to generate a limited batch of data for both types of pixel tables (100 images of 1e6 pixels), so we can compare query performance.

Comments