Onramp Postgres
- Feature Name:
rfc_0008_onramp_postgres
- Start Date: 2020-01-21
- Tremor Issue: tremor-rs/tremor-runtime#0015
- RFC PR: tremor-rs/tremor-rfcs#0008
Summary
Pull data from Postgres tables.
Motivation
We see use cases a function of features supported by PostgreSQL out of the box, as well as additional features introduced as extensions or otherwise packaged as completely separate products. Primary examples that we are looking at are TimescaleDB and PipelineDB.
The ultimate goal is to be able to reliably pipe in data from PostgreSQL core to Tremor engine.
Guide-level Explanation
PostgreSQL is a standard Tremor onramp. It is configured by passing in standard PostgreSQL connection string arguments, such as host
, port
, user
, password
and dbname
.
In addition to connection string arguments, we require:
interval
in milliseconds that specifies the time we wait before executing the next query.consume_from
specifies the timestamp from which to backfill data.query
in form of standardSELECT
query which retrieves rows from a table or view.
Unlike other onramps implemented so far, PostgreSQL onramp will persist consume_from
and consume_until
, allowing recovery from error conditions or other failures.
Example onramp.yml
:
id: db
type: postgres
codec: json
config:
host: localhost
port: 5432
user: postgres
password: example
query: "SELECT id, name from events WHERE produced_at <= $1 AND produced_at >
$2"
interval_ms: 1000
dbname: sales
cache:
path: "/path/to/cache.json"
size: 4096
Cache is a memory mapped region, either file-based or anonymous.
Reference-level Explanation
Drawbacks
The onramp does not provide an implementation of all types Postgres supports and the entirety of query language (for example, LISTEN/NOTIFY semantics).
Rationale and Alternatives
As a starting point, or first implementation, supporting basic SELECT
statements with a time interval as additional WHERE
clause supports the major use case: ingesting rows from a table or view.
Alternative approach would utilize PostgreSQL single row mode suitable for ingestion of results returned by queries that span a large number of rows. Ensuring at-most-once ingestion of rows would be a potential problem and a time-consuming/API-breaking implementation.
Other approaches, such as trigger-based watches or binary log readers are also a possibility.
Prior Art
Unresolved Questions
Future Possibilities
For users that utilise PostgreSQL as an event store, which seems to be a more
common use case lately, support for LISTEN/NOTIFY
would be essential to become one of modes of this onramp.
Support multiple queries in the same onramp.
More flexible means of specifying parameters.