Download data snapshots to CSV

The POND Snapshots API lets you export large numbers of records programmatically. You schedule a snapshot by providing a date range and a target data entity, and the result becomes available for download as CSV once the export job finishes.

Authorization

The POND API follows the OAuth 2.0 authorization framework. To generate the access token that every request requires, you must first be issued a pair of OAuth client credentials by SGI. They look like this:

client_id: 3fake23gasjh12fake55bgk3jfi14fake
client_secret: 329847ywqejhd8293giu2sh23yde9238dghkjhwd8237geu2bgi23897

For client authentication, join the two credentials with a colon and Base64-encode the result: base64Encode("<client_id>:<client_secret>"). Here is an example using the openssl command, which is available on most systems.

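printf '%s' '3fake23gasjh12fake55bgk3jfi14fake:329847ywqejhd8293giu2sh23yde9238dghkjhwd8237geu2bgi23897' | openssl base64 -A

The -A flag keeps the output on a single line (by default openssl wraps Base64 output every 64 characters). The result should look like: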

M2Zha2UyM2dhc2poMTJmYWtlNTViZ2szamZpMTRmYWtlOjMyOTg0N3l3cWVqaGQ4MjkzZ2l1MnNoMjN5ZGU5MjM4ZGdoa2pod2Q4MjM3Z2V1MmJnaTIzODk3
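If you script this step, the encoding can live in a small function. The following is a minimal bash sketch; the function name basic_auth_value is our own, not part of the POND API, and it uses the standard base64 tool (piped through tr to strip line wrapping) instead of openssl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints base64("<client_id>:<client_secret>"), the value
# that follows "Basic " in the Authorization header of the token request.
basic_auth_value() {
  local client_id=$1 client_secret=$2
  # printf adds no trailing newline (which would corrupt the encoding);
  # tr removes the line breaks GNU base64 inserts every 76 characters.
  printf '%s:%s' "$client_id" "$client_secret" | base64 | tr -d '\n'
}
```

For example, basic_auth_value user pass prints dXNlcjpwYXNz.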

The following HTTP POST request generates a token for our example client 3fake23gasjh12fake55bgk3jfi14fake and requests the "pond/insights.snapshots" scope (permission).

curl -X POST \
https://sgipond.auth.us-west-2.amazoncognito.com/oauth2/token \
-H 'Authorization: Basic M2Zha2UyM2dhc2poMTJmYWtlNTViZ2szamZpMTRmY...MjN5ZGU5MjM4ZGdoa2pod2Q4MjM3Z2V1MmJnaTIzODk3' \
-H 'Content-type: application/x-www-form-urlencoded' \
-d 'grant_type=client_credentials&scope=pond/insights.snapshots'

The response contains the access token for calling the /insights/snapshots endpoint:

{
    "access_token": "eyJraWQiOiJwRno5eTBxaTlnaGh2VStZbkY5Z...9fC7llrB6yvCUMWngmMkVqkmir-vPMXZJs2orZhvw",
    "expires_in": 3600,
    "token_type": "Bearer"
}
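A scripted version of the token request might look like the sketch below. It is only an illustration: the function names and the POND_CLIENT_ID / POND_CLIENT_SECRET environment variables are hypothetical, and to avoid a jq dependency the token is pulled out of the response with sed, which assumes the token value contains no escaped quotes.

```shell
#!/usr/bin/env bash
# Token endpoint and scope as used in the curl example above.
POND_TOKEN_URL='https://sgipond.auth.us-west-2.amazoncognito.com/oauth2/token'
POND_SCOPE='pond/insights.snapshots'

# Print the "access_token" value from a token response read on stdin.
# jq-free sketch: works on compact or pretty-printed JSON as long as the
# token itself contains no escaped quotes (true for JWT-style tokens).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical wrapper around the client-credentials request.
# POND_CLIENT_ID and POND_CLIENT_SECRET are assumed environment variable names.
fetch_access_token() {
  local basic
  basic=$(printf '%s:%s' "$POND_CLIENT_ID" "$POND_CLIENT_SECRET" | base64 | tr -d '\n')
  # -f: fail on HTTP error statuses; -sS: quiet, but still report errors.
  curl -sSf -X POST "$POND_TOKEN_URL" \
    -H "Authorization: Basic $basic" \
    -H 'Content-type: application/x-www-form-urlencoded' \
    -d "grant_type=client_credentials&scope=$POND_SCOPE" \
    | extract_access_token
}
```

For example: TOKEN=$(fetch_access_token).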

Access tokens are valid for 1 hour (expires_in is given in seconds); once a token expires, request a new one with the same call.
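To avoid requesting a new token for every call, a script can remember when the token was issued and refresh it shortly before the hour is up. The helper below is a hypothetical sketch of that check (the name and the 60-second safety margin are our own choices); it succeeds when the token should be refreshed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 when a cached token should be refreshed.
#   $1 issued_at  - Unix time (seconds) when the token was obtained
#   $2 expires_in - lifetime from the token response, in seconds (3600 above)
#   $3 now        - current Unix time (defaults to date +%s)
#   $4 margin     - refresh this many seconds before actual expiry (default 60)
token_needs_refresh() {
  local issued_at=$1 expires_in=$2 now=${3:-$(date +%s)} margin=${4:-60}
  (( now >= issued_at + expires_in - margin ))
}
```

Record issued_at=$(date +%s) when the token is obtained, then call token_needs_refresh "$issued_at" 3600 before each request.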

Schedule a snapshot

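We can now use the token to schedule a snapshot of the event table (the default entity) covering records created between 03/13/2019 11:00 AM and 03/28/2019 11:00 AM Pacific Time. Daylight saving time was already in effect on both dates (PDT, UTC-7), so these correspond to 18:00 UTC. To keep all dates timezone-aware, encode them in ISO 8601 with an explicit zone: 2019-03-13T18:00:00.000Z and 2019-03-28T18:00:00.000Z.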
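With GNU coreutils date, converting a local time to this format can be scripted. The sketch below is a hypothetical helper (to_iso_utc is our own name, and BSD/macOS date does not accept these flags); passing the -0700 offset in effect on those dates reproduces the two timestamps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a date/time that GNU date can parse (ideally
# with an explicit UTC offset) into the ISO 8601 UTC form expected by
# created_at_from / created_at_to, e.g. 2019-03-13T18:00:00.000Z.
# Requires GNU coreutils date; BSD/macOS date uses different flags.
to_iso_utc() {
  # -u formats the result in UTC; the milliseconds are always written as .000.
  date -u -d "$1" '+%Y-%m-%dT%H:%M:%S.000Z'
}
```

For example, to_iso_utc '2019-03-13 11:00:00 -0700' prints 2019-03-13T18:00:00.000Z. GNU date also accepts a named zone, e.g. 'TZ="America/Los_Angeles" 2019-03-13 11:00', which applies daylight saving time automatically if the system's time zone database is installed.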

curl -X POST \
https://api.sgipond.com/insights/snapshots \
-H 'Authorization: Bearer eyJraWQiOiJwRno5eTBxaTlnaGh2VStZbkY5Z...9fC7llrB6yvCUMWngmMkVqkmir-vPMXZJs2orZhvw' \
-d 'created_at_from=2019-03-13T18:00:00.000Z&created_at_to=2019-03-28T18:00:00.000Z' \
-H 'Content-type: application/x-www-form-urlencoded'

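Make sure to declare the parameter encoding with the Content-type header (application/x-www-form-urlencoded), otherwise the parameters will be ignored. curl's -d option sends this header by default, but other HTTP clients may not.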
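The scheduling call can be wrapped the same way. In this hypothetical sketch the function names are our own; the snapshot URI is extracted without jq by taking the first string-valued href, which relies on the key order shown in the sample response below (snapshot.href comes first and the item links are still null) and on URLs not being JSON-escaped. curl's -f flag makes the call fail on HTTP error statuses instead of printing the error body.

```shell
#!/usr/bin/env bash
POND_SNAPSHOTS_URL='https://api.sgipond.com/insights/snapshots'

# Print snapshot.href from a scheduling response read on stdin.
# jq-free sketch: keeps the first string-valued "href", which is snapshot.href
# when keys appear in the documented order (item hrefs are still null).
extract_snapshot_href() {
  grep -o '"href"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/^"href"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Hypothetical wrapper: schedule a snapshot of the default (event) entity
# between two ISO 8601 UTC timestamps and print the snapshot URI to poll.
# Usage: schedule_snapshot <access_token> <created_at_from> <created_at_to>
schedule_snapshot() {
  local token=$1 from=$2 to=$3
  curl -sSf -X POST "$POND_SNAPSHOTS_URL" \
    -H "Authorization: Bearer $token" \
    -H 'Content-type: application/x-www-form-urlencoded' \
    -d "created_at_from=$from&created_at_to=$to" \
    | extract_snapshot_href
}
```

For example, with your access token in $TOKEN: SNAPSHOT_URL=$(schedule_snapshot "$TOKEN" 2019-03-13T18:00:00.000Z 2019-03-28T18:00:00.000Z).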

If the credentials and range parameters are valid, the API responds with HTTP 202 (Accepted), confirming that the request has been accepted for processing:

{
    "result": "OK",
    "snapshot": {
        "href": "https://api.sgipond.com/insights/snapshots/c5e99fed-e2d4-471f-88f7-123456789",
        "items": [{
            "href": null,
            "size": null,
            "count": null,
            "state": "pending",
            "query": {
                "entity": "event",
                "created_at": ["2019-03-13T18:00:00.000Z", "2019-03-28T18:00:00.000Z"]
            }
        }]
    }
}

The flow is asynchronous because, depending on the query parameters, the generated CSV can range from a few kilobytes to several gigabytes.

Downloading the CSV

Issue HTTP GET requests to the snapshot URI (snapshot.href) to inspect the status of the export job, and keep polling until the items are complete:

curl \
https://api.sgipond.com/insights/snapshots/c5e99fed-e2d4-471f-88f7-123456789 \
-H 'Authorization: Bearer eyJraWQiOiJwRno5eTBxaTlnaGh2VStZbkY5Z...9fC7llrB6yvCUMWngmMkVqkmir-vPMXZJs2orZhvw'
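Polling can be scripted as a loop. The bash sketch below is hypothetical (function names, the 30-second interval and the attempt limit are our own choices). It treats the job as done only when every items[N].state is complete; pending and complete are the only states shown here, so any other value simply keeps the loop waiting until the attempt limit. Because a long job can outlast the 1-hour token, a failed request ends the loop so the caller can fetch a new token and resume.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) only if a snapshot status response read on stdin
# lists at least one item state and every state is "complete".
# jq-free sketch; any state other than "complete" counts as not done yet.
all_items_complete() {
  local states
  states=$(grep -o '"state"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/')
  [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qv '^complete$'
}

# Hypothetical polling loop: GET the snapshot URI until all items are complete,
# waiting $interval seconds between attempts and giving up after $max_attempts.
# Prints the final status JSON so the item links can be read from it.
wait_for_snapshot() {
  local token=$1 snapshot_url=$2 interval=${3:-30} max_attempts=${4:-120}
  local attempt body
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # -f turns HTTP errors (e.g. an expired token) into a non-zero exit.
    body=$(curl -sSf "$snapshot_url" -H "Authorization: Bearer $token") || return 1
    if printf '%s' "$body" | all_items_complete; then
      printf '%s\n' "$body"
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

For example: STATUS_JSON=$(wait_for_snapshot "$TOKEN" "$SNAPSHOT_URL").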

Once an item's state is complete, download it by following its items[N].href link. The link embeds a temporary access key, so a plain HTTP GET on the URL starts the download; no access token is required here. A completed snapshot looks like this:

{
    "href": "https://api.sgipond.com/insights/snapshots/c5e99fed-e2d4-471f-88f7-123456789",
    "items": [{
        "href": "https://prod-pond-insights-exports.s3.us-west-2.amazonaws.com/c5e99fed-e2d4-471f-88f7-123456789/event_000?AWSAccessKeyId=ASIATCABABABABDAWI.....",
        "size": 2490228,
        "count": 26990,
        "state": "complete",
        "query": {
            "entity": "event",
            "created_at": [
                "2019-03-13T18:00:00.000Z",
                "2019-03-28T18:00:00.000Z"
            ]
        }
    }]
}
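Downloading every item can also be scripted. In this hypothetical sketch the function names are our own, the item links are extracted without jq by skipping the first string-valued href (the snapshot's own URI in the response above), and each file is named after the last path segment of its URL, e.g. event_000.csv. No Authorization header is sent, since the signed URL carries its own temporary key.

```shell
#!/usr/bin/env bash
# Print every items[N].href from a completed snapshot status response on stdin.
# jq-free sketch: takes all string-valued "href" fields and skips the first,
# which is the snapshot's own URI when keys appear in the documented order.
# Assumes URLs are not JSON-escaped.
extract_item_hrefs() {
  grep -o '"href"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | tail -n +2 \
    | sed 's/^"href"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Derive a local file name from a signed item URL: drop the query string,
# keep the last path segment (e.g. event_000) and append .csv.
item_filename() {
  local path=${1%%\?*}
  printf '%s.csv\n' "${path##*/}"
}

# Hypothetical download step: read one signed URL per line from stdin and
# save each with a plain GET (the temporary key is part of the URL).
download_items() {
  local url
  while IFS= read -r url; do
    curl -sSf -o "$(item_filename "$url")" "$url" || return 1
  done
}
```

For example: printf '%s' "$STATUS_JSON" | extract_item_hrefs | download_items.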

To test the download, paste the entire item URL (items[N].href) into a web browser's address bar, e.g. https://prod-pond-insights-exports.s3.us-west-2.amazonaws.com/c5e99fed-e2d4-471f-88f7-123456789/event_000?AWSAccessKeyId=ASIATCABABABABDAWI.....

For security reasons, the signed link is valid for only 1 hour; to get a link with a fresh signature, repeat the GET request on the snapshot URI. Snapshots remain available for 7 days after they are created.