API Reference

Every parameter. Every header. Every format.

One endpoint. Stack as many query params as you need. Public datasets require no authentication.

Endpoint

GET https://quorel.vercel.app/api/{dataset_id}/{name-slug}/{version}/

dataset_id

The numeric ID of the dataset. Found in the dashboard and in the URL of any dataset page.

name-slug

The dataset name, lowercased and hyphenated. Must match exactly — mismatches return 404 even if the ID is correct.

version

active (the pinned version), latest (the most recent version), or v1, v2, v3 etc. for a specific version number.
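The three path segments above can be assembled programmatically. This is a minimal sketch: `slugify` and `dataset_url` are hypothetical helper names, and the slug rule shown (lowercase, runs of non-alphanumerics become one hyphen) is an assumption. When in doubt, copy the slug from the dataset page URL.

```python
import re

BASE_URL = "https://quorel.vercel.app/api"


def slugify(name: str) -> str:
    """Lowercase a dataset name and hyphenate it (assumed rule, not confirmed by the API)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def dataset_url(dataset_id: int, name: str, version: str = "active") -> str:
    """Build the endpoint URL for one dataset version."""
    if dataset_id <= 0:
        raise ValueError("dataset_id must be a positive integer")
    if not re.fullmatch(r"active|latest|v\d+", version):
        raise ValueError("version must be active, latest, or vN")
    return f"{BASE_URL}/{dataset_id}/{slugify(name)}/{version}/"


print(dataset_url(42, "My Dataset"))
print(dataset_url(1, "Hacker News Top", "v3"))
```

Validating the version locally avoids a round trip that would end in a 400.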

Alt version shorthand

GET https://quorel.vercel.app/api/{dataset_id}/{name-slug}/{version}/alt/
# equivalent to ?alt=true

Authentication

Public datasets require no authentication. Private datasets require a Bearer token sent in the Authorization header. Find your private key in the dataset settings inside your dashboard.

GET https://quorel.vercel.app/api/42/my-dataset/active/
Authorization: Bearer your_private_key
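The same request can be prepared with Python's standard library. A minimal sketch: `private_request` is a hypothetical helper name, and `your_private_key` is a placeholder for the key from your dataset settings.

```python
import urllib.request


def private_request(url: str, private_key: str) -> urllib.request.Request:
    """Prepare a GET request that carries the Bearer token."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {private_key}")
    return req


req = private_request(
    "https://quorel.vercel.app/api/42/my-dataset/active/",
    "your_private_key",  # placeholder, not a real key
)
# To actually send it: urllib.request.urlopen(req)
```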

Rate limits

Public (no auth)

100 requests / minute / IP

Private (Bearer token)

300 requests / minute / IP

Rate limits are per IP. Private auth uses a separate bucket from public requests on the same IP.

Query parameters

All parameters are optional. They are applied in a fixed pipeline order — see the processing pipeline section below for the exact sequence.
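Since parameters stack freely, it can help to build the query string from a list of pairs. The sketch below uses `urllib.parse.urlencode`. `with_params` is a hypothetical helper, and leaving `:` and `,` unescaped is a readability choice that assumes the server accepts both the raw and percent-encoded forms.

```python
from urllib.parse import urlencode


def with_params(url: str, params: list) -> str:
    """Append query parameters; repeat a name in the list for repeatable params."""
    pairs = []
    for name, value in params:
        if isinstance(value, bool):
            value = "true" if value else "false"  # the API uses lowercase booleans
        pairs.append((name, str(value)))
    if not pairs:
        return url
    return f"{url}?{urlencode(pairs, safe=':,')}"


url = with_params(
    "https://quorel.vercel.app/api/4/yc-companies/active/",
    [("filter", "category:finance"), ("count", True)],
)
print(url)
```

A list of tuples rather than a dict keeps repeated names such as `filter` intact.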

Output format

format (default: json)

Response format. One of: json, jsonl, csv, tsv, xml, parquet. csv, tsv, and parquet automatically flatten nested objects.

pretty (default: true)

Pretty-print JSON output with indentation. Set to false for compact single-line output. Has no effect on other formats.

Pagination

limit (default: all)

Maximum number of entities to return. Applied after filters, sorting, deduplication, sampling, and offset.

offset (default: 0)

Skip the first N entities. Applied after filters, sorting, deduplication, and sampling, before limit. Combine with limit for offset-based pagination.

sample (default: none)

Return N randomly sampled entities. Uses cryptographically secure shuffling. Applied after filters, sorting, and deduplication, before offset and limit.

count (default: false)

Return only the count of matched entities, not the entities themselves. Response is a JSON object with count, dataset_id, version, and alt fields.
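One way to page through a dataset is to read the total from a count=true request, then step offset in increments of limit. A minimal sketch: `total_from_count_body` and `page_urls` are hypothetical helpers, and only the count field of the response body is relied on.

```python
import json
from urllib.parse import urlencode


def total_from_count_body(body: str) -> int:
    """Read the count field from a count=true response body."""
    return int(json.loads(body)["count"])


def page_urls(base_url: str, total: int, page_size: int):
    """Yield one URL per page, stepping offset by page_size."""
    for offset in range(0, total, page_size):
        yield f"{base_url}?{urlencode({'offset': offset, 'limit': page_size})}"


base = "https://quorel.vercel.app/api/1/hacker-news-top/active/"
for url in page_urls(base, total=25, page_size=10):
    print(url)
```

If you also filter or sort, add the same parameters to every page request so each page is cut from the same result set.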

Filtering

filter (default: none)

Exact match on a field. Format: filter=field:value. Case-insensitive. Supports dot-notation for nested fields (filter=author.name:ada). Repeatable — multiple filters are ANDed together. Fans out across arrays.

filter_contains (default: none)

Partial match on a field. Format: filter_contains=field:value. Case-insensitive substring match. Repeatable and ANDed like filter. Fans out across arrays.

keywords (default: none)

Full-text search across every field at every depth. Comma-separated values are ORed by default. Use keywords_mode=and to require all terms.

keywords_mode (default: or)

Controls how multiple keywords are combined. or returns entities matching any keyword. and requires all keywords to be present.
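To make the matching rules for filter and filter_contains concrete, here is a local sketch of how dot-notation, case-insensitivity, and array fan-out could behave. It illustrates the documented semantics, not the server's implementation. `values_at` and `matches_filter` are hypothetical names, and "fans out" is assumed to mean that an entity matches if any array element matches.

```python
def values_at(entity, path):
    """Collect every value reachable at a dot-notation path, fanning out across arrays."""
    current = [entity]
    for key in path.split("."):
        next_level = []
        for node in current:
            # A list at this level contributes each of its elements.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    flat = []
    for value in current:
        # If the final value is itself an array, compare against each element.
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def matches_filter(entity, field, wanted, contains=False):
    """Case-insensitive exact match (filter) or substring match (filter_contains)."""
    wanted = wanted.lower()
    for value in values_at(entity, field):
        text = str(value).lower()
        if (wanted in text) if contains else (text == wanted):
            return True
    return False


entity = {
    "author": {"name": "Ada"},
    "tags": ["AI", "ML"],
    "reviews": [{"rating": 5}, {"rating": 3}],
}
print(matches_filter(entity, "author.name", "ada"))
print(matches_filter(entity, "reviews.rating", "3"))
```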

Sorting

sort (default: none)

Sort by a top-level field. Format: sort=field:asc or sort=field:desc. Numeric fields sort numerically. String fields sort lexicographically. Entities where the field is missing or holds a nested value (an object or array) are sorted last.
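The sort rules can be mirrored locally, which is handy when checking what a response should look like. A minimal sketch: `sort_entities` is a hypothetical name, and keeping missing or nested values last in both directions is an assumption, since the source only says they are sorted last.

```python
def sort_entities(entities, field, direction="asc"):
    """Sort by a top-level field; entities without a sortable value go last."""

    def sortable(entity):
        value = entity.get(field)
        # Only plain numbers and strings are treated as sortable here.
        return isinstance(value, (int, float, str)) and not isinstance(value, bool)

    present = [e for e in entities if sortable(e)]
    rest = [e for e in entities if not sortable(e)]
    # The (is_string, value) key keeps numbers and strings from being compared directly.
    present.sort(
        key=lambda e: (isinstance(e[field], str), e[field]),
        reverse=(direction == "desc"),
    )
    return present + rest


stories = [
    {"title": "a", "score": 5},
    {"title": "b"},
    {"title": "c", "score": 12},
    {"title": "d", "score": {"raw": 1}},
]
print([s["title"] for s in sort_entities(stories, "score", "desc")])
```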

Field selection

keep_field (default: none)

Return only the specified fields. Comma-separated. Supports dot-notation: keep_field=title,author.name keeps title and the author object, but strips every field inside author except name.

drop_field (default: none)

Remove the specified fields from every entity. Comma-separated. Supports dot-notation: drop_field=reviews.content removes content from every object in the reviews array.
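The dot-notation behaviour of keep_field and drop_field can be sketched as two small recursive helpers. This is an illustration of the documented examples, not the server's code. `keep_fields` and `drop_fields` are hypothetical names, and applying keep paths inside arrays of objects is an assumption.

```python
import copy


def _copy_path(source, keys, target):
    """Copy the value at keys from source into target, creating parents as needed."""
    key, rest = keys[0], keys[1:]
    if not isinstance(source, dict) or key not in source:
        return
    value = source[key]
    if not rest:
        target[key] = copy.deepcopy(value)
    elif isinstance(value, dict):
        _copy_path(value, rest, target.setdefault(key, {}))
    elif isinstance(value, list):
        # Assumed: a path through an array applies to every object in it.
        kept = target.setdefault(key, [{} for _ in value])
        for item, slot in zip(value, kept):
            _copy_path(item, rest, slot)


def keep_fields(entity, paths):
    """Return a new entity holding only the given dot-notation paths."""
    result = {}
    for path in paths:
        _copy_path(entity, path.split("."), result)
    return result


def _drop_path(node, keys):
    if isinstance(node, list):
        for item in node:
            _drop_path(item, keys)
    elif isinstance(node, dict):
        if len(keys) == 1:
            node.pop(keys[0], None)
        elif keys[0] in node:
            _drop_path(node[keys[0]], keys[1:])


def drop_fields(entity, paths):
    """Return a copy of entity with the given dot-notation paths removed."""
    result = copy.deepcopy(entity)
    for path in paths:
        _drop_path(result, path.split("."))
    return result


book = {
    "title": "T",
    "author": {"name": "Ada", "email": "ada@example.com"},
    "reviews": [{"rating": 5, "content": "good"}, {"rating": 3, "content": "meh"}],
}
print(keep_fields(book, ["title", "author.name"]))
print(drop_fields(book, ["reviews.content"]))
```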

Cleaning

dedup (default: false)

Remove duplicate entities. By default, deduplication is based on a SHA-256 hash of all fields except _source. Use dedup_key to dedup on specific fields only.

dedup_key (default: none)

Comma-separated list of fields to use as the deduplication key. Only meaningful when dedup=true. Example: dedup_key=title,author.

denull (default: false)

Recursively strip null values and empty strings at every depth. Empty arrays that result from stripping are also removed.

include_source (default: true)

Include the _source field (the origin URL of each entity). Set to false to strip it from the response.
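The dedup and denull rules in this section can be approximated locally. A minimal sketch under stated assumptions: the canonical form that gets hashed (JSON with sorted keys), keeping the first occurrence of each duplicate, and treating dedup_key fields as top-level are all guesses. Only the use of SHA-256 and the exclusion of _source come from the text above. `dedup` and `denull` here are local helper names.

```python
import hashlib
import json


def dedup(entities, key_fields=None):
    """Drop duplicates by SHA-256 of all fields except _source, or of key_fields only."""
    seen, unique = set(), []
    for entity in entities:
        if key_fields:
            basis = {field: entity.get(field) for field in key_fields}
        else:
            basis = {k: v for k, v in entity.items() if k != "_source"}
        # Assumed canonical form: JSON with sorted keys.
        encoded = json.dumps(basis, sort_keys=True, default=str).encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(entity)  # assumed: the first occurrence wins
    return unique


def denull(value):
    """Recursively strip None and empty strings; drop arrays emptied by stripping."""
    if isinstance(value, dict):
        cleaned = {}
        for key, child in value.items():
            cleaned_child = denull(child)
            if cleaned_child is None or cleaned_child == "":
                continue
            if isinstance(child, list) and child and cleaned_child == []:
                continue  # this array became empty because of stripping
            cleaned[key] = cleaned_child
        return cleaned
    if isinstance(value, list):
        stripped = (denull(child) for child in value)
        return [c for c in stripped if c is not None and c != ""]
    return value


rows = [
    {"title": "A", "_source": "https://example.com/1"},
    {"title": "A", "_source": "https://example.com/2"},
    {"title": "B", "_source": "https://example.com/3"},
]
print(len(dedup(rows)))
print(denull({"a": None, "b": "", "c": {"d": None, "e": 1}, "f": [None, ""], "g": [1, None], "h": 0}))
```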

Structure

flatten (default: false)

Flatten nested objects into dot-notation keys. Arrays of primitives are JSON-encoded into a single string cell. Arrays of objects are expanded with indexed keys: reviews_0_rating, reviews_1_rating etc. Required automatically for csv, tsv, and parquet.

alt (default: false)

Return the alt version of this dataset version if one exists. Equivalent to appending /alt/ to the URL path.
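The flatten rules in this section are easier to picture with a small local sketch. `flatten` below is a hypothetical re-implementation of the three documented rules. How objects nested inside array elements are named, and the exact JSON spacing of encoded arrays, are assumptions.

```python
import json


def flatten(entity, prefix=""):
    """Flatten one entity following the three documented rules."""
    flat = {}
    for key, value in entity.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: dot-notation keys.
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list) and value and all(isinstance(i, dict) for i in value):
            # Array of objects: indexed keys such as reviews_0_rating.
            for index, item in enumerate(value):
                flat.update(flatten(item, prefix=f"{name}_{index}_"))
        elif isinstance(value, list):
            # Array of primitives: JSON-encoded into a single string cell.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


record = {
    "title": "T",
    "author": {"name": "Ada"},
    "tags": ["a", "b"],
    "reviews": [{"rating": 5}, {"rating": 3}],
}
print(sorted(flatten(record)))
```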

Response headers

Every successful response includes these headers, regardless of format.

X-RateLimit-Limit

The total number of requests allowed per minute for your authentication tier (100 for public, 300 for private).

X-RateLimit-Remaining

The number of requests remaining in the current rate limit window.

X-RateLimit-Reset

Unix timestamp of when the current rate limit window resets.

X-Dataset-ID

The numeric ID of the dataset that was served.

X-Dataset-Version

The resolved version number, prefixed with v (for example, v12). Useful when you requested active or latest and want to know the exact version.

X-Dataset-Alt

true if the alt version was served, false otherwise.

X-Total-Count

The number of entities in the response after all filters, deduplication, sampling, offset, and limit have been applied.

Retry-After

Only present on 429 responses. Number of seconds to wait before retrying.
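A client can use the rate-limit headers to decide how long to pause before its next request. A minimal sketch: `seconds_until_allowed` is a hypothetical helper, `headers` is a plain dict keyed by the exact header names above, and X-RateLimit-Reset is assumed to be in seconds rather than milliseconds.

```python
import time


def seconds_until_allowed(headers, now=None):
    """Return how many seconds to wait before sending the next request."""
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Only present on 429 responses; it takes priority.
        return float(retry_after)
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))  # assumed Unix seconds
    return max(0.0, reset_at - now)


print(seconds_until_allowed({"Retry-After": "7"}))
print(seconds_until_allowed({"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}, now=990))
```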

Response formats

json (application/json)

Default. Returns a JSON array of entity objects. Supports pretty=false for compact output.

Nested objects are preserved as-is unless flatten=true is set.

jsonl (application/x-ndjson)

Newline-delimited JSON. One entity per line. Ideal for streaming, log pipelines, and large datasets.

No pretty-printing. Each line is a self-contained JSON object.

csv (text/csv)

Comma-separated values with a header row. Response includes Content-Disposition with a filename.

Automatically flattens nested objects. Arrays of objects expand with indexed keys.

tsv (text/tab-separated-values)

Tab-separated values. Same structure as CSV but tab-delimited. Response includes Content-Disposition.

Automatically flattens nested objects. Arrays of objects expand with indexed keys.

xml (application/xml)

XML document with a <dataset> root element and one <entity> per record. Field names become element tags.

Non-alphanumeric characters in field names are replaced with underscores. Arrays are wrapped in a pluralized parent tag.

parquet (application/octet-stream)

Apache Parquet binary format. Response includes Content-Disposition. All values are coerced to strings.

Automatically flattens nested objects. Best for large datasets consumed by data warehouses or Pandas.
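The text formats can all be parsed with Python's standard library. A minimal sketch: `parse_body` is a hypothetical helper that covers json, jsonl, csv, and tsv. It leaves out xml and parquet, the latter because it needs a third-party reader.

```python
import csv
import io
import json


def parse_body(text: str, fmt: str):
    """Turn a response body into a list of dicts for the text-based formats."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "jsonl":
        # One self-contained JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        return list(csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter))
    raise ValueError(f"unsupported format for this sketch: {fmt}")


print(parse_body('{"title": "T"}\n{"title": "U"}\n', "jsonl"))
print(parse_body("title,score\nT,5\n", "csv"))
```

Note that csv and tsv cells come back as strings, so numeric fields need converting on the client.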

Processing pipeline

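Query parameters are always applied in this fixed order, regardless of the order you write them in the URL. The sequence matters when you combine field selection, filters, dedup, sample, and pagination. For example, keep_field and drop_field run before filter, filter_contains, and keywords, so those steps only see fields that survive field selection.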

01. Load entities from storage
02. Strip _source (if include_source=false)
03. denull: recursive null and empty string removal
04. keep_field: retain only specified dot-notation paths
05. drop_field: remove specified dot-notation paths
06. filter: exact match on fields
07. filter_contains: partial match on fields
08. keywords: full-text search across all values
09. sort: sort by a top-level field
10. dedup: deduplicate by hash or key
11. sample: cryptographically random subset
12. offset: skip first N entities
13. limit: cap the result set
14. flatten: flatten nested objects (auto for csv, tsv, parquet)
15. Serialize to requested format
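To see why the order matters, the sketch below runs a reduced version of the pipeline (filter, sort, dedup, offset, limit) over an in-memory list. `run_pipeline` is a hypothetical stand-in, not the server's code, and it assumes dedup keeps the first occurrence. Under that assumption, sorting before dedup decides which duplicate survives.

```python
def run_pipeline(entities, *, filter_eq=None, sort_by=None, dedup_key=None, offset=0, limit=None):
    """Apply a subset of the documented steps in their documented order."""
    result = list(entities)
    if filter_eq:  # step 06: exact, case-insensitive match
        field, wanted = filter_eq
        result = [e for e in result if str(e.get(field, "")).lower() == wanted.lower()]
    if sort_by:  # step 09: entities missing the field go last
        field, direction = sort_by
        have = [e for e in result if field in e]
        lack = [e for e in result if field not in e]
        result = sorted(have, key=lambda e: e[field], reverse=(direction == "desc")) + lack
    if dedup_key:  # step 10: assumed to keep the first occurrence
        seen, unique = set(), []
        for e in result:
            key = e.get(dedup_key)
            if key not in seen:
                seen.add(key)
                unique.append(e)
        result = unique
    result = result[offset:]  # step 12
    if limit is not None:  # step 13
        result = result[:limit]
    return result


data = [
    {"title": "A", "score": 1, "category": "x"},
    {"title": "A", "score": 9, "category": "x"},
    {"title": "B", "score": 5, "category": "x"},
    {"title": "C", "score": 7, "category": "y"},
]
top = run_pipeline(data, filter_eq=("category", "x"), sort_by=("score", "desc"), dedup_key="title", limit=2)
print([(e["title"], e["score"]) for e in top])
```

Here limit=2 counts entities after filtering and deduplication, not raw rows in storage.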

Examples

Top 10 HN stories sorted by score

GET https://quorel.vercel.app/api/1/hacker-news-top/active/
    ?limit=10
    &sort=score:desc

Remote jobs mentioning "React" or "TypeScript", title and salary only

GET https://quorel.vercel.app/api/2/remote-jobs/active/
    ?keywords=React,TypeScript
    &keep_field=title,salary
    &sort=salary:desc
    &denull=true

Count of YC companies in the finance category

GET https://quorel.vercel.app/api/4/yc-companies/active/
    ?filter=category:finance
    &count=true

5 random AI papers, deduplicated by title, as CSV

GET https://quorel.vercel.app/api/7/arxiv-ai-papers/latest/
    ?dedup=true
    &dedup_key=title
    &sample=5
    &format=csv
    &drop_field=_source

Private dataset with Bearer token, compact JSON

GET https://quorel.vercel.app/api/99/my-private-dataset/active/?pretty=false
Authorization: Bearer your_private_key

Error codes

All error responses are plain text with an appropriate HTTP status code. Error bodies are human-readable descriptions of what went wrong.

400 Bad Request
  • Invalid path structure.
  • dataset_id is not a positive integer.
  • version is not active, latest, or vN.
  • format is not one of the supported values.
401 Unauthorized
  • Private dataset accessed without an Authorization header. Response includes WWW-Authenticate: Bearer.
403 Forbidden
  • Authorization header present but the Bearer token does not match the dataset's private key.
404 Not Found
  • No dataset with the given ID exists.
  • The name slug does not match the dataset's name.
  • The requested version does not exist.
  • Dataset has no active version and active was requested.
  • alt=true was requested but no alt version exists for this version.
405 Method Not Allowed
  • Any method other than GET.
410 Gone
  • The dataset exists but has been frozen. The endpoint will never return new data.
429 Too Many Requests
  • Rate limit exceeded. Check X-RateLimit-Reset and Retry-After headers.
500 Internal Server Error
  • The dataset file could not be read from storage.
  • The dataset file could not be parsed.
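A client might map these status codes to a next step. A minimal sketch: `next_action` and the action labels are hypothetical, and how to react to a 500 (retry or report) is left to the caller.

```python
def next_action(status: int) -> str:
    """Suggest what a client should do for a given HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "check_credentials"  # missing or mismatched Bearer token
    if status == 429:
        return "wait_and_retry"  # honor Retry-After before trying again
    if status == 410:
        return "stop_polling"  # frozen dataset: no new data will appear
    if status in (400, 404, 405):
        return "fix_request"  # path, slug, version, format, or method is wrong
    return "report_server_error"


for code in (200, 401, 404, 410, 429, 500):
    print(code, next_action(code))
```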

Next steps