API Reference
Every parameter. Every header. Every format.
One endpoint. Stack as many query params as you need. Public datasets require no authentication.
Endpoint
GET https://quorel.vercel.app/api/{dataset_id}/{name-slug}/{version}/
dataset_id
The numeric ID of the dataset. Found in the dashboard and in the URL of any dataset page.
name-slug
The dataset name, lowercased and hyphenated. Must match exactly — mismatches return 404 even if the ID is correct.
version
active (the pinned version), latest (the most recent version), or v1, v2, v3 etc. for a specific version number.
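As an illustration of how the three path components fit together, here is a minimal Python sketch that assembles the endpoint URL. The helper name build_dataset_url and the exact validation rules (for example, rejecting v0) are assumptions made for this example, not part of the API.

```python
import re

BASE_URL = "https://quorel.vercel.app/api"

# Accepts "active", "latest", or "v" followed by a positive integer.
# Rejecting "v0" and leading zeros is an assumption of this sketch.
VERSION_RE = re.compile(r"^(active|latest|v[1-9]\d*)$")


def build_dataset_url(dataset_id: int, name_slug: str, version: str = "active") -> str:
    """Assemble the endpoint URL from dataset_id, name-slug, and version."""
    if not isinstance(dataset_id, int) or dataset_id <= 0:
        raise ValueError("dataset_id must be a positive integer")
    if not VERSION_RE.match(version):
        raise ValueError("version must be 'active', 'latest', or 'vN'")
    # The slug is passed through unchanged: it must already match the
    # dataset's lowercased, hyphenated name.
    return f"{BASE_URL}/{dataset_id}/{name_slug}/{version}/"
```

Calling build_dataset_url(42, "my-dataset") yields the same URL used in the authentication example; an unsupported version string raises ValueError locally instead of costing a request.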
Alt version shorthand
GET https://quorel.vercel.app/api/{dataset_id}/{name-slug}/{version}/alt/
# equivalent to ?alt=true
Authentication
Public datasets require no authentication. Private datasets require a Bearer token sent in the Authorization header. Find your private key in the dataset settings inside your dashboard.
GET https://quorel.vercel.app/api/42/my-dataset/active/
Authorization: Bearer your_private_key
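The same request can be prepared from Python's standard library. This is a sketch only: make_request is a hypothetical helper name, and your_private_key is the placeholder from the example above, not a real key.

```python
import urllib.request
from typing import Optional


def make_request(url: str, private_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding the Bearer token only for private datasets."""
    req = urllib.request.Request(url, method="GET")
    if private_key:
        req.add_header("Authorization", f"Bearer {private_key}")
    return req


request = make_request(
    "https://quorel.vercel.app/api/42/my-dataset/active/",
    private_key="your_private_key",
)

# Sending the request needs network access:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Leaving private_key unset produces an unauthenticated request, which is all a public dataset needs.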
Rate limits
Public (no auth)
100 requests / minute / IP
Private (Bearer token)
300 requests / minute / IP
Rate limits are per IP. Private auth uses a separate bucket from public requests on the same IP.
Query parameters
All parameters are optional. They are applied in a fixed pipeline order — see the processing pipeline section below for the exact sequence.
Output format
format
Response format. One of: json, jsonl, csv, tsv, xml, parquet. The csv, tsv, and parquet formats automatically flatten nested objects.
pretty
Pretty-print JSON output with indentation. Set to false for compact single-line output. Has no effect on other formats.
Pagination
limit
Maximum number of entities to return. Applied after all filters, sorting, and deduplication.
offset
Skip the first N entities. Applied after filters and sorting, before limit. Use with limit for offset-based pagination.
sample
Return N randomly sampled entities. Uses cryptographically secure shuffling. Applied after filters, before offset and limit.
count
Return only the count of matched entities, not the entities themselves. Response is a JSON object with count, dataset_id, version, and alt fields.
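To walk through a dataset page by page, one approach is to fetch the total once with count=true and then step offset in multiples of limit. The sketch below only generates the parameter pairs; page_params is a hypothetical helper name, and the total is assumed to come from the count field of a count=true response.

```python
from typing import Dict, Iterator


def page_params(total: int, page_size: int) -> Iterator[Dict[str, int]]:
    """Yield limit/offset pairs that together cover `total` entities."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total, page_size):
        yield {"limit": page_size, "offset": offset}


# 25 matched entities fetched 10 at a time need three requests.
pages = list(page_params(total=25, page_size=10))
```

Pages stay consistent only if every request uses the same filters and an explicit sort. Combining this with sample is likely to reshuffle results between requests, since sampling is random.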
Filtering
filter
Exact match on a field. Format: filter=field:value. Case-insensitive. Supports dot-notation for nested fields (filter=author.name:ada). Repeatable: multiple filters are ANDed together. Fans out across arrays.
filter_contains
Partial match on a field. Format: filter_contains=field:value. Case-insensitive substring match. Repeatable and ANDed like filter. Fans out across arrays.
keywords
Full-text search across every field at every depth. Comma-separated values are ORed by default. Use keywords_mode=and to require all terms.
keywords_mode
Controls how multiple keywords are combined. The value or (the default) returns entities matching any keyword; the value and requires all keywords to be present.
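Because filter and filter_contains are repeatable, a query string is easier to build from a list of pairs than from a dictionary. The following sketch uses urllib.parse.urlencode; build_query is a hypothetical helper, and leaving ":" and "," unescaped is a readability choice of this example. How the API treats values that themselves contain ":" or "," is not specified here.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlencode

FieldValue = Tuple[str, str]


def build_query(
    filters: Iterable[FieldValue] = (),
    contains: Iterable[FieldValue] = (),
    keywords: Iterable[str] = (),
    keywords_mode: Optional[str] = None,
) -> str:
    """Encode repeatable filters and keyword search into one query string."""
    pairs = []
    for field, value in filters:
        pairs.append(("filter", f"{field}:{value}"))  # exact match, ANDed
    for field, value in contains:
        pairs.append(("filter_contains", f"{field}:{value}"))  # substring match
    keyword_list = list(keywords)
    if keyword_list:
        pairs.append(("keywords", ",".join(keyword_list)))
    if keywords_mode:
        pairs.append(("keywords_mode", keywords_mode))
    # A list of tuples lets the same parameter name appear more than once.
    return urlencode(pairs, safe=":,")


query = build_query(filters=[("category", "finance"), ("author.name", "ada")])
```

Here query is filter=category:finance&filter=author.name:ada, which can be appended to the endpoint URL after a question mark.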
Sorting
sort
Sort by a top-level field. Format: sort=field:asc or sort=field:desc. Numeric fields sort numerically. String fields sort lexicographically. Entities where the field is missing or nested are sorted last.
Field selection
keep_field
Return only the specified fields. Comma-separated. Supports dot-notation: keep_field=title,author.name keeps the full author object but strips every field inside it except name.
drop_field
Remove the specified fields from every entity. Comma-separated. Supports dot-notation: drop_field=reviews.content removes content from every object in the reviews array.
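The dot-notation behaviour of keep_field and drop_field can be approximated locally. This is a client-side sketch of the semantics described above, not the server's implementation; the function names and the handling of edge cases (such as a dotted path that points into a non-object value) are assumptions.

```python
from typing import Any, List


def parse_fields(spec: str) -> List[List[str]]:
    """Turn "title,author.name" into [["title"], ["author", "name"]]."""
    return [part.strip().split(".") for part in spec.split(",") if part.strip()]


def keep_fields(value: Any, paths: List[List[str]]) -> Any:
    """Keep only the given paths, descending into objects and arrays."""
    if isinstance(value, list):
        return [keep_fields(item, paths) for item in value]
    if not isinstance(value, dict):
        return value  # assumption: non-object values are passed through
    grouped = {}
    for path in paths:
        grouped.setdefault(path[0], []).append(path[1:])
    kept = {}
    for key, subpaths in grouped.items():
        if key not in value:
            continue
        if any(len(sub) == 0 for sub in subpaths):
            kept[key] = value[key]  # whole field requested
        else:
            kept[key] = keep_fields(value[key], subpaths)
    return kept


def drop_field(value: Any, path: List[str]) -> Any:
    """Remove one dotted path, applying it to every element of any array."""
    if isinstance(value, list):
        return [drop_field(item, path) for item in value]
    if not isinstance(value, dict):
        return value
    head, rest = path[0], path[1:]
    if not rest:
        return {k: v for k, v in value.items() if k != head}
    return {k: (drop_field(v, rest) if k == head else v) for k, v in value.items()}


entity = {
    "title": "Notes",
    "year": 1843,
    "author": {"name": "ada", "email": "ada@example.com"},
    "reviews": [{"content": "good", "rating": 5}, {"content": "ok", "rating": 3}],
}
kept = keep_fields(entity, parse_fields("title,author.name"))
dropped = drop_field(entity, ["reviews", "content"])
```

With the sample entity, kept contains only title and an author object reduced to name, while dropped keeps every field except content inside each review.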
Cleaning
dedup
Remove duplicate entities. By default, deduplication is based on a SHA-256 hash of all fields except _source. Use dedup_key to dedup on specific fields only.
dedup_key
Comma-separated list of fields to use as the deduplication key. Only meaningful when dedup=true. Example: dedup_key=title,author.
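A rough picture of hash-based deduplication, assuming entities are serialized as canonical JSON before hashing. The serialization, the choice to keep the first occurrence, and the restriction of key fields to top-level names are all assumptions of this sketch; the service's actual hashing input is not documented here.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Sequence

Entity = Dict[str, Any]


def entity_hash(entity: Entity, key_fields: Optional[Sequence[str]] = None) -> str:
    """SHA-256 over selected fields, or over every field except _source."""
    if key_fields:
        basis = {field: entity.get(field) for field in key_fields}
    else:
        basis = {k: v for k, v in entity.items() if k != "_source"}
    # sort_keys makes the hash independent of key order (an assumption).
    canonical = json.dumps(basis, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedup(entities: List[Entity], key_fields: Optional[Sequence[str]] = None) -> List[Entity]:
    """Keep the first entity seen for each distinct hash."""
    seen = set()
    unique = []
    for entity in entities:
        digest = entity_hash(entity, key_fields)
        if digest not in seen:
            seen.add(digest)
            unique.append(entity)
    return unique


papers = [
    {"title": "A", "author": "x", "_source": "https://example.com/1"},
    {"title": "A", "author": "x", "_source": "https://example.com/2"},
    {"title": "A", "author": "y", "_source": "https://example.com/3"},
]
by_all_fields = dedup(papers)
by_title = dedup(papers, key_fields=["title"])
```

The first two papers differ only in _source, so default deduplication merges them and leaves two entities; restricting the key to title collapses all three into one.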
denull
Recursively strip null values and empty strings at every depth. Empty arrays that result from stripping are also removed.
Include the _source field (the origin URL of each entity). Set to false to strip it from the response.
Structure
flatten
Flatten nested objects into dot-notation keys. Arrays of primitives are JSON-encoded into a single string cell. Arrays of objects are expanded with indexed keys: reviews_0_rating, reviews_1_rating, etc. Applied automatically for csv, tsv, and parquet.
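The flattening rules can be sketched in a few lines of Python. This follows the three cases described above (nested objects, arrays of primitives, arrays of objects); the treatment of empty or mixed arrays and the exact JSON spacing inside encoded cells are assumptions.

```python
import json
from typing import Any, Dict


def flatten(entity: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten one entity into a single-level dict of column name to value."""
    flat: Dict[str, Any] = {}
    for key, value in entity.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: join keys with a dot.
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Array of objects: expand with indexed keys such as reviews_0_rating.
            for index, item in enumerate(value):
                flat.update(flatten(item, prefix=f"{name}_{index}_"))
        elif isinstance(value, list):
            # Array of primitives (or empty/mixed, by assumption): one JSON string.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


row = flatten({
    "title": "Notes",
    "author": {"name": "ada"},
    "tags": ["math", "engines"],
    "reviews": [{"rating": 5}, {"rating": 3}],
})
```

The resulting row has the keys title, author.name, tags, reviews_0_rating, and reviews_1_rating, with tags held as a JSON-encoded string.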
alt
Return the alt version of this dataset version if one exists. Equivalent to appending /alt/ to the URL path.
Response headers
Every successful response includes these headers, regardless of format.
X-RateLimit-Limit
The total number of requests allowed per minute for your authentication tier (100 for public, 300 for private).
X-RateLimit-Remaining
The number of requests remaining in the current rate limit window.
X-RateLimit-Reset
Unix timestamp of when the current rate limit window resets.
X-Dataset-ID
The numeric ID of the dataset that was served.
X-Dataset-Version
The resolved version number, prefixed with v (e.g. v12). Useful when you requested active or latest and want to know the exact version.
X-Dataset-Alt
true if the alt version was served, false otherwise.
X-Total-Count
The number of entities in the response after all filters, deduplication, sampling, offset, and limit have been applied.
Retry-After
Only present on 429 responses. Number of seconds to wait before retrying.
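A client can use these headers to decide how long to pause before the next request. The sketch below is one possible policy, not a prescribed one: seconds_until_retry is a hypothetical helper, and it expects headers as a plain mapping with the exact header names shown above.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(
    status: int,
    headers: Mapping[str, str],
    now: Optional[float] = None,
) -> float:
    """Return how many seconds to wait before sending the next request."""
    now = time.time() if now is None else now
    # On a 429, Retry-After gives the wait directly, in seconds.
    if status == 429 and "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Otherwise, wait for the window to reset only if the budget is spent.
    if headers.get("X-RateLimit-Remaining") == "0" and "X-RateLimit-Reset" in headers:
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return 0.0


wait = seconds_until_retry(429, {"Retry-After": "12"})
```

Passing now explicitly keeps the function deterministic in tests; in normal use it defaults to the current Unix time.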
Response formats
json
Default. Returns a JSON array of entity objects. Supports pretty=false for compact output.
Nested objects are preserved as-is unless flatten=true is set.
jsonl
Newline-delimited JSON. One entity per line. Ideal for streaming, log pipelines, and large datasets.
No pretty-printing. Each line is a self-contained JSON object.
csv
Comma-separated values with a header row. Response includes Content-Disposition with a filename.
Automatically flattens nested objects. Arrays of objects expand with indexed keys.
tsv
Tab-separated values. Same structure as CSV but tab-delimited. Response includes Content-Disposition.
Automatically flattens nested objects. Arrays of objects expand with indexed keys.
xml
XML document with a <dataset> root element and one <entity> per record. Field names become element tags.
Non-alphanumeric characters in field names are replaced with underscores. Arrays are wrapped in a pluralized parent tag.
parquet
Apache Parquet binary format. Response includes Content-Disposition. All values are coerced to strings.
Automatically flattens nested objects. Best for large datasets consumed by data warehouses or Pandas.
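The text formats can be decoded with Python's standard library alone. The sketch below covers json, jsonl, csv, and tsv; parse_body is a hypothetical helper, and xml and parquet are left out because they need a dedicated parser (parquet in particular needs a third-party library).

```python
import csv
import io
import json
from typing import Any, Dict, List


def parse_body(body: str, fmt: str) -> List[Dict[str, Any]]:
    """Decode a text response body into a list of entity dicts."""
    if fmt == "json":
        return json.loads(body)
    if fmt == "jsonl":
        # One JSON object per line; blank lines are skipped.
        return [json.loads(line) for line in body.split("\n") if line.strip()]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        reader = csv.DictReader(io.StringIO(body, newline=""), delimiter=delimiter)
        # Every cell arrives as a string; numeric columns need explicit conversion.
        return [dict(row) for row in reader]
    raise ValueError(f"unsupported format for this sketch: {fmt}")


stories = parse_body('{"title": "foo", "score": 10}\n{"title": "bar", "score": 7}\n', "jsonl")
```

Note the difference in types: the jsonl rows keep score as a number, while the same data read from csv or tsv yields score as a string.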
Processing pipeline
Query parameters are always applied in this fixed order, regardless of the order you write them in the URL. Understanding the sequence matters when combining filters, dedup, sample, and pagination.
Examples
Top 10 HN stories sorted by score
GET https://quorel.vercel.app/api/1/hacker-news-top/active/
?limit=10
&sort=score:desc
Remote jobs mentioning "React" or "TypeScript", title and salary only
GET https://quorel.vercel.app/api/2/remote-jobs/active/
?keywords=React,TypeScript
&keep_field=title,salary
&sort=salary:desc
&denull=true
Count of YC companies in the finance category
GET https://quorel.vercel.app/api/4/yc-companies/active/
?filter=category:finance
&count=true
5 random AI papers, deduplicated by title, as CSV
GET https://quorel.vercel.app/api/7/arxiv-ai-papers/latest/
?dedup=true
&dedup_key=title
&sample=5
&format=csv
&drop_field=_source
Private dataset with Bearer token, compact JSON
GET https://quorel.vercel.app/api/99/my-private-dataset/active/?pretty=false
Authorization: Bearer your_private_key
Error codes
All error responses are plain text with an appropriate HTTP status code. Error bodies are human-readable descriptions of what went wrong.
- Invalid path structure.
- dataset_id is not a positive integer.
- version is not active, latest, or vN.
- format is not one of the supported values.
- Private dataset accessed without an Authorization header. Response includes WWW-Authenticate: Bearer.
- Authorization header present but the Bearer token does not match the dataset's private key.
- No dataset with the given ID exists.
- The name slug does not match the dataset's name.
- The requested version does not exist.
- Dataset has no active version and active was requested.
- alt=true was requested but no alt version exists for this version.
- Any method other than GET.
- The dataset exists but has been frozen. The endpoint will never return new data.
- Rate limit exceeded. Check X-RateLimit-Reset and Retry-After headers.
- The dataset file could not be read from storage.
- The dataset file could not be parsed.
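Since error bodies are plain text, a client can surface them directly. The sketch below reads the status code, message, and Retry-After header from a urllib.error.HTTPError. The error object is built locally so the example runs without network access, and its body text is a placeholder, not the service's actual wording.

```python
import io
import urllib.error
from email.message import Message


def error_details(err: urllib.error.HTTPError) -> dict:
    """Extract the status code, plain-text body, and Retry-After header."""
    return {
        "status": err.code,
        "message": err.read().decode("utf-8", errors="replace").strip(),
        "retry_after": err.headers.get("Retry-After"),  # None unless present
    }


# Simulated 429 response; the body text here is a placeholder.
_headers = Message()
_headers["Retry-After"] = "30"
simulated = urllib.error.HTTPError(
    "https://quorel.vercel.app/api/42/my-dataset/active/",
    429,
    "Too Many Requests",
    _headers,
    io.BytesIO(b"Rate limit exceeded.\n"),
)
details = error_details(simulated)
```

In real use the HTTPError would come from an except clause around urllib.request.urlopen, and retry_after would be None for any status other than 429.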