Data Extraction API
Turn HTML into JSON
The Data Extraction API turns any Hyperclay site into a queryable data source. Append ?data= followed by your extraction rules to any site URL to get structured JSON back.
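A browser encodes the braces and quotes in the rules for you. A script usually has to percent-encode them itself. The sketch below builds such a URL with Python's standard `urllib.parse.quote`, using the query from Example 1. `build_extraction_url` and the `example.hyperclay.com` site are hypothetical names, and the sketch assumes the server decodes percent-encoded query strings the way standard HTTP servers do.

```python
from urllib.parse import quote


def build_extraction_url(site_url: str, rules: str) -> str:
    """Append extraction rules to a site URL as ?data=... (hypothetical helper)."""
    # Percent-encode braces, quotes, colons, commas, @, [, ] and # so HTTP
    # clients send a valid URL; browsers do this automatically when you type it.
    encoded = quote(rules, safe="")
    return f"{site_url}?data={encoded}"


url = build_extraction_url(
    "https://example.hyperclay.com",  # hypothetical site
    '{title:"h1",tagline:".tagline",footer:".copyright"}',
)
print(url)
# https://example.hyperclay.com?data=%7Btitle%3A%22h1%22%2Ctagline%3A%22.tagline%22%2Cfooter%3A%22.copyright%22%7D
```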
Example 1: Basic Text Extraction
HTML
<h1>My Tech Blog</h1>
<p class="tagline">Latest in technology</p>
<p class="copyright">© 2024 My Tech Blog</p>
Query
?data={title:"h1",tagline:".tagline",footer:".copyright"}
Formatted Query
{
title: "h1",
tagline: ".tagline",
footer: ".copyright"
}
Result
{
"title": "My Tech Blog",
"tagline": "Latest in technology",
"footer": "© 2024 My Tech Blog"
}
Example 2: Array Extraction
HTML
<article class="post">
<h2 class="post-title">Understanding JavaScript</h2>
<span class="author">Alice Smith</span>
<span class="date">2024-01-15</span>
</article>
<article class="post">
<h2 class="post-title">Python for Data Science</h2>
<span class="author">Bob Johnson</span>
<span class="date">2024-01-14</span>
</article>
<article class="post">
<h2 class="post-title">Rust Performance Tips</h2>
<span class="author">Carol White</span>
<span class="date">2024-01-13</span>
</article>
Query
?data={titles:".post-title[]",authors:".author[]"}
Formatted Query
{
titles: ".post-title[]",
authors: ".author[]"
}
Result
{
"titles": [
"Understanding JavaScript",
"Python for Data Science",
"Rust Performance Tips"
],
"authors": ["Alice Smith", "Bob Johnson", "Carol White"]
}
Example 3: Complex Iteration
HTML
<div class="products">
<div class="product" data-id="1">
<h3 class="name">Widget A</h3>
<span class="price">$19.99</span>
<a href="/products/widget-a">View Details</a>
</div>
<div class="product" data-id="2">
<h3 class="name">Widget B</h3>
<span class="price">$29.99</span>
<a href="/products/widget-b">View Details</a>
</div>
</div>
Query
?data={products:[".product",{name:".name",price:".price",link:"a@href",id:"@data-id"}]}
Formatted Query
{
products: [
".product",
{
name: ".name",
price: ".price",
link: "a@href",
id: "@data-id"
}
]
}
Result
{
"products": [
{
"name": "Widget A",
"price": "$19.99",
"link": "/products/widget-a",
"id": "1"
},
{
"name": "Widget B",
"price": "$29.99",
"link": "/products/widget-b",
"id": "2"
}
]
}
Fancy Example
https://panphora.hyperclay.com/data?data={
siteName: ".site-name",
description: ".site-description",
newsletterDescription: ".newsletter-description",
posts: ["[post]",{
date: ".post-date@date",
description: ".post-description",
note: ".post-note",
projects: [
"[project]",
{
type: ".project-type@project_type",
name: ".project-name",
url: ".project-name@href",
description: ".project-description"
}
]
}]
}
Try it out: Go to panphora.hyperclay.com/data?data…
Syntax Reference
Basic Selectors
- Tag: "h1", "p", "article"
- Class: ".classname"
- ID: "#unique-id"
- Attribute: "[attr]", "[attr='value']" (e.g., "[post]", "[data-id='1']")
- Current element: "." (useful in iterations)
Arrays
- Add the [] suffix to get all matching elements: ".post-title[]"
Attributes & Properties
- Use @ to extract attributes: "a@href", "img@src"
- DOM properties: "input@value", "@checked", "@disabled"
- Custom attributes: ".post-date@date", ".project-type@project_type"
- Omit the selector to read from the current element: "@data-id" (see Example 3)
Iteration
- [selector, shape]: a two-element array where:
  - First element: the selector to iterate over (string)
  - Second element: the shape object to extract from each match
- Example: ["[post]", {title: ".title", date: ".date"}]
- Selectors inside the shape are scoped to each matched element
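The scoping rule is easier to see in a small local model of how a shape could be evaluated against a parsed page. This is a hypothetical sketch, not Hyperclay's implementation. `Element`, `select_all`, and `extract` are illustrative names. Selectors are limited to a single tag, `.class`, `#id`, or `[attr]` (no `[attr='value']`), and `@name` reads HTML attributes only, not DOM properties. The page mirrors Example 3.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A minimal stand-in for a parsed HTML element (hypothetical)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def matches(el: Element, selector: str) -> bool:
    """Match one simple selector: tag, .class, #id, or [attr]."""
    if selector.startswith("."):
        return selector[1:] in el.attrs.get("class", "").split()
    if selector.startswith("#"):
        return el.attrs.get("id") == selector[1:]
    if selector.startswith("[") and selector.endswith("]"):
        return selector[1:-1] in el.attrs
    return el.tag == selector


def descendants(el: Element):
    """Yield every element below el, depth-first, in document order."""
    for child in el.children:
        yield child
        yield from descendants(child)


def select_all(scope: Element, selector: str) -> list:
    return [el for el in descendants(scope) if matches(el, selector)]


def extract(scope: Element, rule):
    if isinstance(rule, list):
        # [selector, shape]: evaluate the shape once per match,
        # using that match as the new scope for inner selectors.
        selector, shape = rule
        return [extract(match, shape) for match in select_all(scope, selector)]
    if isinstance(rule, dict):
        return {key: extract(scope, value) for key, value in rule.items()}
    many = rule.endswith("[]")
    if many:
        rule = rule[:-2]
    selector, _, attr = rule.partition("@")
    # An empty selector ("@data-id") or "." refers to the current element.
    targets = [scope] if selector in ("", ".") else select_all(scope, selector)
    values = [el.attrs.get(attr) if attr else el.text for el in targets]
    if many:
        return values  # no matches -> []
    return values[0] if values else None  # no match -> None (JSON null)


page = Element("body", children=[
    Element("div", {"class": "product", "data-id": "1"}, children=[
        Element("h3", {"class": "name"}, "Widget A"),
        Element("span", {"class": "price"}, "$19.99"),
        Element("a", {"href": "/products/widget-a"}, "View Details"),
    ]),
    Element("div", {"class": "product", "data-id": "2"}, children=[
        Element("h3", {"class": "name"}, "Widget B"),
        Element("span", {"class": "price"}, "$29.99"),
        Element("a", {"href": "/products/widget-b"}, "View Details"),
    ]),
])

result = extract(page, {
    "products": [".product", {"name": ".name", "price": ".price",
                              "link": "a@href", "id": "@data-id"}],
    "names": ".name[]",
    "missing": ".sale-price",   # hypothetical class not on the page
    "empty": ".sale-price[]",
})
print(result)
```

In this model `.name` inside the iteration only ever sees the current product's heading, which is why each object gets its own name rather than the first one on the page.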
Nesting
- Objects can be nested: {meta: {author: ".author", date: ".date"}}
- Arrays can contain objects: [".item", {name: ".name"}]
- Arrays can be nested: ["[post]", {projects: ["[project]", {name: ".name"}]}]
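If you build shapes in code, a small serializer can turn nested objects and arrays into the compact form used in the URL. That form has unquoted keys, double-quoted selectors, and no whitespace. `to_query` is a hypothetical helper, not part of Hyperclay. It assumes keys are plain identifiers and that selectors contain no double quotes.

```python
from typing import Union

# A shape is a selector string, a [selector, shape] list, or a dict of shapes.
Shape = Union[str, list, dict]


def to_query(shape: Shape) -> str:
    """Serialize a shape into compact ?data= syntax (hypothetical helper)."""
    if isinstance(shape, str):
        # Selectors are always double-quoted strings.
        return '"' + shape + '"'
    if isinstance(shape, list):
        return "[" + ",".join(to_query(item) for item in shape) + "]"
    if isinstance(shape, dict):
        # Keys are emitted unquoted, as in the examples in this guide.
        pairs = (f"{key}:{to_query(value)}" for key, value in shape.items())
        return "{" + ",".join(pairs) + "}"
    raise TypeError(f"unsupported shape type: {type(shape).__name__}")


nested = {"posts": ["[post]", {"projects": ["[project]", {"name": ".name"}]}]}
print("?data=" + to_query(nested))
# ?data={posts:["[post]",{projects:["[project]",{name:".name"}]}]}
```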
Important: Quoting in URLs
All selectors must be quoted strings in the actual URL; keys stay unquoted. The example below shows the same query in formatted syntax (for readability) and in actual URL syntax.
Formatted (for documentation):
{
posts: [".post", {title: ".title", date: ".date"}]
}
Actual URL (what you type):
?data={posts:[".post",{title:".title",date:".date"}]}
Response Format
- Success: returns the extracted JSON data
- Missing elements: return null
- Array selectors with no matches: return []
- Errors: return an HTTP error status code with an error message
Caching
Responses are cached for 5 minutes. Check the X-Cache response header:
- X-Cache: HIT (served from cache)
- X-Cache: MISS (fresh extraction)
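A client might fetch a `?data=` URL, parse the JSON, and report both the cache status and any HTTP error. The sketch below does this with Python's standard `urllib.request`. `fetch_extraction` is a hypothetical name, and the sketch assumes the error message arrives as text in the response body.

```python
import json
from typing import Any, Tuple
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_extraction(url: str, timeout: float = 10.0) -> Tuple[Any, bool]:
    """Return (extracted JSON, True if served from cache). Hypothetical helper."""
    try:
        with urlopen(url, timeout=timeout) as response:
            data = json.load(response)
            # X-Cache is HIT for cached responses and MISS for fresh extractions.
            cached = response.headers.get("X-Cache", "").upper() == "HIT"
    except HTTPError as err:
        # Errors come back as HTTP status codes with an error message in the body.
        message = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"extraction failed ({err.code}): {message}") from err
    return data, cached


# Usage (the site and rules are placeholders):
# data, cached = fetch_extraction(
#     "https://example.hyperclay.com?data=%7Btitle%3A%22h1%22%7D"
# )
```

Treating `HIT` as a boolean keeps the caller simple. If a dashboard needs to show data freshness, keep the raw header value instead.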
Use Cases
Simple CMS
Turn any Hyperclay page into a content source. Extract blog posts, product catalogs, or navigation menus to power other applications.
Simple API
Provide structured data access to your Hyperclay sites without building a backend. Perfect for dashboards, integrations, and monitoring.
Limitations
- Static content only - No JavaScript execution
- Text and attributes only - Not raw HTML
- Read-only - Cannot modify sites