Data Extraction API

Turn HTML into JSON

The Data Extraction API turns any Hyperclay site into a queryable data source. Append ?data= and a set of extraction rules to a site URL, and the response is structured JSON.
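As a rough illustration, the Python sketch below builds such a URL. It percent-encodes the query so braces, quotes, and spaces survive any HTTP client, on the assumption that the server decodes standard percent-encoding. `build_data_url`, `fetch_data`, and the example host are hypothetical, and the sketch assumes the site URL has no query string of its own.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_data_url(site_url: str, query: str) -> str:
    """Append a ?data= extraction query to a site URL (hypothetical helper)."""
    # safe="" makes quote() encode "/" too; braces, quotes and colons are always encoded.
    return f"{site_url}?data={quote(query, safe='')}"


def fetch_data(site_url: str, query: str):
    """Request the URL and decode the JSON body (needs network access)."""
    with urlopen(build_data_url(site_url, query)) as response:
        return json.load(response)


# example.hyperclay.com is a placeholder host, not a real site.
print(build_data_url("https://example.hyperclay.com", '{title:"h1"}'))
# https://example.hyperclay.com?data=%7Btitle%3A%22h1%22%7D
```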

Example 1: Basic Text Extraction

HTML


<h1>My Tech Blog</h1>
<p class="tagline">Latest in technology</p>
<p class="copyright">© 2024 My Tech Blog</p>

Query


?data={title:"h1",tagline:".tagline",footer:".copyright"}

Formatted Query


{
  title: "h1",
  tagline: ".tagline",
  footer: ".copyright"
}

Result


{
  "title": "My Tech Blog",
  "tagline": "Latest in technology",
  "footer": "© 2024 My Tech Blog"
}

Example 2: Array Extraction

HTML


<article class="post">
  <h2 class="post-title">Understanding JavaScript</h2>
  <span class="author">Alice Smith</span>
  <span class="date">2024-01-15</span>
</article>
<article class="post">
  <h2 class="post-title">Python for Data Science</h2>
  <span class="author">Bob Johnson</span>
  <span class="date">2024-01-14</span>
</article>
<article class="post">
  <h2 class="post-title">Rust Performance Tips</h2>
  <span class="author">Carol White</span>
  <span class="date">2024-01-13</span>
</article>

Query


?data={titles:".post-title[]",authors:".author[]"}

Formatted Query


{
  titles: ".post-title[]",
  authors: ".author[]"
}

Result


{
  "titles": [
    "Understanding JavaScript",
    "Python for Data Science",
    "Rust Performance Tips"
  ],
  "authors": ["Alice Smith", "Bob Johnson", "Carol White"]
}
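Because the two arrays in this result are independent, a client that wants one record per post has to pair them by index. The Python sketch below (a hypothetical `pair_posts` helper) does that and refuses to pair arrays of different lengths, since a post missing an author would otherwise shift every later pair. The iteration form in Example 3 avoids the problem by keeping each element's fields together.

```python
from typing import Dict, List


def pair_posts(result: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Zip Example 2's parallel arrays into one record per post (hypothetical helper)."""
    titles = result.get("titles") or []
    authors = result.get("authors") or []
    # Index pairing only works if every post contributed exactly one of each.
    # Equal lengths are necessary, not sufficient: two posts each missing a
    # different field would still pass this check.
    if len(titles) != len(authors):
        raise ValueError(f"{len(titles)} titles but {len(authors)} authors")
    return [{"title": t, "author": a} for t, a in zip(titles, authors)]


result = {
    "titles": ["Understanding JavaScript", "Python for Data Science", "Rust Performance Tips"],
    "authors": ["Alice Smith", "Bob Johnson", "Carol White"],
}
print(pair_posts(result)[0])  # {'title': 'Understanding JavaScript', 'author': 'Alice Smith'}
```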

Example 3: Complex Iteration

HTML


<div class="products">
  <div class="product" data-id="1">
    <h3 class="name">Widget A</h3>
    <span class="price">$19.99</span>
    <a href="/products/widget-a">View Details</a>
  </div>
  <div class="product" data-id="2">
    <h3 class="name">Widget B</h3>
    <span class="price">$29.99</span>
    <a href="/products/widget-b">View Details</a>
  </div>
</div>

Query


?data={products:[".product",{name:".name",price:".price",link:"a@href",id:"@data-id"}]}

Formatted Query


{
  products: [
    ".product",
    {
      name: ".name",
      price: ".price",
      link: "a@href",
      id: "@data-id"
    }
  ]
}

Result


{
  "products": [
    {
      "name": "Widget A",
      "price": "$19.99",
      "link": "/products/widget-a",
      "id": "1"
    },
    {
      "name": "Widget B",
      "price": "$29.99",
      "link": "/products/widget-b",
      "id": "2"
    }
  ]
}
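Every value in this result is a string, including "price" and "id", because the API extracts text and attributes. A client usually converts them. The Python sketch below is one way to do it, assuming prices always look like "$19.99" and resolving the relative "link" against a placeholder site URL; `parse_price`, `normalize_product`, and the host are hypothetical.

```python
from decimal import Decimal
from typing import Optional
from urllib.parse import urljoin


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Turn "$19.99" into Decimal("19.99"); a missing element (null) stays None."""
    if text is None:
        return None
    # Assumes a leading "$" and optional thousands separators.
    return Decimal(text.strip().lstrip("$").replace(",", ""))


def normalize_product(raw: dict, site_url: str) -> dict:
    """Convert one Example 3 product record into typed values (hypothetical helper)."""
    return {
        "id": int(raw["id"]),
        "name": raw["name"],
        "price": parse_price(raw["price"]),  # Decimal avoids float rounding on money
        "link": urljoin(site_url, raw["link"]),  # "/products/..." -> absolute URL
    }


product = {"name": "Widget A", "price": "$19.99", "link": "/products/widget-a", "id": "1"}
print(normalize_product(product, "https://example.hyperclay.com"))
```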

Fancy Example


https://panphora.hyperclay.com/data?data={
  siteName: ".site-name",
  description: ".site-description",
  newsletterDescription: ".newsletter-description",
  posts: ["[post]",{
    date: ".post-date@date",
    description: ".post-description",
    note: ".post-note",
    projects: [
      "[project]",
      {
        type: ".project-type@project_type",
        name: ".project-name",
        url: ".project-name@href",
        description: ".project-description"
      }
    ]
  }]
}

Try it out: Go to panphora.hyperclay.com/data?data…

Syntax Reference

Basic Selectors

Tag: "h1", "p", "article"
Class: ".classname"
ID: "#unique-id"
Attribute: "[attr]", "[attr='value']" (e.g., "[post]", "[data-id='1']")
Current element: "." (useful in iterations); a bare attribute rule such as "@data-id" also reads from the current element

Arrays

Add a [] suffix to get an array of all matching elements instead of a single value: ".post-title[]"

Attributes & Properties

Use @ to extract attributes: "a@href", "img@src"
DOM properties: "input@value", "@checked", "@disabled"
Extract from custom attributes: ".post-date@date", ".project-type@project_type"
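A rule string can combine three parts: a selector, an optional @attribute, and an optional [] suffix. This page does not spell out how they combine, so the Python sketch below assumes [] always comes last with @name immediately before it; `parse_rule` is a hypothetical helper, not part of the API. An empty selector (as in "@data-id") stands for the current element.

```python
import re
from typing import Optional, Tuple

# Attribute names as they appear in the examples: data-id, project_type, href, ...
ATTR_NAME = re.compile(r"[A-Za-z_][\w-]*")


def parse_rule(rule: str) -> Tuple[str, Optional[str], bool]:
    """Split a rule into (selector, attribute or None, return_all)."""
    many = rule.endswith("[]")  # "[post]" or "[data-id='1']" do not end in "[]"
    if many:
        rule = rule[:-2]
    # rpartition splits at the last "@"; "" before it means the current element.
    selector, at, attr = rule.rpartition("@")
    if at and ATTR_NAME.fullmatch(attr):
        return selector, attr, many
    return rule, None, many


for example in ["h1", ".post-title[]", "a@href", "@data-id", ".project-type@project_type"]:
    print(example, "->", parse_rule(example))
```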

Iteration

[selector, shape] - Two-element array where:
- First element: selector to iterate over (string)
- Second element: shape object to extract from each match
Example: ["[post]", {title: ".title", date: ".date"}]
Selectors in the shape are scoped to each matched element, so they only search inside it

Nesting

Objects can be nested: {meta: {author: ".author", date: ".date"}}
Arrays can contain objects: [".item", {name: ".name"}]
Arrays can be nested: ["[post]", {projects: ["[project]", {name: ".name"}]}]
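To make these rules concrete, here is a small local emulation in Python built on the standard library's html.parser. It is a hypothetical sketch, not the Hyperclay implementation: it supports only single compound selectors (tag, .class, #id, [attr], [attr='value'], e.g. "div.product") with no descendant combinators, trims whitespace from text, and reads attributes as written rather than live DOM properties such as @checked. It does follow the documented structure: plain rules, [] arrays, @attributes, [selector, shape] iteration scoped to each match, and nesting.

```python
import re
from html.parser import HTMLParser
from typing import Any

# Hypothetical local emulation of the documented rules, not the Hyperclay service.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}
# Pieces of one compound selector: tag, .class, #id, [attr], [attr='value'].
SIMPLE = re.compile(r"(?P<tag>[a-zA-Z][\w-]*)|\.(?P<cls>[\w-]+)|#(?P<id>[\w-]+)"
                    r"|\[(?P<attr>[\w-]+)(?:=(?P<q>['\"]?)(?P<val>.*?)(?P=q))?\]")

class Node:
    def __init__(self, tag, attrs, parent):
        self.tag, self.attrs, self.parent, self.children = tag, attrs, parent, []

    def text(self):  # all descendant text, similar to the DOM's textContent
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def descendants(self):  # depth-first in document order, excluding self
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.descendants()

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("#root", {}, None)

    def handle_starttag(self, tag, attrs):  # html.parser lowercases tag/attr names
        node = Node(tag, {name: value or "" for name, value in attrs}, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):  # close the nearest open match; ignore strays
        node = self.cur
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None:
            self.cur = node.parent

    def handle_data(self, data):
        self.cur.children.append(data)

def parse_rule(rule):  # assumes "[]" comes last, with "@attr" just before it
    many = rule.endswith("[]")
    rule = rule[:-2] if many else rule
    selector, at, attr = rule.rpartition("@")
    if at and re.fullmatch(r"[A-Za-z_][\w-]*", attr):
        return selector, attr.lower(), many
    return rule, None, many

def matches(node, parts):
    for m in parts:
        attr = (m["attr"] or "").lower()
        if ((m["tag"] and node.tag != m["tag"].lower())
                or (m["cls"] and m["cls"] not in node.attrs.get("class", "").split())
                or (m["id"] and node.attrs.get("id") != m["id"])
                or (attr and attr not in node.attrs)
                or (attr and m["val"] is not None and node.attrs[attr] != m["val"])):
            return False
    return True

def select(scope, selector, many):
    if selector in ("", "."):  # empty selector or "." means the current element
        return [scope]
    parts, pos = [], 0
    while pos < len(selector):
        m = SIMPLE.match(selector, pos)
        if m is None:  # e.g. descendant combinators such as ".a .b"
            raise ValueError(f"selector not supported by this sketch: {selector!r}")
        parts.append(m)
        pos = m.end()
    found = [n for n in scope.descendants() if matches(n, parts)]
    return found if many else found[:1]

def extract(scope, shape: Any) -> Any:
    if isinstance(shape, dict):  # nested object: every key uses the same scope
        return {key: extract(scope, sub) for key, sub in shape.items()}
    if isinstance(shape, list):  # [selector, shape]: re-scope to each match
        selector, sub_shape = shape
        return [extract(node, sub_shape) for node in select(scope, selector, True)]
    selector, attr, many = parse_rule(shape)
    # Text is whitespace-trimmed; attributes are read as written, not as DOM properties.
    values = [n.text().strip() if attr is None else n.attrs.get(attr)
              for n in select(scope, selector, many)]
    return values if many else (values[0] if values else None)

def extract_html(html, shape):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return extract(builder.root, shape)

snippet = '<div class="product" data-id="1"><h3 class="name">Widget A</h3><a href="/w-a">View</a></div>'
print(extract_html(snippet, {"items": [".product", {"name": ".name", "link": "a@href", "id": "@data-id"}]}))
# {'items': [{'name': 'Widget A', 'link': '/w-a', 'id': '1'}]}
```

Run on the Example 3 markup with the Example 3 query, this sketch reproduces the documented result, including "id" read from each .product element itself.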

Important: Quoting in URLs

All selectors must be quoted strings in the actual URL, while keys stay unquoted. The pair below shows the same query formatted for readability and in the compact form you actually type into the URL.

Formatted (for documentation):


{
  posts: [".post", {title: ".title", date: ".date"}]
}

Actual URL (what you type):


?data={posts:[".post",{title:".title",date:".date"}]}
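If you assemble queries in code, serializing a nested structure is less error-prone than hand-writing the compact string. The Python sketch below (a hypothetical `to_query` helper) emits keys unquoted and selectors as double-quoted strings, matching the examples on this page. It uses json.dumps for the strings; whether the service accepts JSON-style escapes such as \" inside a selector is an assumption.

```python
import json
import re
from typing import Any

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_query(shape: Any) -> str:
    """Serialize dicts, lists, and selector strings into the compact ?data= form."""
    if isinstance(shape, dict):
        parts = []
        for key, value in shape.items():
            # Keys are written unquoted, so restrict them to identifier-like names.
            if not IDENTIFIER.fullmatch(key):
                raise ValueError(f"key must be identifier-like: {key!r}")
            parts.append(f"{key}:{to_query(value)}")
        return "{" + ",".join(parts) + "}"
    if isinstance(shape, list):
        return "[" + ",".join(to_query(item) for item in shape) + "]"
    if isinstance(shape, str):
        # Double-quoted, with no spaces added; non-ASCII is kept as-is.
        return json.dumps(shape, ensure_ascii=False)
    raise TypeError(f"unsupported value in shape: {shape!r}")


print(to_query({"posts": [".post", {"title": ".title", "date": ".date"}]}))
# {posts:[".post",{title:".title",date:".date"}]}
```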

Response Format

Success: returns the extracted data as JSON
Missing elements: the value is null
Array queries with no matches: the value is []
Errors: an HTTP error status code with an error message
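The difference between null and [] matters to client code. The Python sketch below shows one way to handle a response, assuming the caller already has the status code and body text. `ExtractionError`, `handle_response`, and `missing_fields` are hypothetical, and since the error message format is not documented here, the raw body is kept.

```python
import json
from typing import Any, Dict, List


class ExtractionError(Exception):
    """Raised for non-2xx responses (hypothetical client-side error type)."""


def handle_response(status: int, body: str) -> Dict[str, Any]:
    """Return the decoded JSON on success, or raise with the server's message."""
    if not 200 <= status < 300:
        # The error body format is not documented, so keep it verbatim.
        raise ExtractionError(f"HTTP {status}: {body}")
    return json.loads(body)


def missing_fields(data: Dict[str, Any]) -> List[str]:
    """Top-level keys whose selector matched nothing (null in JSON, None here)."""
    return [key for key, value in data.items() if value is None]


data = handle_response(200, '{"title": "My Tech Blog", "subtitle": null, "tags": []}')
print(missing_fields(data))  # ['subtitle']
print(data["tags"])  # [] (an array query with no matches, not a missing value)
```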

Caching

Responses are cached for 5 minutes. The X-Cache response header shows whether a response was served from the cache:

X-Cache: HIT - From cache
X-Cache: MISS - Fresh extraction
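A client can read this header to tell cached responses from fresh extractions. Header names are case-insensitive in HTTP, so the Python sketch below (a hypothetical `cache_status` helper that accepts any mapping of headers) compares them in lowercase. With urllib.request, `response.headers.get("X-Cache")` already performs a case-insensitive lookup.

```python
from typing import Mapping, Optional


def cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return "HIT", "MISS", or None when no X-Cache header is present."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-cache":
            return value.strip().upper()
    return None


print(cache_status({"x-cache": "HIT"}))  # HIT
print(cache_status({"Content-Type": "application/json"}))  # None
```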

Use Cases

Simple CMS

Turn any Hyperclay page into a content source. Extract blog posts, product catalogs, or navigation menus to power other applications.

Simple API

Provide structured data access to your Hyperclay sites without building a backend. Perfect for dashboards, integrations, and monitoring.

Limitations

Static content only - JavaScript is not executed, so content rendered client-side is not extracted
Text and attributes only - raw HTML is not returned
Read-only - the API cannot modify sites