Journalistic Tools Fast: Declarative Programming with Htmx and Hyperscript

I explore two web-based declarative languages & tools that allow me to build one-off utilities that greatly speed up my investigative process

Tue 12 July 2022 ~ Brandon Roberts

Htmx and Hyperscript are two different, but related web languages/tools. Together, they allow programmers to write browser-based interfaces with much less code than if one had chosen a single-page application framework like React.

When I work on data-heavy investigations, I often need to build one-off web apps that automate manual tasks. This is especially true when machine learning or NLP is part of my workflow. Just last month I needed to seamlessly review PDFs and split their pages into groups of sub-documents. It wasn't really feasible to do it by hand: I had thousands of files, many of them quite large. Time spent waiting for things to load or going between the mouse and keyboard gets frustrating quickly. I needed a tool.

Conventional wisdom contends that I should build a single-page app (SPA) from the ground up, using something like React or whatever. Timesucks when writing SPAs include: handling file uploads, writing page navigation and menus, user auth, configuration, little UI transitions, managing intermediate state, and data serialization. I could extend this list ad infinitum. Writing SPAs is tedious, and I've grown tired of it. I end up spending far too much time working out features not directly related to the problem at hand. In the case of my previously mentioned PDF tool, nothing was unique or new. All I needed was the ability to:

  • Display a list of documents + some metadata (name, page count, status)
  • Display all the page images of a selected document
  • Visualize page sub-document groups somehow
  • Create, remove or change the groups

This list isn't long, but some parts are deceptively tricky. The thought of just powering through with yet another SPA was too much to accept, so I decided to try something different.

Recently, I came across two technologies: Hyperscript and Htmx. They let programmers write UIs in a declarative style that embraces some unusual philosophies. I find them to be well suited for quickly building the one-off tools that I use in my investigations.

Declarative Programming: Say what to do, not how to do it

Programming is hard. I say this to students all the time. But, with patience and persistence, one eventually learns how to build something useful out of the series of loops, data transformations, and conditionals that is writing code. But why is programming like this? The answer is simple: nearly all popular programming languages belong to the imperative family, also known as the imperative paradigm. In the imperative paradigm of programming, the coder's task is to tell the computer exactly how to accomplish a function, often at a very low level. The programmer builds data structures, specifies control flow and applies particular algorithms.

On the far other end of the coding language spectrum lies declarative programming. In this paradigm, the programmer's task is to tell the computer what to do, not how to do it.

Programming in a declarative language can be confusing and seem limiting at first—it is unfamiliar to most coders. Instead of writing steps of instructions, the programmer describes a set of conditions to be met or a desired solution. When the program "runs", the language uses this description to find the best way to accomplish the task.

Here are some examples of popular declarative languages and what the task looks like for the programmer:

SQL

SQL is the most pervasive declarative language. Programmers write queries that consist of descriptions of columns, filter/group conditions and sort methods. These are passed to the query planner, which figures out the quickest way to satisfy the request, and returns the rows it finds.

Example

Let's say a programmer wants to find all donations above $1,000 to any candidate in a campaign finance database with two tables, candidate_names and donations. The query would look like this:

SELECT
    candidate_names.name as Candidate,
    donations.company as Company,
    donations.amount as Amount
FROM candidate_names
JOIN donations
    ON candidate_names.name = donations.candidate_name
WHERE donations.amount > 1000

The query planner will read the code, noting things like:

  • the data types of the columns (numbers, text, etc)
  • if the tables have indexes
  • statistics about size of the tables/etc.

From this information, the query planner will break the request into steps. It will apply the best algorithm it knows to each one, eventually obtaining a final result that looks like:

Candidate,      Company,               Amount
Lisa Smith,     ACME Cable Trenching,  2409
Janette Ryans,  Grocery Store #12,     12409
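To try the query without setting up a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the third (small) donation row, and the ORDER BY clause are my own additions for illustration; SQLite's planner is also much simpler than PostgreSQL's, so treat the plan it prints as a rough analogue only.

```python
import sqlite3

# In-memory database. The schema and the 'Corner Bakery' row are made up
# for illustration; the other two rows mirror the result shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidate_names (name TEXT PRIMARY KEY);
    CREATE TABLE donations (candidate_name TEXT, company TEXT, amount INTEGER);
    INSERT INTO candidate_names VALUES ('Lisa Smith'), ('Janette Ryans');
    INSERT INTO donations VALUES
        ('Lisa Smith', 'ACME Cable Trenching', 2409),
        ('Janette Ryans', 'Grocery Store #12', 12409),
        ('Lisa Smith', 'Corner Bakery', 250);
""")

QUERY = """
    SELECT
        candidate_names.name AS Candidate,
        donations.company AS Company,
        donations.amount AS Amount
    FROM candidate_names
    JOIN donations
        ON candidate_names.name = donations.candidate_name
    WHERE donations.amount > 1000
    ORDER BY donations.amount  -- added so the row order is predictable
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)

# Ask SQLite how it intends to run the query (format varies by version).
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step)
```

Nothing in the Python code says how to match candidates to donations; that decision is left entirely to the database.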

The way a query is turned into a sequence of steps is really complex. The PostgreSQL database has some graduate-level documentation about how their query planner works. It's really interesting reading, but if you don't want to read that, just know there's both an art and a science to it. Importantly: it saves the programmer who is querying data a ton of time. Without it, they'd need to write entire programs. (That's how things were done in the 50s-60s before relational databases existed.)

An example of a PostgreSQL query plan illustration

This is the visual representation of a query planner's strategy for executing a request, created by PostgreSQL. The query in question calculates the number of documents delivered per month, by agencies whose requests were completed. Here, the query planner is going to create a hash table, use it to join the documents_agency and documents_document tables, then perform a sort and a final count.
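To get a feel for what the planner spares the programmer, here is a rough hand-written version of that strategy in Python. Only the two table names come from the plan above; the column names and rows are hypothetical, and this sketch counts while probing and sorts at the end, which is a slightly different order from the plan in the figure.

```python
from collections import Counter

# Hypothetical rows; only the table names come from the query plan above.
documents_agency = [
    {"id": 1, "name": "Agency A", "completed": True},
    {"id": 2, "name": "Agency B", "completed": False},
]
documents_document = [
    {"agency_id": 1, "delivered": "2022-05-03"},
    {"agency_id": 1, "delivered": "2022-05-19"},
    {"agency_id": 1, "delivered": "2022-06-01"},
    {"agency_id": 2, "delivered": "2022-06-07"},
]


def documents_per_month(agencies, documents):
    # Step 1: build a hash table keyed on the join column,
    # keeping only agencies whose requests were completed.
    completed = {a["id"]: a for a in agencies if a["completed"]}

    # Step 2: probe the hash table once per document (the hash join)
    # and count matches by "YYYY-MM".
    months = Counter()
    for doc in documents:
        if doc["agency_id"] in completed:
            months[doc["delivered"][:7]] += 1

    # Step 3: sort the per-month counts.
    return sorted(months.items())


result = documents_per_month(documents_agency, documents_document)
print(result)  # [('2022-05', 2), ('2022-06', 1)]
```

Every choice here (hash table versus nested loops, when to filter, when to sort) is one the SQL programmer never has to make.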

Prolog

Prolog is a lesser-known declarative language. It grew out of logic programming, an idea from AI research in the early 70s. Prolog programs, known as databases, consist of logical facts and the relationships between them. The programmer loads the database and then poses questions to it. These questions are passed to something called the solver, which returns the answers that logically follow from the facts and relationships.

Example

Here's a Prolog program example describing the members of a family and rules defining common family relationships:

mother_child(trude, sally).

father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).

sibling(X, Y) :-
    parent_child(Z, X),
    parent_child(Z, Y).

parent_child(X, Y) :-
    father_child(X, Y).
parent_child(X, Y) :-
    mother_child(X, Y).

The programmer could then pose the question: is Sally the sibling of Erica? To which, the solver would answer Yes:

?- sibling(sally, erica).
 Yes
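For readers without a Prolog interpreter handy, the same facts and rules can be imitated in Python. This is only a brute-force sketch of what the rules mean; a real Prolog solver works by unification and backtracking, which this does not attempt to reproduce.

```python
# Facts, copied from the Prolog database above as (parent, child) pairs.
mother_child = {("trude", "sally")}
father_child = {("tom", "sally"), ("tom", "erica"), ("mike", "tom")}


def parent_child():
    # parent_child(X, Y) holds if father_child(X, Y) or mother_child(X, Y).
    return father_child | mother_child


def sibling(x, y):
    # sibling(X, Y) holds if some Z is a parent of both X and Y.
    pairs = parent_child()
    parents = {parent for parent, _child in pairs}
    return any((z, x) in pairs and (z, y) in pairs for z in parents)


print(sibling("sally", "erica"))  # True
print(sibling("sally", "tom"))    # False
```

Note how much of the Python is about how to search (sets, loops, membership tests), while the Prolog version only states the relationships.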

Prolog, and declarative logic programming in general, can be used for scheduling, theorem proving, assumption checking and complicated gift exchanges. Before we had fast computers and things like machine learning, people also used Prolog for natural language processing. In the near future, I'll post a followup specifically about using Prolog for investigations.

Htmx: Declarative Programming for the Web

Htmx takes the concept of a declarative language engine (the query planner in SQL and the solver in Prolog) and brings it to web application programming. Htmx provides a small JavaScript (JS) engine that is driven by special HTML attributes. All the common things that web applications do, like fetching URLs, rendering responses, and handling events, can be done in a declarative style. This drastically reduces the upfront labor in building UIs.

An illustrative example is a button that displays a list of results when clicked. To handle the click event in the SPA style, one would need to write a bunch of JS, asynchronously fetch an API page, parse the data, and render the data as HTML. This could easily be a hundred lines of code. With htmx, this is possible with just a couple tags:

<span class="button"
   hx-get="/pages/10"
   hx-target="#doc-10-pages-area"
   hx-swap="innerHTML">
Load Pages
</span>
<div id="doc-10-pages-area"></div>

Above, when the "Load Pages" span is clicked, htmx fetches the /pages/10 URL and writes the results into the div. This greatly simplifies building UIs. One catch here is that htmx expects the backend to return HTML (which can itself carry more htmx attributes), not JSON. This is in stark contrast to the JSON REST API backend assumed by most SPA frameworks.
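To make the "HTML, not JSON" point concrete, here is a small sketch of what the server side of /pages/10 might produce. The function name, the wrapper div, and the image URLs are all hypothetical; in a Django project this string would more likely come from a template and be returned by a view.

```python
from html import escape


def render_pages_fragment(doc_id, page_urls):
    """Build the HTML fragment that htmx would swap into the target div."""
    lines = [f'<div class="pages" data-doc-id="{int(doc_id)}">']
    for number, url in enumerate(page_urls, start=1):
        # Escape the URL so it cannot break out of the src attribute.
        lines.append(
            f'  <img src="{escape(url, quote=True)}" alt="Page {number}" />'
        )
    lines.append("</div>")
    return "\n".join(lines)


# Hypothetical image URLs for document 10.
fragment = render_pages_fragment(10, ["/media/10/p1.png", "/media/10/p2.png"])
print(fragment)
```

The browser never sees a data structure it has to interpret; it receives markup that is ready to be dropped into the page.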

At first, using htmx was definitely weird, but once it "clicked" I realized I could throw all kinds of web applications together with barely any effort. At times it was so easy it felt like cheating.

Building a simple htmx backend with Django

When I'm managing large scale inquiries, I like to keep track of all the agencies I've requested from and the records I've received from them. This way I already have all the metadata for my PDFs in a Django application. I continue to use this system as I convert, clean and process the data.

The home page of the tool simply lists all the agencies from which I have PDFs in need of review. Home page view code:

from django.shortcuts import render

# Assumption: the Agency model lives in this app's models.py.
from .models import Agency


def GET_segmentable_home(request):
    """
    Renders a htmx-powered home page for segmenting
    PDFs from agencies with responsive records. This
    page simply lists the agencies, and allows the user
    to click each agency, rendering the agency page
    (which lists each document).
    """
    context = {
        "agencies": Agency.objects.all(),
    }
    return render(
        request,
        "segmentable_home.htmx",
        context=context
    )

The template, segmentable_home.htmx, loops over each agency and renders them using the agency.htmx template. This is going to be important later.


<body>
  <div class="main">
    <section class="segmentable-home">
      {% for agency in agencies %}
        {% include 'agency.htmx' with agency=agency %}
      {% endfor %}
    </section>
    <!-- half the screen: full PDF page -->
    <section id="zoomed-page">
    </section>
  </div>
</body>

Here's the agency.htmx template itself:

<div id="agency-{{ agency.id }}"
         class="agency">
  <h2>{{ agency.name }}</h2>
  <p class="info complete">
    Total Segmented: {{ agency.segmented_docs }}
  </p>
  <p class="info total">
    Total Docs: {{ agency.total_segmentable_docs }}
  </p>
  <p class="button"
     hx-get="{% url 'docs' agency.id %}"
     hx-target="#agency-{{ agency.id }}"
     _="on click add .opened to #agency-{{ agency.id }}">
     Show Documents
  </p>
</div>

This template reveals two key htmx concepts. First, each item (agency) in the list is its own template, which makes the open/close logic really simple. Opening an agency is accomplished with the Show Documents button. When clicked, htmx fetches the 'docs' endpoint and replaces everything inside the div tag with the result (innerHTML is htmx's default swap). This should reveal that agency's list of PDF documents (docs.htmx). Closing the list reverses this process by replacing the whole div tag with a freshly rendered agency.htmx. Here's the close button:

<p class="button"
   hx-get="{% url 'agency.htmx' agency.pk %}"
   hx-target="#agency-{{ agency.pk }}"
   hx-swap="outerHTML">
   Close Documents List
</p>

Clicking the close button gets us back to the initial state.

The second principle at play is the use of hyperscript to trigger UI transitions. When the user clicks Show Documents on an agency, the opened class gets added to that agency's div. I use this to make the agency item more prominent on the screen. When the user closes the documents list, the class disappears on its own, because the freshly rendered agency div that replaces the old one never had the opened class.

There's a circular nature to writing htmx tools that can be unintuitive at first. But, once it is familiar, writing code in this way is really quick and highly re-usable. I was able to implement this strategy to quickly build the agencies, documents list and document pages list views.
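The open/close cycle boils down to two endpoints that each return a fragment pointing at the other. Here is a framework-free sketch of that idea using only the standard library. The URLs, function names, and simplified markup are hypothetical stand-ins for the Django views and templates, not the actual code behind the tool.

```python
from html import escape
from string import Template

# Closed state: the agency card with a "Show Documents" button.
AGENCY = Template(
    '<div id="agency-$id" class="agency">'
    "<h2>$name</h2>"
    '<p class="button" hx-get="/agencies/$id/docs" hx-target="#agency-$id">'
    "Show Documents</p></div>"
)

# Open state: swapped *inside* the agency div, and carries the close button.
DOCS = Template(
    "<ul>$items</ul>"
    '<p class="button" hx-get="/agencies/$id" hx-target="#agency-$id" '
    'hx-swap="outerHTML">Close Documents List</p>'
)


def agency_fragment(agency):
    """Response for the (hypothetical) /agencies/<id> endpoint."""
    return AGENCY.substitute(id=agency["id"], name=escape(agency["name"]))


def docs_fragment(agency, doc_names):
    """Response for the (hypothetical) /agencies/<id>/docs endpoint."""
    items = "".join(f"<li>{escape(name)}</li>" for name in doc_names)
    return DOCS.substitute(id=agency["id"], items=items)


agency = {"id": 7, "name": "Parks & Recreation"}
closed = agency_fragment(agency)
opened = docs_fragment(agency, ["budget.pdf", "emails.pdf"])
print(closed)
print(opened)
```

Because the closed fragment is always rendered from scratch, any classes added in the browser while the list was open simply vanish when it is swapped back in.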

Simplifying complex UX with Hyperscript

The second part of this app that could be complicated is the page grouping stuff. I need a grid of PDF pages that shows which ones are in a group. Ideally, clicking a page would add or remove it from the adjoining group without sub-menus or follow-up clicks required.

A screenshot of the page segmenter UI built with htmx and hyperscript

This is the page-segmentation UI built using the htmx+hyperscript techniques described here. Clicking on one of the pages causes it to join or separate from the previous page. The borders display the page groups. You can see an enlarged version of each page by hovering over it with your mouse.

First, I came up with a simple way to encode the page groups:

[[group0_1st_page, group0_last_page],
  ...,
  [groupN_1st_page, groupN_last_page]]

The above is zero-indexed and inclusive. It assumes that pages in a group are always next to each other (which is true in my data). A PDF with 5 pages, where the first and second pages are in a group and the rest in a second group would look like this:

[[0,1],[2,4]]
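A short sketch of working with this encoding in Python: one helper checks that a list of groups covers every page with no gaps or overlaps, and another extracts the last page of each group. Both function names are my own, not taken from the tool.

```python
def check_groups(groups, page_count):
    """Raise ValueError unless groups tile pages 0..page_count-1 exactly."""
    expected_first = 0
    for first, last in groups:
        if first != expected_first or last < first:
            raise ValueError(f"bad group [{first}, {last}]")
        expected_first = last + 1
    if expected_first != page_count:
        raise ValueError("groups do not cover every page")


def group_ends(groups):
    """Return the set of last-page indexes, one per [first, last] group."""
    return {last for _first, last in groups}


groups = [[0, 1], [2, 4]]
check_groups(groups, page_count=5)
print(sorted(group_ends(groups)))  # [1, 4]
```

Since groups are contiguous, the set of end pages alone is enough to reconstruct the whole list, which is what makes a single CSS class sufficient later on.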

I chose this encoding for a few reasons. First, the rendering code requires a single pass. Next, I could represent groups visually using pure CSS and one class, end. Lastly, I could modify groups by adding or removing that same class. Here's what the rendering template code looks like:

<div id="doc-{{ doc.id }}-pages">
  {% for img in images %}
  <div id="pg-img-{{ doc.id }}-{{ loop.index }}"
       class="{% if loop.index in pg_group_ends[doc.id] %}
                end
              {% endif %} pg-img"
        _="on click
              toggle .end then call saveSegments({{ doc.id }})">
    <img src="{{ img.url }}"
         alt="Page {{ img.page }}" />
    <caption>Page {{ img.page }}</caption>
  </div>
  {% endfor %}
</div>

The above snippet of code takes care of all the page grouping functionality. It works like this: We iterate over the document images in a loop and check if the current page index is one of the end pages of a group. (This gets calculated in the view in advance.) If so, then we add the end class to the image. When the user clicks on a page, we toggle the end class, which either makes the page the end of a group or merges the page with the following group. After toggling the class, we call a plain old JavaScript function, saveSegments, that re-builds the page groups list for a given document by iterating over the images and creating a new page group every time it finds one with the end class. We send this list to the backend for saving.
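The saveSegments function itself is JavaScript and isn't shown here, but the rebuilding step it performs is easy to restate. Below is a Python sketch of that logic, taking one boolean per page (whether it has the end class). The function name is hypothetical, and how to treat a trailing run of pages with no end flag is my assumption.

```python
def rebuild_groups(end_flags):
    """Turn per-page 'has the end class' flags into [first, last] groups."""
    groups = []
    first = 0
    for index, is_end in enumerate(end_flags):
        if is_end:
            groups.append([first, index])
            first = index + 1
    # Assumption: trailing pages without an end flag still form a group.
    if first < len(end_flags):
        groups.append([first, len(end_flags) - 1])
    return groups


flags = [False, True, False, False, True]
print(rebuild_groups(flags))  # [[0, 1], [2, 4]]

# Clicking the page at index 2 toggles its end class on.
flags[2] = True
print(rebuild_groups(flags))  # [[0, 1], [2, 2], [3, 4]]
```

A single pass over the pages is all it takes, which is the payoff of encoding groups by their end pages.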

Visualizing the page groups is done using the end class and some fancy CSS selectors.

/* .end default: full border */
.pg-img.end {
  border: 10px solid black;
}
/* non-end: top and bottom borders */
.pg-img:not(.end) {
  border-top: 10px solid black;
  border-bottom: 10px solid black;
  margin-right: 0;
  margin-left: 0;
}
/* end of multi-pg */
.pg-img:not(.end) + .pg-img.end {
  border-top: 10px solid black;
  border-bottom: 10px solid black;
  border-right: 10px solid black;
  border-left: 0 transparent;
  margin-left: 0;
}
/* start of multi-pg group */
.pg-img.end + .pg-img:not(.end) {
  border-top: 10px solid black;
  border-bottom: 10px solid black;
  border-right: 0 transparent;
  border-left: 10px solid black;
  margin-left: 10px;
}
/* single-pg groups next to each other */
.pg-img.end + .pg-img.end {
  border: 10px solid black;
}
/* very first image part of multi-pg group */
.pg-img:not(.end):first-of-type {
  border-left: 10px solid black;
}

With that, I have all the functionality I need for my page grouping tool. And I managed to avoid writing much code at all.

Conclusion + Future Work

I am really impressed by the abilities of htmx and hyperscript. With careful crafting of data structures and thinking through my workflow, it was possible to do a lot with very little. Since writing this PDF page grouping tool, I've used htmx+hyperscript on several other projects where I needed small interactive tools. In particular, I've been using it to flesh out plugins inside the Datasette ecosystem with really good results.

I'm a big proponent of using the right tool for the job. Every tool has its place and I believe htmx+hyperscript are extremely well suited to small projects—the kind typically needed during investigative projects. That said, I'd caution against using them on large, complex projects. The tradeoffs that allow you to build small things fast come at a cost.

There are still some rough edges around using htmx, the main one being that there's no backend that directly supports the kind of thinking that it expects. Using Django works because of how easily its template system is mixed with htmx code. But I quickly realized I had stumbled upon a boilerplate pattern of grabbing objects, rendering the htmx/Jinja template, and adding the open/close logic described above. Even if it's not complicated or verbose, it gets repetitive. In the future, I'd like to try building a htmx-specific ModelView for Django or maybe even a custom htmx serializer for the Django REST Framework.