Using phantom data types in TypeScript (and Rust) APIs
29/03/2024 Thierry Kühni
Recently, I removed Google Analytics from a website and used Umami analytics instead. Umami is a neat, open-source, privacy-focused web analytics software you can easily self-host. Best of all, it is GDPR compliant and does not use cookies (Begone pesky cookie consent banners :D). Umami can track a multitude of custom events happening on the website. For example, I used a custom event to track clicks on social-media share buttons to gain better insight into which content is shared most frequently.
The Umami function to report such a custom event from the browser looks like this:
track(event_name: string, event_data?: { [key: string]: string | number }): void;
With this function, we can assign an event name and append custom data to this event. One could simply use the function as-is inside the existing codebase. Still, given that I was confronted with a somewhat larger codebase where such events need to be sent from multiple locations in multiple files, I opted for a better approach.
Looking at Umami’s track function, multiple things might go wrong in terms of usage:
- Inconsistent event names being used for the same event (for example, “share” vs. “share click”)
- Different event_data payloads being sent for the same event
…all of which would lead to inconsistent analytics data that would be harder to manage and analyze.
An easy solution to the event name problem would be simply providing all possible events as string constants and using those constants. However, this does not prevent the event data from differing between different calls of the track function for the same event, let alone prevent the developer from simply using a custom string as an event name instead of the constants. Clearly, there should be some kind of “invisible” connection between the event name and the corresponding event data.
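To make the limitation concrete, here is a minimal sketch of the string-constant approach. The local track function is a hypothetical stand-in for Umami's tracker that only records calls, and the payload fields are made up for illustration:

```typescript
// Hypothetical stand-in for Umami's track function; it only records calls.
const calls: Array<{ name: string; data?: { [key: string]: string | number } }> = [];

function track(name: string, data?: { [key: string]: string | number }): void {
  calls.push({ name, data });
}

// The "constants" approach: one shared name per event.
const SHARE_EVENT = "share";

// All three calls type-check, yet they produce inconsistent analytics data.
track(SHARE_EVENT, { method: "twitter", url: "https://example.com/post" });
track(SHARE_EVENT, { network: "twitter" }); // same event, different payload shape
track("share click", { method: "twitter" }); // constant bypassed entirely
```

Nothing in the types ties SHARE_EVENT to a particular payload shape, which is the gap the rest of this post closes.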
Coming from Rust, Rust’s PhantomData type marker immediately sprang to mind. The wrapper function for Umami’s track would have looked like this:
use std::marker::PhantomData;
pub struct Event<'a, T> {
    pub name: &'a str,
    pub phantom: PhantomData<T>,
}
pub fn analytics_event<T>(event: &Event<T>, event_data: T) {
    track(event.name, event_data);
}
fn track<T>(_event_name: &str, _event_data: T) {}
As can be seen, the Event struct contains the event name and the generic T, which represents the type of event_data belonging to this specific event.
Thus, the Event struct is a phantom data type, as T is not directly stored within it. (The PhantomData marker type is zero-sized. It merely tells the compiler that the otherwise unused generic T is intentional, and it adds no runtime cost.) Because the analytics_event wrapper function requires both that the provided Event carries the generic T and that the supplied event_data is of type T, users of this function can only provide the matching event_data type for a specific Event struct:
use std::marker::PhantomData;
struct SharePayload {}
struct OtherPayload {}
// Create the share event and associate the SharePayload type to it
const SHARE_EVENT: Event<'static, SharePayload> = Event {
    name: "share",
    phantom: PhantomData,
};
fn main() {
    // This is OK
    analytics_event(&SHARE_EVENT, SharePayload {});
    // Big yikes! Mismatched types. Expected `SharePayload`, found `OtherPayload`
    analytics_event(&SHARE_EVENT, OtherPayload {});
}
A first naive way to translate this code into TypeScript looked like this:
type Event<T> = string;
export function analyticsEvent<T>(event: Event<T>, payload: T) {
  track(event, payload);
}
// Possible analytic events
export const ANALYTICS_SHARE_EVENT: Event<{
  method: "link" | "twitter" | "facebook";
  url: string;
}> = "share";
This code looks reasonable. Again, we have the phantom type Event with the generic T representing the associated payload type. The analyticsEvent function again uses the generic T to enforce that the provided event and payload types match. There are two problems with this, though. TypeScript rightly complains that the generic T in the Event type is unused. Also, because Event<T> is merely an alias for string, the ANALYTICS_SHARE_EVENT constant ends up being treated as a simple string primitive, rendering all type checks useless.
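The second problem can be demonstrated with a small sketch. The names NaiveEvent, analyticsEventNaive and NAIVE_SHARE_EVENT are hypothetical, chosen to keep the example self-contained, and the function records its arguments instead of calling Umami. It assumes compiler settings that do not turn unused type parameters into errors:

```typescript
// T never appears on the right-hand side, so NaiveEvent<T> is just string.
type NaiveEvent<T> = string;

// Hypothetical stand-in for the wrapper; it records what would be sent.
const sent: Array<[string, unknown]> = [];

function analyticsEventNaive<T>(event: NaiveEvent<T>, payload: T): void {
  sent.push([event, payload]);
}

const NAIVE_SHARE_EVENT: NaiveEvent<{
  method: "link" | "twitter" | "facebook";
  url: string;
}> = "share";

// Both calls compile: the payload type is inferred from the payload alone,
// and any string is accepted as an event.
analyticsEventNaive(NAIVE_SHARE_EVENT, { unrelated: true });
analyticsEventNaive("not a declared event", 42);
```

Since the alias erases T immediately, the compiler has nothing to compare the payload against.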
After some googling, I found the right approach to using phantom data in TypeScript. In TypeScript, the approach outlined below is more popularly called “type branding”, in case you want to read further into the topic. The modifications required to the Event type to enforce the phantom data without TypeScript complaining look like this:
declare const __phantom: unique symbol;
type Event<T> = string & { [__phantom]: T };
To create a rough equivalent to Rust’s PhantomData type marker in TypeScript, we use an intersection type where we add an object type to our base type (in our case, the string storing the event name). This object contains a single computed key __phantom of type T (so the generic is now used). __phantom is a unique symbol type, ensuring that each variant of the Event type has a unique __phantom key. In our case, this is not strictly necessary, but it is essential if you want to create a general representation of a branded type in TypeScript. The second advantage of using a computed property is that developers using values with the Event type cannot see the __phantom key in their auto-completion suggestions.
It is crucial to remember that the object containing the __phantom key will never actually exist at runtime. It is simply a way to make the TypeScript compiler aware of the phantom type, similar to Rust’s dedicated PhantomData type.
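A short sketch can confirm that the brand is purely a compile-time construct. TypedEvent and SHARE are hypothetical names (TypedEvent is used instead of Event only to avoid clashing with the DOM's global Event type in a standalone script):

```typescript
declare const __phantom: unique symbol;
type TypedEvent<T> = string & { [__phantom]: T };

// The assertion changes only the static type; the value stays the string "share".
const SHARE = "share" as TypedEvent<{ url: string }>;

const runtimeType = typeof SHARE;
const serialized = JSON.stringify({ event: SHARE });
const nameLength = SHARE.length; // string members remain available
```

Here runtimeType is "string" and serialized contains only the event name, so nothing brand-related is ever sent to the analytics backend.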
The approach to reaching the above code is described in more detail in this excellent blog post.
In conclusion, the resulting wrapper code around Umami’s track function looks like this:
declare const __phantom: unique symbol;
type Event<T> = string & { [__phantom]: T };
export function analyticsEvent<const T>(event: Event<T>, payload: T) {
  umami.track(event, payload);
}
// Possible analytic events
export const ANALYTICS_SHARE_EVENT = 'share' as Event<{
  method: 'link' | 'twitter' | 'facebook';
  url: string;
}>;
export const ANALYTICS_XYZ_EVENT = ...
Note
As correctly pointed out by a fellow reader, declaring the analyticsEvent function’s generic as <T> allows users to pass any arbitrary string for the method property when using ANALYTICS_SHARE_EVENT, as method is widened to type string.
In TypeScript 5, you can now declare the generic as <const T>, which prevents such widening and, in our case, stops the user from passing arbitrary string values instead of the defined values for method in the payload.
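The difference between the two declarations can be sketched side by side. This assumes TypeScript 5.0 or later; TypedEvent, analyticsEventLoose, analyticsEventStrict and the "myspace" value are hypothetical names for illustration, and both functions record their arguments instead of calling umami.track:

```typescript
declare const __phantom: unique symbol;
// TypedEvent is a hypothetical name that avoids the DOM's global Event type.
type TypedEvent<T> = string & { [__phantom]: T };

// Hypothetical stand-in for umami.track that records what would be sent.
const sent: Array<{ event: string; payload: unknown }> = [];

function analyticsEventLoose<T>(event: TypedEvent<T>, payload: T): void {
  sent.push({ event, payload });
}

function analyticsEventStrict<const T>(event: TypedEvent<T>, payload: T): void {
  sent.push({ event, payload });
}

const SHARE = "share" as TypedEvent<{
  method: "link" | "twitter" | "facebook";
  url: string;
}>;

// Accepted: with <T>, method is widened to string during inference.
analyticsEventLoose(SHARE, { method: "myspace", url: "https://example.com" });

// Accepted: the payload matches the declared literal union.
analyticsEventStrict(SHARE, { method: "twitter", url: "https://example.com" });

// @ts-expect-error "myspace" is not part of the declared union for method.
analyticsEventStrict(SHARE, { method: "myspace", url: "https://example.com" });
```

The check is compile-time only: the suppressed call still executes at runtime, which is why the type error matters.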
If this code is located within its own TypeScript file (or provided as a library), we now have a neat API that is hard to misuse. New events can now only be created within this file, ensuring that developers do not create the same events twice. Additionally, the analyticsEvent function now properly ensures that only matching payload data can be provided for the respective event.
Not only can we now ensure consistent analytics data reporting, but this code will also be a lot easier to maintain in the long run (perhaps even by different developers) as the phantom type and API clearly define what can and cannot be done, thus catching many potential usage errors before they even hit the runtime, even if the analyticsEvent function is used many times scattered throughout the code base.
In Rust, I’ve grown quite fond of phantom data types, especially in libraries, to prevent incorrect usage at compile time with zero runtime cost. For things like database transactions (linking keys to the stored data type) or more complex relationships between types in a library, it can be an essential tool for enforcing correctness and creating a pleasant developer experience.
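The key-to-data linking idea carries over to TypeScript with the same branding trick. The following is only a rough sketch: Key, TypedStore, User and USER_KEY are hypothetical names, and a plain in-memory object stands in for a real database:

```typescript
declare const __valueType: unique symbol;
// A key that remembers, at the type level only, what is stored under it.
type Key<V> = string & { [__valueType]: V };

// Hypothetical in-memory store standing in for a database.
class TypedStore {
  private rows: { [key: string]: unknown } = {};

  put<V>(key: Key<V>, value: V): void {
    this.rows[key] = value;
  }

  get<V>(key: Key<V>): V | undefined {
    // The cast is safe as long as values are only written through put.
    return this.rows[key] as V | undefined;
  }
}

interface User {
  id: number;
  name: string;
}

const USER_KEY = "user:1" as Key<User>;

const store = new TypedStore();
store.put(USER_KEY, { id: 1, name: "Ada" });
const user = store.get(USER_KEY); // typed as User | undefined
```

As with the analytics events, the key is an ordinary string at runtime and the association with User exists only for the compiler.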