Our Cloudflare Workers and Workers KV Wishlist

CDN Planet lives entirely on Cloudflare's edge platform. The website runs on Workers Sites, and our CDN Finder tool and its underlying API are built with Workers and Workers KV.

Everything runs smoothly, costs are low and the developer experience is great.

However, we do have our frustrations, and a (growing) list of things that are missing or not possible. These gaps will prevent us from building and running all of our upcoming services entirely on Cloudflare.

1. Durable Objects

Durable Objects provides strongly consistent key-value storage at the edge. Multiple Workers instances can write to the same Durable Object simultaneously without any loss of data. It's the new hot thing in Workers.

We plan to use Durable Objects in several upcoming services. For some services it's a must-have to build the service entirely in Cloudflare.

The first use case is as simple as it is important: counters. The Durable Objects announcement blog post shows the example of an atomic counter and that is exactly what we need.

This counter is consistent even when receiving simultaneous requests from multiple clients -- none of the increments or decrements will be lost.
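A counter along those lines can be sketched as a Durable Object class. This is modelled on the announcement's example rather than copied from it; the class name `Counter` and the URL paths are illustrative. According to Cloudflare's documentation, each object ID is served by a single instance, and the runtime holds back other events for that object while a storage call is pending ("input gates"), which is what keeps the read-modify-write below from losing updates. In a deployed Worker the class must also be exported and bound in the Wrangler configuration; that is omitted so the snippet runs on its own.

```javascript
// Minimal Durable Object counter (sketch, modelled on Cloudflare's example).
// Per Cloudflare's docs, one instance serves each object ID and other events
// are held back while a storage call is pending, so increments are not lost.
class Counter {
  constructor(state, env) {
    // state.storage is the object's private, strongly consistent storage.
    this.state = state;
  }

  async fetch(request) {
    const url = new URL(request.url);
    // An unset key means the counter has never been touched: start at 0.
    let value = (await this.state.storage.get("value")) || 0;

    switch (url.pathname) {
      case "/increment":
        value += 1;
        break;
      case "/decrement":
        value -= 1;
        break;
      case "/":
        break; // read only
      default:
        return new Response("Not found", { status: 404 });
    }

    await this.state.storage.put("value", value);
    return new Response(String(value));
  }
}
```

From another Worker, the object would be reached through a Durable Object namespace binding, for example `env.COUNTER.get(env.COUNTER.idFromName("downloads")).fetch(request)`, where `COUNTER` and `"downloads"` are placeholder names.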

We signed up for the Durable Objects limited beta and were granted access a few months later. We're excited to get our hands dirty with Durable Objects.

2. Long running functions

Theoretically, a Workers script can take a long time to execute. The Workers documentation states:

There is no limit on the real runtime for a Workers script. As long as the client that sent the request remains connected, the Workers script can continue processing, making subrequests, and setting timeouts on behalf of that request. When the client disconnects, all tasks associated with that client request are canceled. You can use event.waitUntil() to delay cancellation for another 30 seconds or until the promise passed to waitUntil() completes.
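A minimal sketch of the pattern the documentation describes: respond right away and hand the remaining work to `waitUntil()`. It uses the module syntax, where the context object passed to `fetch()` exposes `ctx.waitUntil()`, the counterpart of `event.waitUntil()` in the service-worker syntax quoted above. `doBackgroundWork` and its three steps are placeholders for real work, and in a deployed Worker the `worker` object would be the module's default export.

```javascript
// Records which placeholder steps have completed, so the effect is observable.
const completedSteps = [];

// Placeholder for slow work, e.g. a batch of subrequests.
async function doBackgroundWork() {
  for (let step = 1; step <= 3; step++) {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated I/O
    completedSteps.push(step);
  }
}

const worker = {
  async fetch(request, env, ctx) {
    // Ask the runtime to keep the invocation alive until this promise settles
    // (subject to the platform's time limits), while the response goes out now.
    ctx.waitUntil(doBackgroundWork());
    return new Response("Accepted", { status: 202 });
  },
};
```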

We want to trigger a Worker to run and perform tasks for tens of minutes, or several hours even. Is this already achievable today with a scheduled event? Maybe it is, because scheduled events don't have a client that can disconnect and end the worker process. But this still would not satisfy our need for long running worker processes triggered by an HTTP request.

Cloudflare announced Workers Unbound in July 2020.

We are extending our CPU limits to allow customers to bring all of their workloads onto Workers, no matter how intensive.

Unbound seemed to be what we needed, so we signed up for the Unbound beta. Now that we have beta access, it turns out Unbound is not a good fit for our use case: Unbound has a maximum duration of 30 seconds, and this includes the time the worker spends waiting for network requests.

In normal Workers (not Unbound), one theoretical way to achieve long running functions is to chain multiple invocations of the event.waitUntil() method, each extending the lifetime of the event by up to 30 seconds. We have not tested this because it seems like an ugly solution.

After more testing, we now know that with the Bundled usage model (no limit on duration), a Workers script that consumes little CPU time can run just fine for several minutes, even without event.waitUntil().

3. Send requests to a specific POP

Using Anycast technology, Cloudflare routes traffic to the closest datacenter. This is great in general, but we'd like to have the ability to send HTTP requests to a specific POP.

Several upcoming tools and data services on CDN Planet require triggering a Workers script to run in specific countries. Assuming Cloudflare will not provide this capability, we're already investigating alternative ways to meet our needs.
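Today a Worker can at least detect which data center handled a request. The sketch below reads `request.cf.colo`, the three-letter code of the Cloudflare data center that the Workers runtime attaches to incoming requests, and compares it with a wanted POP passed as a `?colo=` query parameter. The query parameter and the JSON response shape are hypothetical conventions for this sketch. It only reports a mismatch; it cannot steer the request to another POP, which is exactly the missing capability.

```javascript
const worker = {
  async fetch(request) {
    // request.cf is populated by the Workers runtime; guard against it being
    // absent (e.g. in local tests) instead of throwing.
    const colo = (request.cf && request.cf.colo) || null;
    // Hypothetical convention: the caller names the POP it wanted, e.g. ?colo=VIE
    const wanted = new URL(request.url).searchParams.get("colo");
    const body = {
      colo,
      wanted,
      // null when no POP was requested, otherwise whether Anycast routed us there
      matched: wanted ? wanted.toUpperCase() === colo : null,
    };
    return new Response(JSON.stringify(body), {
      headers: { "content-type": "application/json" },
    });
  },
};
```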

4. Run scheduled event from a specific POP

This item on our wishlist is similar to the previous one, but relates to scheduled events instead of fetch events.

Cloudflare runs scheduled events on "underutilized machines to make the best use of our capacity and route traffic efficiently" [source: Cron Triggers documentation].

So, the scheduled event can run from any of the 200+ locations where Cloudflare has a POP? That will not work for us. We need control over the location.
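Scheduled events carry no incoming request, so there is no `request.cf` to inspect. A possible way to at least observe the location is to fetch Cloudflare's `/cdn-cgi/trace` endpoint from the cron handler and read its `colo=` line. This sketch assumes the subrequest is answered by the same data center the Worker runs in, which we have not verified; it only logs the location and gives no control over it. `parseTrace` is a helper written for this example.

```javascript
// Parse the key=value lines returned by /cdn-cgi/trace into an object.
function parseTrace(text) {
  const fields = {};
  for (const line of text.split("\n")) {
    const eq = line.indexOf("=");
    if (eq > 0) fields[line.slice(0, eq)] = line.slice(eq + 1).trim();
  }
  return fields;
}

const worker = {
  // Cron Trigger handler (module syntax); event.cron is the schedule string.
  async scheduled(event, env, ctx) {
    ctx.waitUntil(
      (async () => {
        // Assumption: this subrequest is served by the Worker's own data center.
        const res = await fetch("https://www.cloudflare.com/cdn-cgi/trace");
        const { colo } = parseTrace(await res.text());
        console.log(`cron "${event.cron}" ran in colo ${colo}`);
      })()
    );
  },
};
```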

We propose letting the customer select the country where the scheduled event must run. If Cloudflare does not list all countries where they have a POP, that's fine. A few countries per continent should be sufficient to give customers at least one suitable option.

5. Raise Workers limits for subrequests

A worker can make up to 50 subrequests per event.

This limit is in place to prevent a worker from DDoS-ing a remote service. That makes sense, but for our use case (fetching data from remote services, e.g. DNS, for thousands of domains) the limit forces us to build workarounds, and those workarounds add complexity to our overall architecture.

We'd love to see this limit be replaced by 'Not more than X subrequests per Y seconds'. This allows a long running worker to do many subrequests without the risk of the worker causing problems for a remote service.
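One way to model the proposed rule is a sliding-window limiter: at most `maxRequests` subrequests in any `windowMs` period. This sketch describes the semantics only; it is not a Cloudflare API, and the class name and parameters are hypothetical. The clock is injectable so the behaviour can be checked without waiting.

```javascript
// Sliding-window limit: allow at most maxRequests calls in any window of
// windowMs milliseconds. Hypothetical model of the proposed rule.
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs, now = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
    this.timestamps = []; // start times of recent calls, oldest first
  }

  // Returns 0 if a call may start now, else the milliseconds to wait.
  delayUntilAllowed() {
    const t = this.now();
    // Drop calls that have left the window.
    while (this.timestamps.length && t - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) return 0;
    return this.timestamps[0] + this.windowMs - t;
  }

  // Record a call; only valid right after delayUntilAllowed() returned 0.
  record() {
    this.timestamps.push(this.now());
  }

  // Wait if needed, then run fn (e.g. () => fetch(url)). The check and the
  // record happen with no await in between, so concurrent callers are safe.
  async schedule(fn) {
    let wait;
    while ((wait = this.delayUntilAllowed()) > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
    this.record();
    return fn();
  }
}
```

A long running worker could then wrap every subrequest in `limiter.schedule(() => fetch(url))` and spread thousands of lookups over time instead of stopping at a fixed count.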

6. Intra zone subrequests

Update: Cloudflare Workers now has Service bindings:

Service bindings are an API that facilitate Worker-to-Worker communication via explicit bindings defined in your configuration. A Service binding allows you to send HTTP requests to another Worker without those requests going over the Internet.
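With a Service binding, calling another Worker becomes a `fetch()` on the binding. A minimal sketch, assuming a binding named `FINDER_API` (hypothetical) that points at a second Worker; the binding is declared in the Wrangler configuration, shown only as a comment. The `/api/` path split is also illustrative.

```javascript
// Assumed wrangler.toml entry (hypothetical names):
//   services = [{ binding = "FINDER_API", service = "cdn-finder-api" }]
const worker = {
  async fetch(request, env) {
    const url = new URL(request.url);
    if (url.pathname.startsWith("/api/")) {
      // Handed to the other Worker through the binding, not over the Internet.
      return env.FINDER_API.fetch(request);
    }
    return new Response("Served by the front-end Worker");
  },
};
```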

Currently, a worker cannot send a subrequest to another worker in the same zone (a zone is a domain, e.g. cdnplanet.com). This limit is in place to prevent infinite loops/redirects that may exhaust resources at Cloudflare.

Although working around this limitation is not a big deal - add a second zone and split functionality in Workers across zones - it does, like the other workarounds, add complexity.

Our suggestion: allow intra zone subrequests but with a limit on how many intra zone subrequests a worker can make (see previous item).

7. Send subrequest to a specific IP

Imagine you're getting complaints from users in Europe. Apparently, your site is slow and you want to find out if the CDN is causing the slowness. You'll want to quickly gain insight into the current performance and behaviour of the CDN POPs in Europe.

What do you do? Fetch and parse CDN server logs? Kick off instant tests with your synthetic monitoring service? Quickly collecting all the right data can be quite a challenge.

We're building a service to help troubleshoot CDN content delivery issues. One of the required capabilities in this service is sending an HTTP request to a specific CDN POP.

For CDNs using Anycast routing technology (e.g. Cloudflare and StackPath) this can only be done if the CDN provides either the unicast IPs of its POPs or an FQDN to send the request to. For non-Anycast CDNs - CloudFront, CDN77 and others - we need to get the IP of the POP and be able to send an HTTP request to that IP.

Getting the IPs of POPs of non-Anycast CDNs is a whole topic in itself and not covered in this article. Let's get into actually sending an HTTP request to a specific IP address from inside a worker.

The example here is the Amazon.com website. Amazon uses multiple CDNs - Akamai, Fastly and CloudFront - and we'll work with CloudFront.

The IP address 99.86.238.192 currently maps to the Amazon CloudFront POP in Vienna, Austria. The code in a Workers script to send a request to that POP could look like this:

const response = await fetch("https://99.86.238.192/", {
  headers: {
    Host: "www.amazon.com"
  }
})

This will throw an error in the Cloudflare worker. Fetch does not allow setting the Host header. It's one of the forbidden header names that cannot be modified programmatically.

Workers has the request property resolveOverride, and from its name one might think this is exactly what we need. Unfortunately, it is not, because the property does not take an IP address as its value.

The value of resolveOverride specifies an alternate hostname which will be used when determining the origin IP address, instead of using the hostname specified in the URL ... resolveOverride will only take effect if both the URL host and the host specified by resolveOverride are within your zone ... If you need to direct a request to a host outside your zone (while keeping the Host header pointing within your zone), first create a CNAME record within your zone pointing to the outside host, and then set resolveOverride to point at the CNAME record.
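For comparison, the documented pattern looks roughly like this. It assumes a DNS record `vienna.cdnplanet.com` (hypothetical) in the cdnplanet.com zone pointing at the target, and keeps the URL's host inside the zone, as the quoted documentation requires. `buildOverrideFetch` is a helper written for this example.

```javascript
// Build fetch() arguments that use the cf.resolveOverride request property.
// Hypothetical names: the URL host and the override host must both be in the
// zone the Worker runs on (here cdnplanet.com).
function buildOverrideFetch(path, overrideHost) {
  const url = new URL(path, "https://www.cdnplanet.com");
  const init = {
    // Only honoured by the Workers runtime; other fetch() implementations ignore cf.
    cf: { resolveOverride: overrideHost },
  };
  return [url.toString(), init];
}

// In a Worker:
//   const response = await fetch(...buildOverrideFetch("/", "vienna.cdnplanet.com"));
```

Every target IP would need its own DNS record in the zone, and the request still carries a cdnplanet.com hostname rather than www.amazon.com, which is why this does not fit a tool that picks POP IPs at request time.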

Perhaps the solution lies in using Cloudflare's Resolve Override Page Rule. This rule configures the Cloudflare CDN to override the DNS lookup by specifying the hostname Cloudflare must use to resolve to an IP address. The rule only takes a hostname that exists in Cloudflare DNS in your zone. We'd create an A record pointing www.amazon.com.cdnplanet.com to 99.86.238.192.

The big question here is: do Page Rules apply to subrequests from inside a worker? Cloudflare's community site has a conversation about this, where Kenton Varda (Cloudflare Workers tech lead) writes:

security features run "in front of" workers, and everything else runs "behind", i.e. on subrequests made with fetch()

So yes, Resolve Override should apply to subrequests from inside a worker. But even if it works as expected, it does not help us:

  1. Resolve Override in Page Rules is for Enterprise customers only
  2. We need to dynamically override DNS resolution from inside the worker

Solving problem 1 is simply about money. The solution to problem 2 lies in updating the Page Rule via the Cloudflare API from the worker right before doing the subrequest fetch. This should work but it's not efficient and we may run into the API rate limit.

We'd love to have a simple, native solution in workers: a request property to specify the remote IP address:

const response = await fetch("https://www.amazon.com/", {
  cf: {
    // Wished-for property: setRemoteIp does not exist in Workers
    setRemoteIp: "99.86.238.192"
  }
})

Basically, something similar to what the popular command line tool cURL provides with the --resolve option:

curl -svo /dev/null 'https://www.amazon.com/' --resolve www.amazon.com:443:99.86.238.192
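Outside Workers, Node.js can already do what `--resolve` does: `https.get()` accepts a custom `lookup` function, so DNS resolution for the hostname can be pinned to one IP while the URL, Host header and TLS SNI keep using www.amazon.com. This only illustrates the semantics we'd like in Workers; the Workers runtime does not offer it. `pinnedLookup` and `fetchFromPop` are helpers written for this example.

```javascript
const https = require("https");

// Build a dns.lookup-compatible function that always answers with `ip`.
// Node may call it with options.all === true (e.g. when autoSelectFamily is
// enabled), in which case it expects an array of { address, family }.
function pinnedLookup(ip, family = 4) {
  return (hostname, options, callback) => {
    if (typeof options === "function") {
      callback = options;
      options = {};
    }
    if (options && options.all) {
      callback(null, [{ address: ip, family }]);
    } else {
      callback(null, ip, family);
    }
  };
}

// Roughly equivalent to:
//   curl -svo /dev/null 'https://www.amazon.com/' --resolve www.amazon.com:443:99.86.238.192
function fetchFromPop(url, ip) {
  return new Promise((resolve, reject) => {
    const req = https.get(url, { lookup: pinnedLookup(ip) }, (res) => {
      res.resume(); // discard the body, like -o /dev/null
      resolve({ status: res.statusCode, headers: res.headers });
    });
    req.on("error", reject);
  });
}
```

Calling `fetchFromPop("https://www.amazon.com/", "99.86.238.192")` would connect to that IP and resolve with the status code and response headers (network access required).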

8. Even better Workers analytics

The dashboard shows metrics per worker: Requests, CPU Time and Invocation Statuses. We use these metrics mostly to spot problems, for example by keeping an eye on how often the worker exceeded the CPU time limit.

Cloudflare dashboard Workers metrics

We'd like to see Cloudflare improve the dashboard by enabling customers to:

  1. Select a time window larger than 7 days
  2. View invocation statuses over time
  3. View details about errors encountered during execution
  4. View the top 10 most requested URL paths

Update: Cloudflare released Workers Analytics Engine in beta in May 2022, and it looks very promising. Read the announcement blog post or the docs.

Workers Analytics Engine is a new way to quickly get analytics about anything using Cloudflare Workers.
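As a sketch of how Analytics Engine could cover the "top 10 most requested URL paths" wish: write one data point per request through an Analytics Engine dataset binding, here assumed to be named `PATH_STATS` (hypothetical), with the path and method as blobs. Ranking the paths would then be a query against the dataset through Cloudflare's SQL API, which is not shown.

```javascript
const worker = {
  async fetch(request, env) {
    const url = new URL(request.url);
    // writeDataPoint() is non-blocking and does not need to be awaited.
    env.PATH_STATS.writeDataPoint({
      blobs: [url.pathname, request.method], // dimensions to group by
      doubles: [1], // one request; summed at query time
    });
    return new Response("ok");
  },
};
```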

9. Even better Workers KV analytics

The dashboard gives little insight into Workers KV usage: Requests, Bandwidth and Storage per account.

We miss three things in the dashboard:

  1. View metrics per namespace
  2. View total number of objects (account wide and per namespace)
  3. View total number of objects in a list

We can get 1 and 2 from the Cloudflare API, but it would be nice to have these numbers in the dashboard as well. Item 3 is about making it much easier to track list growth over time. Some lists can grow to millions of objects, and the API returns at most 1,000 object keys per list request. So, counting the number of objects in a big list requires many, many API calls :(
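Counting keys means paging through the list 1,000 keys at a time with a cursor. A sketch using the KV binding's `list()` method, which pages the same way as the REST API; the namespace passed in is a placeholder for a real binding, and `countKeys` is a helper written for this example.

```javascript
// Count keys in a KV namespace (optionally under a prefix) by following the
// cursor until list_complete is true. Each call returns at most 1,000 keys.
async function countKeys(namespace, prefix = "") {
  let count = 0;
  let calls = 0;
  let cursor;
  for (;;) {
    const options = { limit: 1000 };
    if (prefix) options.prefix = prefix;
    if (cursor) options.cursor = cursor;
    const page = await namespace.list(options);
    count += page.keys.length;
    calls += 1;
    if (page.list_complete) return { count, calls };
    cursor = page.cursor;
  }
}
```

At 1,000 keys per call, a list of three million objects takes 3,000 sequential calls just to learn its size.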

10. Search KV object in dashboard

Sometimes it's just easier to search and view objects in a nice UI instead of on the command line. Currently, the Cloudflare dashboard enables customers to view a list of objects and add an object to storage. We'd like to see Cloudflare add a simple search field: enter the object key (prefixed or in full), hit the search button and get the object back. Easy.

Update: Cloudflare now also enables searching in KV by prefix.
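Programmatically, a prefix search boils down to the same `list()` call with a `prefix`, followed by a `get()` per key. A minimal sketch that returns up to `max` matching keys with their values; `searchByPrefix` and the `max` default are illustrative, and the namespace is a placeholder for a real KV binding.

```javascript
// Find up to `max` keys starting with `prefix` and fetch their values.
async function searchByPrefix(namespace, prefix, max = 10) {
  const page = await namespace.list({ prefix, limit: max });
  return Promise.all(
    page.keys.map(async ({ name }) => ({ key: name, value: await namespace.get(name) }))
  );
}
```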
