No Hair Blog

Relayd as reverse proxy: Whitelisting

Note: This post was tested on OpenBSD 6.6-current.

Using relayd as a reverse proxy is doable with blacklists, but these can get long and unwieldy. Whitelists are more restrictive and can (at times) be simpler, particularly for my use case, which is a web site consisting of static HTML pages. Usually, though, some mixture of blacklists and whitelists is needed.

Consider the following HTTP protocol stanza for relayd.conf:

http protocol https_revproxy {

#       # TCP performance options 
        tcp {nodelay, sack, socket buffer 65536, backlog 100 }

#       # Return HTTP/HTML error pages
        return error

#       # Add connection data to log
        match header log "Host"
        match header log "X-Forwarded-For"
        match header log "User-Agent"
        match header log "Referer"
        match url log

#       # Adjust headers to pass info to httpd log
        match request header set "X-Forwarded-For" value "$REMOTE_ADDR"
        match request header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"

#       # Change timeout
        match header set "Keep-Alive" value "$TIMEOUT"

        tls keypair "www.example.net"
        tls keypair "www.example.com"
        tls { no tlsv1.0, ciphers "HIGH" }

#       # Anonymize our webserver's name/type
        match response header set  "Server" value "Microsoft IIS 9 beta 1" 

##      # Blacklists

#       # Block bots and other user agent strings
        block request quick header "User-Agent" value "*Ahrefs*"
        block request quick header "User-Agent" value "*Semrush*"
        block request quick header "User-Agent" value "*Yandex*"
        block request quick header "User-Agent" value "*seznam*"
        block request quick header "User-Agent" value "*MJ12*"

#       # Block all queries
        block request quick query "*" value "*"

##      # Whitelists

#       # Tag all requests
        match request tag "all"

#       # Pass those using http method GET
        match request method "GET" tag "get"
        block request quick tagged "all"

#       # Pass traffic to virtual hosts only
        match url "example.net/" tag "allow"
        match url "example.com/" tag "allow"
        block request quick tagged "get"
		
#       # Forward valid request paths to webserver
        pass request quick path "/old/*.html" forward to <webserver>
        pass request quick path "/old/*.css" forward to <webserver>
        pass request quick path "/old/" forward to <webserver>
        pass request quick path "/images/*.gif" forward to <webserver>
        pass request quick path "/images/*.jpg" forward to <webserver>
        pass request quick path "/*.html" forward to <webserver>
        pass request quick path "/*.css" forward to <webserver>
        pass request quick path "/*.txt" forward to <webserver>
        pass request quick path "/*.gif" forward to <webserver>
        pass request quick path "/*.jpg" forward to <webserver>
        pass request quick path "/" forward to <webserver>

#       # Block all requests not matched above
        block request

        }

The 'Blacklists' filter by HTTP header fields, where a whitelist would need to be encyclopedic. Their main role here is to filter abusive bots. There is also a blanket ban on URL query strings, which a site of static pages does not need.

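The 'whitelist' for the HTTP request method is simpler than blocking all the other methods. Every request is first tagged "all", GET requests are then re-tagged "get", and anything still tagged "all" is blocked. The whitelist of URL strings works the same way: requests for the virtual hosts are re-tagged "allow" and anything still tagged "get" is blocked, so that requests addressed to bare IP addresses and other malformed requests are rejected.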
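
For comparison, a blacklist version of the method filter might look something like the sketch below. It is untested and the list of methods is illustrative rather than complete; the point is only that it needs one rule per unwanted method, where the whitelist needs three rules in total.

```
#       # Hypothetical blacklist equivalent (untested sketch)
        block request quick method "POST"
        block request quick method "PUT"
        block request quick method "DELETE"
        block request quick method "OPTIONS"
        block request quick method "TRACE"
        block request quick method "CONNECT"
#       # ...and so on for every other method that should be refused
```

Note that the whitelist version also refuses HEAD requests, since only GET is re-tagged.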

Relayd apparently uses a relatively restricted shell globbing engine (see glob(7)) that does not do brace expansion. The rules

pass request quick path "/*.gif" forward to <webserver>
pass request quick path "/*.jpg" forward to <webserver>

works, whereas

pass request quick path "/*.{gif, jpg}" forward to <webserver>

does not. So, whitelisting is still a bit of an ugly hack, but functional.
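
Bracket expressions are part of the shell globbing rules described in glob(7), so they may offer a small saving where brace lists do not. The rule below is an untested assumption on my part, not something verified against relayd; it is meant to cover upper and lower case spellings of one extension in a single line.

```
#       # Untested sketch: bracket expressions instead of brace lists
#       # Intended to match .jpg, .JPG, .Jpg and similar spellings
        pass request quick path "/*.[jJ][pP][gG]" forward to <webserver>
```

This does not help with unrelated extensions such as gif and jpg, which still need one rule each.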


Posted by Gordon, No Hair Blog, April 12, 2020

© nohair.net and the author

For comments, corrections, and addenda, email: gordon[AT]nohair.net
