Servasm

Minimal x86_64 Linux-only file webserver written in assembly language.

This page is literate program with all service source code. Project repository and build instructions.

Overview

Servasm is a forking server, each request is processed in a separate process. This is how it was done in the Mesozoic Era (except we use sendfile(2), which wasn't invented then). This allows us to make things stupidly simple and leverage the Kernel as much as possible. We aim for approximately 1k lines of assembly with comments and spaces.

Main process sets up listening socket with a few system calls:

socket(2)bind(2)listen(2)

Then the main process loops on the accept(2) system call. For each request, it fork(2)s the main process and processes the request there:

  1. Set alarm(2) to drop very slow clients
  2. recv(2) request headers
  3. Check that the request is valid (only GET requests are supported)
  4. open(2) requested file
  5. Get its size with fstat(2).
  6. write(2) response headers
  7. Let the kernel send the rest with sendfile(2)
  8. close(2) socket and file
  9. exit(2) child

In case of an error, we exit the process by passing the system call result as the exit code.

Reference material

Constants

The data section keeps all static constants that we might need during the server's lifetime.

section .data

We are going to use IPv4 and TCP as our transport.

    pf_inet:     equ 2
    sock_stream: equ 1

Our server binds to 0.0.0.0:8080 interface. 0.0.0.0 is a special IP address that will map to all interfaces on the user's machine.

    sockaddr:    db 0x02, 0x00             ;; AFINET
                 db 0x1f, 0x90             ;; PORT 8080
                 db 0x00, 0x00, 0x00, 0x00 ;; IP 0.0.0.0
    addr_len:    equ 128

Requests timeout in 15 seconds.

    request_timeout:   equ 15

Backlog is the number of incoming requests that the kernel will buffer for us until we accept(2) them. We set it to 128.

    backlog:     equ 128

And we are going to use TCP_CORK option (more on it later)

    sol_tcp:     equ 6
    tcp_cork:    equ 3
    on_state:    db  0x01

We store strings as a pair of their content and their length following right after the message. $ points to the current memory address, so the current address minus the start of the string is its length.

    startup_error_msg:     db "ERROR: Cannot start server", 10
    startup_error_msg_len: equ $ - startup_error_msg

For incoming requests, we restrict the path to be alphanumeric plus ./

    url_whitelist:     db "abcdefghijklmnopqrstuvwxyz"
                       db "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./"
    url_whitelist_len: equ $ - url_whitelist

Lookup tables.

Syscall table for x86-64. For reference, look here.

    sys_write:        equ 1
    sys_open:         equ 2
    sys_close:        equ 3
    sys_fstat:        equ 5
    sys_alarm:        equ 37
    sys_sendfile:     equ 40
    sys_socket:       equ 41
    sys_accept:       equ 43
    sys_recv:         equ 45
    sys_bind:         equ 49
    sys_listen:       equ 50
    sys_setsockopt:   equ 54
    sys_fork:         equ 57
    sys_exit:         equ 60
    sys_waitid:       equ 247

We build response headers on the stack. That means that we need to push strings from the last one, for example to build a header:

HTTP/1.0 200 OK\r\n
Server: servasm\r\n
Content-type: text/html; charset=UTF-8\r\n
Content-Length: 42\r\n

We will push \n\r24 :htgneL-tnetnoC\n\r8-FTU=tesrahc.... To make this easy, we keep pointers to the end of the string instead of the beginning and use a 0x00 byte to mark the beginning of the string.

We use the stack to build headers, so all headers are pushed from the last character to the first one.

\r\n string

                    db 0x00, 13, 10
    crnl:

Response codes

200 OK

                    db 0x00, "HTTP/1.0 200 OK", 13, 10
    result_ok:

403 Forbidden

                    db 0x00, "HTTP/1.0 403 Forbidden", 13, 10
    result_forbidden:

404 Not Found

                    db 0x00, "HTTP/1.0 404 File not found", 13, 10
    result_not_found:

500 Internal Server Error

                    db 0x00, "HTTP/1.0 500 OOPSIE", 13, 10
    result_server_error:

500 Not Implemented

                    db 0x00, "HTTP/1.0 501 Not Implemented", 13, 10
    result_unsupported_method:

Mime Types

We use a small number of predefined mime-types backed in source code. And support only UTF-8 encoding

                    db 0x00, "text/plain; charset=UTF-8", 13, 10
    txt:
                    db 0x00, "text/html; charset=UTF-8", 13, 10
    html:
                    db 0x00, "text/css; charset=UTF-8", 13, 10
    css:
                    db 0x00, "css/js; charset=UTF-8", 13, 10
    js:
                    db 0x00, "image/png", 13, 10
    png:
                    db 0x00, "image/jpeg", 13, 10
    jpg:
                    db 0x00, "application/octet-stream", 13, 10
    other:

Mime type hash table Each entry has two quad words. The first quad word is the product of extension ASCII codes. For example:

104(h) *  116 (t) *  109 (m) * 108 (l) = 142017408 = 0x8770380

This means that some unknown files can be served with the wrong mime-type in the case of a hash collision. And this is okay. Repeat after me: this is okay.

The second quad word is a pointer to the end of the matched mime. In the case the file type is unknown, we serve it with application/octet-stream.

    mime_table:     dq 0x18a380,  txt
                    dq 0x8770380, html
                    dq 0x13fa5b,  css
                    dq 0x2f9e,    js
                    dq 0x135ce0,  png
                    dq 0x12a8a0,  jpg
                    dq 0x0,       other

Headers

Content-type

                    db 0x00, "Content-type: "
    content_type_header:

Content-Length

                    db 0x00, "Content-Length: "
    content_length_header:

Server

                    db 0x00, "Server: servasm", 13, 10,
    server_header:

Variables

BSS section stores data that can be changed during application execution.

section .bss

We will store incoming requests in a buffer limited to 1025 bytes.

    buffer:  resb 1025
    buffer_len:  equ 1024
    buffer_read:  resb 8

Buffer for the result of fstat(2) system call

    statbuf: resb 144

Main server socket

    server_fd: resb 8

Incoming request socket

    client_fd: resb 8

File descriptor to be served

    file_fd:   resb 8

Name of the requested file

    filename: resb 255
    filename_len: resb 8

Size of a file

    file_size: resb 8

Mime type for a file

    mime_type: resb 8

Source code

section .text

Define entry point

    global _start

_start:

Our webserver is little more than glue code to a few syscalls. Actually, it's amazing how much can be done only with standard system calls.

Syscalls are made differently for different versions of architectures and operating systems. We restrict ourselves to the x86_64 architecture. To make a syscall in x86_64, you need to set the rax register to the syscall number and rdi, rsi, rdx, r10, r8, r9 registers to parameters 1-6 respectively. Then, use the syscall instruction to pass control to the kernel. The syscall result will be stored in the rax register. Look here for reference.

Main socket setup

Call socket(2) to create an IPv4 TCP socket

    mov     rax, sys_socket
    mov     rdi, pf_inet
    mov     rsi, sock_stream
    xor     rdx, rdx
    syscall

If the socket was not created and syscall returned an error, jump to exit_error

    cmp rax, 0
    js .exit_error

If everything is fine, we store the result into server_fd

    mov     [server_fd], rax

Call setsockopt(2) to set the TCP_CORK flag to the server socket The TCP_CORK flag will prevent sockets from flushing after we write headers. This will allow us to reduce the number of packets sent, as the first packet will include headers and the first chunk of the served file

For more info, read the blog post or man page

    mov     rax, sys_setsockopt
    mov     rdi, [server_fd]
    mov     rsi, sol_tcp
    mov     rdx, tcp_cork
    mov     r10, on_state
    mov     r8,  8
    syscall
    cmp rax, 0
    js .exit_error

bind(2) to bind socket to IP, port

    mov     rax, sys_bind
    mov     rdi, [server_fd]
    mov     rsi, sockaddr
    mov     rdx, addr_len
    syscall
    cmp rax, 0
    js .exit_error

And call listen(2) to start listening for incoming connections From now on, the kernel will buffer a number of incoming requests equal to backlog If backlog is exceeded, requests will be dropped

    mov     rax, sys_listen
    mov     rdi, [server_fd]
    mov     rsi, backlog
    syscall
    cmp rax, 0
    js .exit_error

Now the socket is initialized and ready to serve clients

Main loop

.accept_socket:

accept(2) new client from backlog Call will block until the first client connects

    mov     rax, sys_accept
    mov     rdi, [server_fd]
    xor     rsi, rsi
    xor     rdx, rdx
    syscall
    cmp rax, 0
    js .exit_error

accept(2) returns fd for the incoming socket

    mov     [client_fd], rax

We process each child in child processes, and when they exit, they become zombie processes. The kernel keeps their exit code and some other state until the parent process gets to it, this is called reaping. We reap all zombie processes before processing each request. This means that we can have some between requests. We use waitid(2) to get the last process exit code.

.next_process:
    mov     rax, sys_waitid
    mov     rdi, 0
    mov     rsi, 0
    mov     rdx, 0
    mov     r10, 4
    mov     r8,  0
    syscall

If the returned value is >0, it means that we reaped a process, and maybe there is more. So we try again. (Errors are ignored here)

    cmp rax, 0
    jg .next_process

We process incoming requests one by one, so we need to return to accept(2)ing requests ASAP. So we fork(2) a new process to handle the client. It will have its own copy of the client fd in the client_fd variable. The main process can overwrite this variable safely, as the client has its own copy.

    mov     rax, sys_fork
    syscall

fork(2) returns a negative number in case of error. If it happens, we ragequit from the server.

    cmp rax, 0
    js  .exit_error

If rax is 0, it means we are inside the child, so we jump to serving the request

    jz  .process_socket

Otherwise, we are in the main process, so we close(2) the client fd and jump to accepting a new client

    mov     rax, sys_close
    mov     rdi, [client_fd]
    syscall
    cmp rax, 0
    js  .exit_error
    jmp .accept_socket

Processing client

.process_socket:

In the child process, we close(2) the server fd

    mov     rax, sys_close
    mov     rdi, [server_fd]
    syscall
    cmp rax, 0
    js  .exit_error

Set alarm(2) to drop slow clients The kernel will send an ALARM signal to the child process after request_timeout has elapsed In the happy path, we will serve the request and exit before the alarm is triggered Otherwise, we just exit the child process

    mov     rax, sys_alarm
    mov     rdi, request_timeout
    syscall
    cmp rax, 0
    js  .exit_error

Parse request

Call recv(2) to write the request to buffer Our buffer size is limited, but we only need to make a few checks and extract the filename from it

    mov     rax, sys_recv
    mov     rdi, [client_fd]
    mov     rsi, buffer
    mov     rdx, buffer_len
    xor     r10, r10
    xor     r8,  r8
    xor     r9,  r9
    syscall
    cmp rax, 0
    js  .exit_error

Our filename extracting algorithm requires that the buffer ends with " "

    mov byte [buffer + 1 + rax], " "

Keep bytes read count

    mov [buffer_read], rax

For now, we accept only GET requests So we will return a 501 error to clients if another request method is used in the request

    mov rax, result_unsupported_method
    cmp byte [buffer],     "G"
    jnz  .return_error
    cmp byte [buffer + 1], "E"
    jnz  .return_error
    cmp byte [buffer + 2], "T"
    jnz  .return_error
    cmp byte [buffer + 3], " "
    jnz  .return_error
    cmp byte [buffer + 4], "/"
    jnz  .return_error

Call extract_filename procedure to extract the filename to filename variable

    call extract_filename

check_filenames returns 0 if the filename is valid, returns 403 otherwise

    call check_filename
    cmp rax, 0
    mov rax, result_forbidden
    jne .return_error

Call get_mime to extract the mime-type from filename It will set the mime_type variable

    call get_mime

Try to open(2) the requested file and put fd to file_fd variable

    mov     rax, sys_open
    mov     rdi, filename
    xor     rsi, rsi ;; no flags
    xor     rdx, rdx ;; readonly
    syscall
    mov [file_fd], rax

Return 404 if opening the file fails

    cmp rax, 0
    mov rax, result_not_found
    js  .return_error

Call fstat(2) to get the file info structure and extract file_size from it

    mov     rax, sys_fstat
    mov     rdi, [file_fd]
    mov     rsi, statbuf
    syscall
    cmp rax, 0
    mov rax, result_server_error
    js  .return_error
    mov rax, [statbuf + 48]
    mov [file_size], rax

Write response

After the request has been parsed and the file found, we start writing the response

    .write_response:

Read the request from the socket

    call read_full_request

Write headers with write_headers procedure

    call write_headers
    cmp rax, 0
    js .exit_error

We use sendfile(2) to make the Kernel read data from file_fd and write it to client_fd We expect sendfile(2) to send the whole file at once

    mov     rax, sys_sendfile
    mov     rdi, [client_fd]
    mov     rsi, [file_fd]
    xor     rdx, rdx
    mov     r10, [file_size]
    syscall ;; ignore errors

close(2) client socket

    mov     rax, sys_close
    mov     rdi, [client_fd]
    syscall ;; ignore errors

And close(2) file_fd

    mov     rax, sys_close
    mov     rdi, [file_fd]
    syscall ;; ignore errors

And finally exit(2) from the child process with a 0 exit code

    xor rax, rax
    jmp .exit

Error handling

.return_error:

Write error response headers and body to client socket

    call write_error_response

And close(2) the client socket ignoring errors

    mov     rax, sys_close
    mov     rdi, [client_fd]
    syscall

.exit_error:

write(2) the error message to stderr

    mov     rax, sys_write
    mov     rdi, 2              ; stderr
    mov     rsi, startup_error_msg
    mov     rdx, startup_error_msg_len
    syscall

Set error code to 1

    mov     rax, 1

.exit:

Call the exit(2) syscall

    mov    rdi, rax
    mov    rax, sys_exit
    syscall

Procedures

Extract Mime Type

We use filename and filename_len to fill the mime_type variable. It will point to the end of the mime string

get_mime:
    mov rax, 1
    mov rcx, [filename_len]
    dec rcx

Calculate mime_hash using the algorithm in Mime Types section

.get_mime_hash:
    xor rdx, rdx
    mov dl, [filename + rcx]
    cmp dl, "."
    je .get_mime_hash_done
    mul rdx
    dec rcx
    cmp rcx, 0
    je .get_mime_hash_done
    jmp .get_mime_hash

.get_mime_hash_done:
    mov rcx, 0

Find the pointer to the Mime Type using mime_table

.get_mime_get_pointer:
    mov r11, [mime_table + rcx]
    cmp r11, rax
    je .get_mime_pointer_done
    cmp r11, 0
    je .get_mime_pointer_done
    add rcx, 16
    jmp .get_mime_get_pointer
.get_mime_pointer_done:
    mov rdi, [mime_table + rcx + 8]

And store it to mime_type variable

    mov [mime_type], rdi
    ret

Write headers

Write a 200 OK response and some headers to the client socket

write_headers:

We will be using the stack as a buffer for response headers instead of making multiple write calls on the socket

Save the stack top to a temporary register

    mov rbp, rsp

push_string uses rcx to keep count of free bytes in the current stack top, -1 means no free bytes left and we need to make room for a new value

    mov rcx, -1

First, we push the end of headers (\r\n\r\n)

    mov rsi, crnl
    call push_string
    mov rsi, crnl
    call push_string

Push the Content-Length header

    mov rax, [file_size]
    call push_int
    mov rsi, content_length_header
    call push_string

Push the Content-type header

    mov rsi, [mime_type]
    call push_string
    mov rsi, content_type_header
    call push_string

Push the server name (Server header)

    mov rsi, server_header
    call push_string

Push the 200 OK response header

    mov rsi, result_ok
    call push_string

Calculate the start headers address on the stack

    mov rbx, rcx
    add rbx, rsp
    inc rbx

Restore stack state

    mov rsp, rbp

Calculate the length of headers

    sub     rbp, rbx

write(2) the headers

    mov     rax, sys_write
    mov     rdi, [client_fd]
    mov     rsi, rbx
    mov     rdx, rbp
    syscall

    ret

Write error response

Write response headers and body to the client fd expects rax to point to the end of the error response code string

write_error_response:
    mov r11, rax

Read request from socket

    call read_full_request

Look at the write_headers method for comments on using push_string

Write the end of the request

    mov rbp, rsp
    mov rcx, -1
    mov rsi, crnl
    call push_string

Write the request body

    mov rsi, r11
    call push_string

Write the body | headers separator

    mov rsi, crnl
    call push_string

Write the request header

    mov rsi, r11
    call push_string

Calculate the start headers address on the stack

    mov rbx, rcx
    add rbx, rsp
    inc rbx

Restore stack state

    mov rsp, rbp

write(2) the request from the stack to the client socket

    mov     rax, sys_write
    mov     rdi, [client_fd]
    mov     rsi, rbx
    sub     rbp, rbx
    dec rbp
    mov     rdx, rbp
    syscall ;; ignore errors

    ret

push string

rsi should point to the end of the string, the string should begin with a 0x00 byte rcx is used to store the byte shift from the stack top (0-7), if rcx is -1 it means that additional stack space is required. The function will grow the stack

If push_string is called multiple times it will form a continuous string on the stack For example, two calls with rcx -1 0x00, "llo" and 0x00, "he" will push "hello" to the stack and set rcx to 2

push_string:

Remove the return address from the stack and store it in the rdx register

    pop   rdx

We use 0x00 to mark the beginning of the passed string

    mov al, 0x00

.push_string_next:

If we have no free bytes on the stack add 8 bytes and change rcx accordingly

    cmp rcx, -1
    jne .push_string_write
    push 0
    mov rcx, 7

.push_string_write:

Move the string to the stack starting from the string end until 0x00

    dec rsi
    mov rbx, [rsi]
    cmp al, bl
    je .push_string_ret
    mov byte [rsp + rcx], bl
    dec rcx
    jmp   .push_string_next

.push_string_ret:

Restore the stack

    push  rdx
    ret

Push int

Converts rax to a string and calls push_string on it

push_int:

Remove the return address from the stack and store it in the rdi register

    pop rdi

We convert the integer value to a sequence of characters with base 10 and push each character with the push_string procedure

    mov r8, rax
.push_int_next:
    mov rax, r8
    xor  rdx, rdx
    mov r11, 10
    div r11
    mov r8, rax
    add dl, 48
    mov rsi, rsp
    sub rsi, 8
    mov byte [rsi - 1], dl
    mov byte [rsi - 2], 0x00
    call push_string
    cmp r8, 0
    je .push_int_ret
    jmp .push_int_next
.push_int_ret:

Restore the stack

    push rdi
    ret

Read rest of request

Spec requires us to read the full request with headers before we can send a response

read_full_request:

We kept the amount read from the socket in the buffer_read variable

    mov rax, [buffer_read]

We check that the last bytes received from the client were \r\n\r\n

    .check_buffer:
    cmp byte [buffer + rax - 1], 10
    jne .read_more_from_client_socket
    cmp byte [buffer + rax - 2], 13
    jne .read_more_from_client_socket
    cmp byte [buffer + rax - 3], 10
    jne .read_more_from_client_socket
    cmp byte [buffer + rax - 4], 13
    jne .read_more_from_client_socket
    ret

If not, we recv(2) more data from the socket and check the buffer again in a loop

    .read_more_from_client_socket:
    mov     rax, sys_recv
    mov     rdi, [client_fd]
    mov     rsi, buffer
    mov     rdx, buffer_len
    xor     r10, r10
    xor     r8,  r8
    xor     r9,  r9
    syscall
    jmp .check_buffer

Extract filename

Fills the filename and filename_len variables based on the request buffer content

extract_filename:

We expect only GET requests in the buffer, so the filename should start with the fifth character, after the GET / string

    mov rsi, buffer + 5
    mov rdi, filename
    xor rcx, rcx

We copy characters from the buffer until we see a '?' or ' ' character

.extract_filename_next_char:
    cld
    cmp byte [rsi], " "
    jz .extract_filename_check_index
    cmp byte [rsi], "?"
    jz .extract_filename_check_index
    movsb
    jmp .extract_filename_next_char

If the filename is empty (client requested /), we set filename to be index.html

.extract_filename_check_index:
    mov rcx, rdi
    sub rcx, filename
    cmp rcx, 0
    jnz .extract_filename_done
    mov rax, "index.ht"
    mov [filename    ], rax
    mov rax, "ml"
    mov [filename + 8], rax
    mov rcx, 10

.extract_filename_done:
    mov [filename_len], rcx
    ret

Check filename

Checks that the filename is safe to read(2) from the filesystem

check_filename:
    mov rsi, -1

First, check that filename characters match the whitelist

.check_filename_whitelist:
    inc rsi
    mov byte al, [filename + rsi]
    cmp rsi, [filename_len]
    jz .check_filename_whitelist_ok
    mov rdi, url_whitelist
    mov rcx,  url_whitelist_len
    repne scasb
    je .check_filename_whitelist
    jmp .check_filename_return_error

.check_filename_whitelist_ok:
    mov rcx, [filename_len]

First, check that the filename doesn't contain ".." in it

.check_filename_double_dot:
    dec rcx
    cmp word [filename + rcx], ".."
    je .check_filename_return_error
    cmp rcx, 0
    je .check_filename_return_success
    jmp .check_filename_double_dot

.check_filename_return_success:
    xor rax, rax
    ret

.check_filename_return_error:
    mov rax, 1
    ret

Known issues

  • We use temporary registers to store some global state between procedure calls This makes recursion impossible and can lead to hidden bugs The natural way to solve this is to use the stack for keeping state between procedure calls, but we use the stack to build the response string
  • While simple, forking on each request is not optimal for performance Modern webservers use epoll(2) to process multiple requests in a single process

License

Copyright (c) 2015 Vladimir Terekhov

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE