Servasm |
|
|
Minimal x86_64 Linux-only file webserver written in assembly language. This page is a literate program containing all of the server's source code. Project repository and build instructions. |
|
Overview
Servasm is a forking server: each request is processed in a separate process.
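This is how it was done in the Mesozoic Era (except we use
The main process sets up the listening socket with a few system calls: socket(2), setsockopt(2), bind(2) and listen(2).
Then the main process loops on accept(2), forking a child process for each accepted connection.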
In case of an error, we exit the process by passing the system call result as the exit code. |
|
Reference material
|
|
Constants
The data section keeps all static constants that we might need during the server's lifetime. |
section .data |
|
We are going to use IPv4 and TCP as our transport. |
pf_inet: equ 2
sock_stream: equ 1 |
|
Our server binds to 0.0.0.0:8080 |
sockaddr: db 0x02, 0x00 ;; AF_INET
db 0x1f, 0x90 ;; PORT 8080
db 0x00, 0x00, 0x00, 0x00 ;; IP 0.0.0.0
addr_len: equ 128 |
|
Requests timeout in 15 seconds. |
request_timeout: equ 15 |
|
Backlog is the number of incoming connections that the kernel will queue for us until we accept(2) them |
backlog: equ 128 |
|
And we are going to use the TCP_CORK socket option |
sol_tcp: equ 6
tcp_cork: equ 3
on_state: db 0x01 |
|
We store strings as a pair of their content and their length following right after the message.
|
startup_error_msg: db "ERROR: Cannot start server", 10
startup_error_msg_len: equ $ - startup_error_msg |
|
For incoming requests, we restrict the path to be alphanumeric plus "." and "/" |
url_whitelist: db "abcdefghijklmnopqrstuvwxyz"
db "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./"
url_whitelist_len: equ $ - url_whitelist |
Lookup tables. |
|
|
Syscall table for x86-64. For reference, look here. |
sys_write: equ 1
sys_open: equ 2
sys_close: equ 3
sys_fstat: equ 5
sys_alarm: equ 37
sys_sendfile: equ 40
sys_socket: equ 41
sys_accept: equ 43
sys_recv: equ 45
sys_bind: equ 49
sys_listen: equ 50
sys_setsockopt: equ 54
sys_fork: equ 57
sys_exit: equ 60
sys_waitid: equ 247 |
|
We build response headers on the stack. That means that we need to push strings from the last one, for example to build a header:
We will push |
|
|
We use the stack to build headers, so all headers are pushed from the last character to the first one. |
|
|
|
db 0x00, 13, 10
crnl: |
Response codes |
|
db 0x00, "HTTP/1.0 200 OK", 13, 10
result_ok: |
|
db 0x00, "HTTP/1.0 403 Forbidden", 13, 10
result_forbidden: |
|
db 0x00, "HTTP/1.0 404 File not found", 13, 10
result_not_found: |
|
db 0x00, "HTTP/1.0 500 OOPSIE", 13, 10
result_server_error: |
|
db 0x00, "HTTP/1.0 501 Not Implemented", 13, 10
result_unsupported_method: |
|
Mime Types |
|
|
We use a small number of predefined mime-types baked into the source code, and we support only UTF-8 encoding |
db 0x00, "text/plain; charset=UTF-8", 13, 10
txt:
db 0x00, "text/html; charset=UTF-8", 13, 10
html:
db 0x00, "text/css; charset=UTF-8", 13, 10
css:
db 0x00, "text/javascript; charset=UTF-8", 13, 10
js:
db 0x00, "image/png", 13, 10
png:
db 0x00, "image/jpeg", 13, 10
jpg:
db 0x00, "application/octet-stream", 13, 10
other: |
|
Mime type hash table
Each entry has two quad words. The first quad word is the product of the ASCII codes of the extension's characters. For example, for txt: 116 × 120 × 116 = 1614720 = 0x18a380.
This means that some unknown files can be served with the wrong mime-type in the case of a hash collision. And this is okay. Repeat after me: this is okay. The second quad word is a pointer to the end of the matched mime type string.
In case the file type is unknown, we serve it with application/octet-stream |
mime_table: dq 0x18a380, txt
dq 0x8770380, html
dq 0x13fa5b, css
dq 0x2f9e, js
dq 0x135ce0, png
dq 0x12a8a0, jpg
dq 0x0, other |
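For readers who prefer C, the hashing and lookup described above can be sketched as follows. This is a simplified illustration, not a translation of the assembly: the names `mime_entry`, `ext_hash` and `lookup_mime` are hypothetical, the table stores the label names instead of pointers to the ends of the strings, and a filename without a dot simply falls through to `other`.

```c
#include <stdint.h>
#include <string.h>

/* One table entry: the extension hash and the label it maps to. */
struct mime_entry {
    uint64_t hash;
    const char *label;
};

/* Same hashes as mime_table; the entry with hash 0 is the fallback. */
const struct mime_entry mime_entries[] = {
    {0x18a380,  "txt"},
    {0x8770380, "html"},
    {0x13fa5b,  "css"},
    {0x2f9e,    "js"},
    {0x135ce0,  "png"},
    {0x12a8a0,  "jpg"},
    {0x0,       "other"},
};

/* Product of the character codes that follow the last dot. */
uint64_t ext_hash(const char *name) {
    const char *dot = strrchr(name, '.');
    uint64_t hash = 1;
    if (dot == NULL)
        return hash;  /* no extension: 1 matches no table entry */
    for (const char *p = dot + 1; *p != '\0'; p++)
        hash *= (unsigned char)*p;
    return hash;
}

/* Linear scan that stops at a matching hash or at the fallback entry. */
const char *lookup_mime(const char *name) {
    uint64_t hash = ext_hash(name);
    const struct mime_entry *entry = mime_entries;
    while (entry->hash != 0 && entry->hash != hash)
        entry++;
    return entry->label;
}
```

With this sketch, `lookup_mime("app.js")` yields `js`, and an extension that is not in the table, such as `zip`, yields `other`.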
Headers |
|
db 0x00, "Content-type: "
content_type_header: |
|
db 0x00, "Content-Length: "
content_length_header: |
|
db 0x00, "Server: servasm", 13, 10
server_header: |
|
Variables |
|
|
|
section .bss |
|
We will store incoming requests in a buffer limited to 1025 bytes. |
buffer: resb 1025
buffer_len: equ 1024
buffer_read: resb 8 |
|
Buffer for the result of fstat(2) |
statbuf: resb 144 |
|
Main server socket |
server_fd: resb 8 |
|
Incoming request socket |
client_fd: resb 8 |
|
File descriptor to be served |
file_fd: resb 8 |
|
Name of the requested file |
filename: resb 255
filename_len: resb 8 |
|
Size of a file |
file_size: resb 8 |
|
Mime type for a file |
mime_type: resb 8 |
Source code |
section .text |
|
Define entry point |
global _start
_start: |
|
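Our webserver is little more than glue code to a few syscalls. Actually, it's amazing how much can be done only with standard system calls. Syscalls are made differently on different architectures and operating systems. We restrict ourselves to the x86_64 Linux convention: the syscall number goes in rax, the arguments in rdi, rsi, rdx, r10, r8 and r9, and the result comes back in rax, where a negative value signals an error |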
|
Main socket setup |
|
|
Call socket(2) to create the listening socket |
mov rax, sys_socket
mov rdi, pf_inet
mov rsi, sock_stream
xor rdx, rdx
syscall |
|
If the socket was not created and syscall returned an error, jump to exit_error |
cmp rax, 0
js .exit_error |
|
If everything is fine, we store the result into server_fd |
mov [server_fd], rax |
|
Call setsockopt(2) to set the TCP_CORK option on the socket |
mov rax, sys_setsockopt
mov rdi, [server_fd]
mov rsi, sol_tcp
mov rdx, tcp_cork
mov r10, on_state
mov r8, 8
syscall
cmp rax, 0
js .exit_error |
|
|
mov rax, sys_bind
mov rdi, [server_fd]
mov rsi, sockaddr
mov rdx, addr_len
syscall
cmp rax, 0
js .exit_error |
|
And call listen(2) |
mov rax, sys_listen
mov rdi, [server_fd]
mov rsi, backlog
syscall
cmp rax, 0
js .exit_error |
|
Now the socket is initialized and ready to serve clients |
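For comparison, the same setup sequence can be sketched in C with the libc wrappers around these syscalls. The function name `make_listener` and its parameters are hypothetical; the sketch assumes Linux, since TCP_CORK is Linux-specific, and passing `htons(8080)` as the port produces the same 0x1f, 0x90 byte pair that sockaddr spells out by hand.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket bound to 0.0.0.0:port and puts it into listening mode.
   Returns the socket fd, or -1 if any step fails. */
int make_listener(uint16_t port, int backlog) {
    int fd = socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);              /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* 0.0.0.0 */

    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as `make_listener(8080, 128)` would mirror the constants used in this program.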
|
Main loop |
.accept_socket: |
|
|
mov rax, sys_accept
mov rdi, [server_fd]
xor rsi, rsi
xor rdx, rdx
syscall
cmp rax, 0
js .exit_error |
|
accept(2) returns the fd for the incoming connection |
mov [client_fd], rax |
|
We process each request in a child process, and when the children exit, they become zombie processes.
The kernel keeps their exit code and some other state until the parent process collects it; this is called reaping |
.next_process:
mov rax, sys_waitid
mov rdi, 0
mov rsi, 0
mov rdx, 0
mov r10, 4
mov r8, 0
syscall |
|
If the returned value is >0, it means that we reaped a process, and maybe there is more. So we try again. (Errors are ignored here) |
cmp rax, 0
jg .next_process |
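The reaping idea can be illustrated in C as well. Note that this sketch swaps in a different call: it uses waitpid(2) with WNOHANG rather than the waitid syscall used above, because waitpid reports the pid of each reaped child directly and never blocks with that flag. The function name `reap_children` is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reaps every child that has already exited, without blocking.
   Returns the number of children reaped. */
int reap_children(void) {
    int reaped = 0;
    /* waitpid returns the pid of a reaped child, 0 if children exist but
       none has exited yet, and -1 if there are no children at all. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

Calling such a function once per accepted connection keeps the number of zombies bounded by the number of requests that finished since the previous call.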
|
We process incoming requests one by one, so we need to return to |
mov rax, sys_fork
syscall |
|
|
cmp rax, 0
js .exit_error |
|
If fork(2) returned 0, we are in the child process, so we jump to processing the client |
jz .process_socket |
|
Otherwise, we are in the main process, so we close(2) the client fd and jump to accepting a new client |
mov rax, sys_close
mov rdi, [client_fd]
syscall
cmp rax, 0
js .exit_error
jmp .accept_socket |
Processing client |
.process_socket: |
|
In the child process, we close(2) the server socket, which only the main process needs |
mov rax, sys_close
mov rdi, [server_fd]
syscall
cmp rax, 0
js .exit_error |
|
Set alarm(2) to drop slow clients
The kernel will send a SIGALRM signal to the process after request_timeout seconds, which terminates it |
mov rax, sys_alarm
mov rdi, request_timeout
syscall
cmp rax, 0
js .exit_error |
Parse request |
|
|
Call recv(2) (syscall 45, recvfrom, with a null source address) to read the request into buffer |
mov rax, sys_recv
mov rdi, [client_fd]
mov rsi, buffer
mov rdx, buffer_len
xor r10, r10
xor r8, r8
xor r9, r9
syscall
cmp rax, 0
js .exit_error |
|
Our filename extracting algorithm requires that the buffer ends with a space, so we append one right after the received bytes |
mov byte [buffer + rax], " " |
|
Keep bytes read count |
mov [buffer_read], rax |
|
For now, we accept only GET requests, so we return a 501 error to clients that use any other request method |
mov rax, result_unsupported_method
cmp byte [buffer], "G"
jnz .return_error
cmp byte [buffer + 1], "E"
jnz .return_error
cmp byte [buffer + 2], "T"
jnz .return_error
cmp byte [buffer + 3], " "
jnz .return_error
cmp byte [buffer + 4], "/"
jnz .return_error |
|
Call extract_filename to get the requested path from the buffer |
call extract_filename |
|
|
call check_filename
cmp rax, 0
mov rax, result_forbidden
jne .return_error |
|
Call get_mime to choose the mime type by file extension |
call get_mime |
|
Try to open(2) the requested file |
mov rax, sys_open
mov rdi, filename
xor rsi, rsi ;; no flags
xor rdx, rdx ;; readonly
syscall
mov [file_fd], rax |
|
Return 404 if opening the file fails |
cmp rax, 0
mov rax, result_not_found
js .return_error |
|
Call fstat(2) to get the file size |
mov rax, sys_fstat
mov rdi, [file_fd]
mov rsi, statbuf
syscall
cmp rax, 0
mov rax, result_server_error
js .return_error
mov rax, [statbuf + 48]
mov [file_size], rax |
Write response
After the request has been parsed and the file found, we start writing the response |
.write_response: |
|
Read the rest of the request from the socket |
call read_full_request |
|
Write headers with write_headers |
call write_headers
cmp rax, 0
js .exit_error |
|
We use sendfile(2) to copy the file contents to the client socket |
mov rax, sys_sendfile
mov rdi, [client_fd]
mov rsi, [file_fd]
xor rdx, rdx
mov r10, [file_size]
syscall ;; ignore errors |
|
|
mov rax, sys_close
mov rdi, [client_fd]
syscall ;; ignore errors |
|
And close(2) the file |
mov rax, sys_close
mov rdi, [file_fd]
syscall ;; ignore errors |
|
And finally exit with code 0 |
xor rax, rax
jmp .exit |
Error handling |
.return_error: |
|
Write error response headers and body to client socket |
call write_error_response |
|
And close(2) the client socket |
mov rax, sys_close
mov rdi, [client_fd]
syscall
.exit_error: |
|
|
mov rax, sys_write
mov rdi, 2 ; stderr
mov rsi, startup_error_msg
mov rdx, startup_error_msg_len
syscall |
|
Set error code to 1 |
mov rax, 1
.exit: |
|
Call the exit syscall with the exit code taken from rax |
mov rdi, rax
mov rax, sys_exit
syscall |
Procedures |
|
Extract Mime Type |
|
|
We use |
get_mime:
mov rax, 1
mov rcx, [filename_len]
dec rcx |
|
Calculate mime_hash using the algorithm in the Mime Types section |
.get_mime_hash:
xor rdx, rdx
mov dl, [filename + rcx]
cmp dl, "."
je .get_mime_hash_done
mul rdx
dec rcx
cmp rcx, 0
je .get_mime_hash_done
jmp .get_mime_hash
.get_mime_hash_done:
mov rcx, 0 |
|
Find the pointer to the Mime Type using mime_table |
.get_mime_get_pointer:
mov r11, [mime_table + rcx]
cmp r11, rax
je .get_mime_pointer_done
cmp r11, 0
je .get_mime_pointer_done
add rcx, 16
jmp .get_mime_get_pointer
.get_mime_pointer_done:
mov rdi, [mime_table + rcx + 8] |
|
And store it to mime_type |
mov [mime_type], rdi
ret |
Write headers
Write a 200 OK response and some headers to the client socket |
write_headers: |
|
We will be using the stack as a buffer for response headers instead of making multiple write calls on the socket |
|
|
Save the stack top to a temporary register |
mov rbp, rsp |
|
|
mov rcx, -1 |
|
First, we push the end of headers (two CRLF sequences: one ends the Content-Length line, the other ends the header block) |
mov rsi, crnl
call push_string
mov rsi, crnl
call push_string |
|
Push the Content-Length header |
mov rax, [file_size]
call push_int
mov rsi, content_length_header
call push_string |
|
Push the Content-type header |
mov rsi, [mime_type]
call push_string
mov rsi, content_type_header
call push_string |
|
Push the server name (Server: servasm) |
mov rsi, server_header
call push_string |
|
Push the status line (HTTP/1.0 200 OK) |
mov rsi, result_ok
call push_string |
|
Calculate the start address of the headers on the stack |
mov rbx, rcx
add rbx, rsp
inc rbx |
|
Restore stack state |
mov rsp, rbp |
|
Calculate the length of headers |
sub rbp, rbx |
|
write(2) the headers |
mov rax, sys_write
mov rdi, [client_fd]
mov rsi, rbx
mov rdx, rbp
syscall
ret |
Write error response
Write response headers and body to the client fd. Expects rax to point to the end of the error response code string |
write_error_response:
mov r11, rax |
|
Read request from socket |
call read_full_request |
|
Look at the write_headers procedure for an explanation of how the stack is used as a buffer |
|
|
Write the end of the response |
mov rbp, rsp
mov rcx, -1
mov rsi, crnl
call push_string |
|
Write the response body |
mov rsi, r11
call push_string |
|
Write the body | headers separator |
mov rsi, crnl
call push_string |
|
Write the response status line |
mov rsi, r11
call push_string |
|
Calculate the start address of the response on the stack |
mov rbx, rcx
add rbx, rsp
inc rbx |
|
Restore stack state |
mov rsp, rbp |
|
|
mov rax, sys_write
mov rdi, [client_fd]
mov rsi, rbx
sub rbp, rbx
dec rbp
mov rdx, rbp
syscall ;; ignore errors
ret |
push string
If push_string is called multiple times it will form a continuous string on the stack
For example, two calls with rcx -1 |
push_string: |
|
Remove the return address from the stack
and store it in the rdx register |
pop rdx |
|
We use |
mov al, 0x00
.push_string_next: |
|
If we have no free bytes on the stack,
add 8 bytes and change rcx to point to the last of them |
cmp rcx, -1
jne .push_string_write
push 0
mov rcx, 7
.push_string_write: |
|
Move the string to the stack starting from the string end until we reach the 0x00 marker |
dec rsi
mov rbx, [rsi]
cmp al, bl
je .push_string_ret
mov byte [rsp + rcx], bl
dec rcx
jmp .push_string_next
.push_string_ret: |
|
Restore the stack |
push rdx
ret |
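The back-to-front technique used by push_string can be shown in C with an ordinary buffer instead of the machine stack. This is only a sketch of the idea under that assumption; the names `back_buffer`, `back_init` and `back_push` are hypothetical, and the fixed size of 256 bytes is arbitrary.

```c
#include <stddef.h>
#include <string.h>

/* A fixed buffer that is filled from its end towards its start. */
struct back_buffer {
    char data[256];
    size_t start; /* index of the first used byte; sizeof data when empty */
};

void back_init(struct back_buffer *b) {
    b->start = sizeof b->data;
}

/* Places s directly in front of the bytes that are already stored.
   Returns 0 on success, or -1 if s does not fit. */
int back_push(struct back_buffer *b, const char *s) {
    size_t len = strlen(s);
    if (len > b->start)
        return -1;
    b->start -= len;
    memcpy(b->data + b->start, s, len);
    return 0;
}
```

Pushing the blank line first, then the `Server` header, then the status line leaves one contiguous response that starts at `data + start`, so it can be sent with a single write.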
Push int
Converts rax to a string and calls push_string on it |
push_int: |
|
Remove the return address from the stack
and store it in the rdi register |
pop rdi |
|
We convert the integer value to a sequence of characters with base 10 and push each character with the push_string procedure |
mov r8, rax
.push_int_next:
mov rax, r8
xor rdx, rdx
mov r11, 10
div r11
mov r8, rax
add dl, 48
mov rsi, rsp
sub rsi, 8
mov byte [rsi - 1], dl
mov byte [rsi - 2], 0x00
call push_string
cmp r8, 0
je .push_int_ret
jmp .push_int_next
.push_int_ret: |
|
Restore the stack |
push rdi
ret |
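The conversion works from the least significant digit, which is why it fits the back-to-front buffer so well: each division by 10 produces the digit that belongs immediately in front of the previous one. A minimal C sketch of the same idea, with the hypothetical name `prepend_decimal`:

```c
#include <stdint.h>

/* Writes the decimal digits of value so that the last digit lands just
   before end. Returns a pointer to the first digit. The caller must
   provide at least 20 writable bytes before end. */
char *prepend_decimal(char *end, uint64_t value) {
    char *p = end;
    do {
        *--p = (char)('0' + value % 10); /* remainder is the next digit */
        value /= 10;
    } while (value != 0);
    return p;
}
```

The do-while form guarantees that a value of 0 still produces the single digit `0`.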
Read rest of request
The spec requires us to read the full request with headers before we can send a response |
read_full_request: |
|
We kept the amount read from the socket in the buffer_read variable |
mov rax, [buffer_read] |
|
We check that the last bytes received from the client were CRLF CRLF (13, 10, 13, 10) |
.check_buffer:
cmp byte [buffer + rax - 1], 10
jne .read_more_from_client_socket
cmp byte [buffer + rax - 2], 13
jne .read_more_from_client_socket
cmp byte [buffer + rax - 3], 10
jne .read_more_from_client_socket
cmp byte [buffer + rax - 4], 13
jne .read_more_from_client_socket
ret |
|
If not, we recv(2) more data from the client socket and check again |
.read_more_from_client_socket:
mov rax, sys_recv
mov rdi, [client_fd]
mov rsi, buffer
mov rdx, buffer_len
xor r10, r10
xor r8, r8
xor r9, r9
syscall
jmp .check_buffer |
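The end-of-request test can be expressed compactly in C. This sketch uses the hypothetical name `ends_with_blank_line` and adds a length guard, so it never looks at bytes before the start of the buffer when fewer than four bytes have arrived.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when the last four bytes of the buffer are CR LF CR LF. */
bool ends_with_blank_line(const char *buf, size_t len) {
    return len >= 4 && memcmp(buf + len - 4, "\r\n\r\n", 4) == 0;
}
```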
Extract filename
Fills the filename and filename_len variables based on the request buffer content |
extract_filename: |
|
We expect only GET requests in the buffer, so the filename should start at offset 5, right after the "GET /" prefix |
mov rsi, buffer + 5
mov rdi, filename
xor rcx, rcx |
|
We copy characters from the buffer until we see a space or a "?" |
.extract_filename_next_char:
cld
cmp byte [rsi], " "
jz .extract_filename_check_index
cmp byte [rsi], "?"
jz .extract_filename_check_index
movsb
jmp .extract_filename_next_char |
|
If the filename is empty (client requested "/"), we serve index.html |
.extract_filename_check_index:
mov rcx, rdi
sub rcx, filename
cmp rcx, 0
jnz .extract_filename_done
mov rax, "index.ht"
mov [filename ], rax
mov rax, "ml"
mov [filename + 8], rax
mov rcx, 10
.extract_filename_done:
mov [filename_len], rcx
ret |
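As a cross-check of the parsing rules, here is a C sketch that combines the "GET /" prefix check with the path extraction. The function name `extract_path` and its bounds check are assumptions of this sketch; the assembly version keeps the method check in the main flow and does not limit the copy length.

```c
#include <stddef.h>
#include <string.h>

/* Copies the path of a request such as "GET /a.txt HTTP/1.0" into out.
   Returns the path length, or -1 if the request does not start with
   "GET /" or the path does not fit into out. */
int extract_path(const char *request, char *out, size_t out_size) {
    static const char prefix[] = "GET /";
    size_t prefix_len = sizeof prefix - 1;
    if (strncmp(request, prefix, prefix_len) != 0)
        return -1;

    const char *start = request + prefix_len;
    size_t len = strcspn(start, " ?"); /* stop at a space, "?" or the end */
    if (len == 0) {                    /* the client asked for "/" */
        start = "index.html";
        len = strlen(start);
    }
    if (len + 1 > out_size)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return (int)len;
}
```

The query string is dropped, so a request for `/style.css?v=2` resolves to the file `style.css`.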
Check filename
Checks that the filename is safe to open |
check_filename:
mov rsi, -1 |
|
First, check that every character of the filename is in url_whitelist |
.check_filename_whitelist:
inc rsi
mov byte al, [filename + rsi]
cmp rsi, [filename_len]
jz .check_filename_whitelist_ok
mov rdi, url_whitelist
mov rcx, url_whitelist_len
repne scasb
je .check_filename_whitelist
jmp .check_filename_return_error
.check_filename_whitelist_ok:
mov rcx, [filename_len] |
|
Second, check that the filename doesn't contain ".." |
.check_filename_double_dot:
dec rcx
cmp word [filename + rcx], ".."
je .check_filename_return_error
cmp rcx, 0
je .check_filename_return_success
jmp .check_filename_double_dot
.check_filename_return_success:
xor rax, rax
ret
.check_filename_return_error:
mov rax, 1
ret |
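The two checks translate almost directly to the C string library. In this sketch the function name `is_safe_filename` is hypothetical, and the allowed set is copied from url_whitelist; it returns true for an acceptable name, whereas the assembly procedure returns 0 in rax for success.

```c
#include <stdbool.h>
#include <string.h>

/* True when name uses only whitelisted characters and has no ".." in it. */
bool is_safe_filename(const char *name) {
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
    /* strspn gives the length of the leading run of allowed characters,
       so it equals the full length only if every character is allowed. */
    if (strspn(name, allowed) != strlen(name))
        return false;
    if (strstr(name, "..") != NULL)
        return false;
    return true;
}
```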
Known issues
|
|
License
Copyright (c) 2015 Vladimir Terekhov
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
|