IntroductionI wanted somehow to start my adventure with exploiting. First I went to corelan.be where you can find many fantastic tutorials. However they were too difficult to me to be fully understood. And then I found securitytube with its "Securitytube Linux Assembly Expert" (SLAE). Although after learning all the materials available I feel rather a beginner than an expert, the course in quite coherent way introduces assembler and shellcoding.
The best motivation is to put some money in a challenge, therefore I have bought SLAE certification, and to get certified I have to complete some coding tasks. Each task requires me to put its solution on Github and to write a blog post describing my solution. The first task is to write a bind shell.
PseudocodeThe shellcode is just a simple program that creates a socket, puts it in listening state and then runs a shell that a connecting client can interact with. The 'pseudocode' is a simple overview of the program containing the logical blocks. I did not write a C program in the begining, however after all I regret that decision and further tasks will be solved with first writing a C code
create socket;Example C code can be found e.g. on thegeekstuff.com, although my code is a bit less complicated.
prepare structure for binding;
bind socket to a port;
put socket in listening state;
accept incoming connections on socket;
redirect std(in/out/err) to socket;
For this and the next syscalls you may encounter, if you want to read some details, try running manual, e.g. man socket, or if that will show some irrelevant results, try the next 'page', man 2 accept.
But how to call socket? Let's check in manual (man socket):
int socket(int domain, int type, int protocol);Domain is the protocol type used. In the same manual page you can see, that for the IPv4 it is AF_INET.
Type defines "communication semantics" what simply means that it is higher layer protocol. As we are interested in TCP, SOCK_STREAM would be our choice.
Protocol is generally not used because there is mostly theoretical possibility that there may be more than one protocol within domain. For normal use, 0 is the correct value.
The returned value is a socket file descriptor (in unix everything is a 'file').
Therefore our call is as follows:
sock_file_descriptor = socket(AF_INET, SOCK_STREAM, 0)The constants used may be found in /usr/include/i386-linux-gnu/bits/socket.h and their respective values are: 2 and 1. So our call to socket() should be: sockfd = socket(2, 1, 0);
Going back to the code, there are three xors. These instructions puts 0 in the particular registry without having the "0" value in the resulting code, what is very important, as the resulting code cannot have any NULL bytes.
Later, the 0 value is pushed onto the stack. Just after that, eax value is set to 102 (the syscall number of socket()) what will be needed further. After setting eax, ebx and ecx are set to 1 and 2 and pushed onto stack. Stack then contains
Stack pointer is saved in the ecx registry and the syscall is being executed through the int call. The call would then run syscall with number in eax with arguments in ebx and ecx. The ebx is interesting because it contains value "1" which is re-used both in pushing to stack and in calling socket - SYS_SOCKET in file /usr/include/linux/net.h is defined with value 1. Ecx contains the address to the stack and poping off the stack all the arguments will put them in correct order: 2, 1 and 0.
The result of calling int 0x80 is the socket file descriptor and is is being put in eax registry. As that registry will be used heavily, that value is saved in the esi registry.
Bind() call requires a special structure of data as argument, read man 2 bind. The value of INADDR_ANY taken from /usr/include/netinet/in.h is 0 - we want to accept any connection from client. We are going to push it onto stack and then the port number. Therefore we push first 0. Then the 16-bit port number is being put in the cx registry. Since the structure requires the port to be the result of htons(port), in the next instruction the xchg is being called to swap bytes. After all, ecx is being pushed onto stack.
the protocol (AF_INET) family number is being pushed so the stack looks as follows:
Having the data prepared, it is time to call the bind() syscall:
Again, all the registries need to be initiated with proper values:
eax - 102, the number of socket() syscall
ebx - 2, the number of bind() call in /usr/include/linux/net.h
edx - 16, the size of the structure generated before
Then comes the 'magic': edx, ecx and esi are pushed onto stack. The stack pointer is copied to ecx because ecx should contain pointer to arguments of bind() call. And these arguments are the file descriptor, the connection structure and the size of connection structure. The last thing is obvious - having initiated everything we are calling the syscall for bind().
Putting socket in listening state
This part is fortunately very short :-) Again we are initiating eax with the socket() syscall number but now we are calling listen(), represented by the number 4 in `/usr/include/linux/net.h`. Listen() call requires two parameters: the socket file descriptor and something called "backlog", which can be set to 0 and forgotten. Those two values are pushed onto the stack and set in ecx as the pointer to arguments. Finally the syscall is being executed
We are almost done with socket manipulating. The last call is for accept(), which has number 5 in /usr/include/linux/net.h. Accept() call requires three parameters described in man 2 accept`:
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
Since second and third parameters are not used, they may be 0 and the call simplifies itself.
Redirect all the basic streams to socketHaving the socket handler open and ready, you need to remember to redirect all the I/O streams to it to allow the shell to be fully interactive. Dup2 is the syscall that does the stream duplication.
First we need to save the result of accept() call, we do that in ebx registry. Then we set ecx to 3 which is the loop counter as we want to run the loop three times.
In loop we first set the dup2 syscall number in eax and then we call the syscall. After all we decrement the loopcounter and test if we hadn't reached -1. If so, we do not the loop again and we go to executing the shell.
First call of dup2() will have parameters ebx,ecx and in c it would look like dup2(clientfd, 3), so now file descriptor number 3 is clientfd
Second call of dup2() would look like dup2(clientfd, 2), so clientfd now replaced stderr
Second call of dup2() would look like dup2(clientfd, 1), so clientfd now replaced stdout
Second call of dup2() would look like dup2(clientfd, 0), so clientfd now replaced stdin
So - clientfd is saved in descriptor 3 and everything will be redirected to the socket represented by clientfd.
The first thing in this block of code, just after initialising eax with a proper syscall number, is to put some strange fixed values onto the stack. In the code these values are commented with strings, which read backwards would result in NULL, 'bash', 'bin/' and '////. Putting all the vaules together gives us written backwards path to the executable: '////bin/bash'
Why backwards? Because we need them on the stack and this is the simplest way.
You can put any executable name here and getting that 'strange' numbers (representing ASCII values of the string characters) is possible with the following script. To ease such a task, I have created a simple python tool reverse.py.
What is important, is that the length of the executable should be a multitude of 4. Therefore, if you have a string of e.g. 9 bytes length, you must somehow extend it to 12 or 16 etc. bytes.
Taking "/bin/bash" as example, you may add additional '/' characters resulting in "////bin/bash" which is 12 bytes long string. To have the proper hex values just run `python reverse.py "////bin/bash"`
Wow - that was a bit long explanation but you now should understand where from those 'magic values' came out.
After pushing the string onto a stack, we store the pointer to the string in ebx.
Then we must store in registries the final arguments of execve call:
edx - envp - NULL, we do not want to pass anything to bash
ecx - argv - pointer to the structure containing address of the executable and NULL.
ebx - executable - pointer to the executable's name
And finally the syscall :-)
Basic usageOK, so cool assembly, but what next?
I have created some scripts to ease the process of compiling and testing the shellcode. They are described in README.md on Github, however the one 'fire and forget' is go.sh. Running it with assembly file as argument will compile and link assembler code plus create and compile a simple c program to test the shellcode (some kind of integration test). In case of this bind_shell.nasm, the syntax would be: ./go.sh bind_shell
I hope that if you managed to get here, you understand the shellcode logic and now you can create your own bind shell for Linux.
Mandatory footer to get certified:
Student ID: SLAE-769