SYSCALLS

on linux/i386, the machine code puts the syscall number in register AX and
the arguments in BX, CX, DX, SI, DI and BP, and then raises the software
interrupt 0x80 (int 0x80).

as the plan9 kernel doesn't handle interrupt vector 0x80, it sends a note
to the process that trapped and kills the process if the note is not
handled. in a note handler, the machine state of the process at the time
of the trap/interrupt is available through the ureg argument.

in linuxemu, we install a note handler that checks whether the trap was a
linux syscall and, if so, calls the matching handler function from our
systab.

after our syscall handler returns, we move the program counter in the
machine state structure past the int 0x80 instruction and continue
execution by accepting the note as handled with a call to noted(NCONT).
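
the trap handling described above could look roughly like this sketch. it is portable C so it compiles anywhere. Ureg here is a hypothetical stand-in for the plan9 structure that carries only the registers the sketch needs. process memory is modeled as a byte array indexed by pc, on the assumption that pc still points at the int 0x80 instruction when the note arrives. handlesyscall() is the part of the note handler that would run before noted(NCONT). 38 is ENOSYS as numbered on linux/i386, and 20 is the real i386 number of getpid. all other names are assumptions, not linuxemu's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for the plan9 i386 Ureg: only what this sketch needs */
typedef struct Ureg Ureg;
struct Ureg {
	uint32_t ax, bx, cx, dx, si, di, bp;
	uint32_t pc;
};

typedef long (*Syscall)(Ureg *);

enum { Enosys = 38 };	/* ENOSYS as numbered on linux/i386 */

/* example handler: getpid is syscall 20 on linux/i386 */
static long
sysgetpid(Ureg *u)
{
	(void)u;
	return 42;	/* hypothetical pid */
}

/* 1 if the instruction at pc is "int 0x80", encoded as the bytes CD 80 */
static int
islinuxsyscall(const uint8_t *mem, uint32_t pc)
{
	return mem[pc] == 0xCD && mem[pc+1] == 0x80;
}

/*
 * core of a note handler: returns 1 if the trap was a linux syscall
 * (the real handler would then call noted(NCONT)), 0 if the note
 * should be passed on.
 */
static int
handlesyscall(Ureg *u, const uint8_t *mem, Syscall *systab, size_t nsys)
{
	uint32_t nr;

	assert(u != NULL && mem != NULL);
	if(!islinuxsyscall(mem, u->pc))
		return 0;
	nr = u->ax;	/* syscall number; arguments are in bx, cx, dx, si, di, bp */
	if(nr >= nsys || systab[nr] == NULL)
		u->ax = (uint32_t)-Enosys;
	else
		u->ax = (uint32_t)systab[nr](u);	/* return value goes back in ax */
	u->pc += 2;	/* resume after the two-byte int 0x80 */
	return 1;
}
```

storing the result in AX follows the linux convention, where a negative value in the range of errno numbers signals an error.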

to do the automatic conversion to a plan9 function call, the number of
arguments and the function name of each handler must be known. this
information is provided by the linuxcalltab input file, which is fed
through linuxcalltab.awk to build the necessary tables.
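
the format of linuxcalltab and the exact tables linuxcalltab.awk emits are not described here. the sketch below only assumes that the generated table records the handler name, its argument count and a pointer to it for each linux syscall number. once the argument count is known, a dispatcher can turn the register values into an ordinary C call. Linuxcall, linuxcall() and the two example handlers are hypothetical. 4 and 20 are the real linux/i386 numbers of write and getpid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* generic function pointer type; cast back to the right arity before calling */
typedef void (*Anyfn)(void);
typedef long (*Fn0)(void);
typedef long (*Fn1)(uint32_t);
typedef long (*Fn2)(uint32_t, uint32_t);
typedef long (*Fn3)(uint32_t, uint32_t, uint32_t);
typedef long (*Fn4)(uint32_t, uint32_t, uint32_t, uint32_t);
typedef long (*Fn5)(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);
typedef long (*Fn6)(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);

typedef struct Linuxcall Linuxcall;
struct Linuxcall {
	const char *name;
	int nargs;
	Anyfn fn;
};

/* hypothetical handlers */
static long
sysgetpid(void)
{
	return 42;
}

static long
syswrite(uint32_t fd, uint32_t buf, uint32_t n)
{
	(void)fd; (void)buf;
	return (long)n;	/* pretend everything was written */
}

/* what a generated table might contain, indexed by linux/i386 syscall number */
static const Linuxcall linuxcalltab[] = {
	[4]  = { "write", 3, (Anyfn)syswrite },
	[20] = { "getpid", 0, (Anyfn)sysgetpid },
};

/* a[] holds the syscall arguments in register order: bx, cx, dx, si, di, bp */
static long
linuxcall(uint32_t nr, const uint32_t a[6])
{
	const Linuxcall *c;

	if(nr >= sizeof linuxcalltab / sizeof linuxcalltab[0] || linuxcalltab[nr].fn == NULL)
		return -38;	/* -ENOSYS */
	c = &linuxcalltab[nr];
	assert(c->nargs >= 0 && c->nargs <= 6);
	switch(c->nargs){
	case 0: return ((Fn0)c->fn)();
	case 1: return ((Fn1)c->fn)(a[0]);
	case 2: return ((Fn2)c->fn)(a[0], a[1]);
	case 3: return ((Fn3)c->fn)(a[0], a[1], a[2]);
	case 4: return ((Fn4)c->fn)(a[0], a[1], a[2], a[3]);
	case 5: return ((Fn5)c->fn)(a[0], a[1], a[2], a[3], a[4]);
	default: return ((Fn6)c->fn)(a[0], a[1], a[2], a[3], a[4], a[5]);
	}
}
```

each handler is called through a pointer of its real type, so the handlers can be plain C functions with ordinary parameter lists.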

the linux specific syscall handling and argument conversion is done in
linuxcall.c only. the idea is to later add support for other syscall
personalities like bsd without having to change the handler code.


MEMORY

unlike shared libraries, which are position independent, binaries have to
be loaded at a fixed address. (elf supports position independent programs
that can be loaded anywhere, but that is usually not used on i386.)

the emulator doesn't need to load and relocate shared libraries itself;
this is done by the runtime linker (/lib/ld-linux.so). it just needs to
load the binary and the runtime linker at their preferred locations and
jump to the entry point of the runtime linker. the runtime linker then
parses the elf program headers and dynamic section of the binary and
calls mmap to load further shared libraries.
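
to place the binary, the loader mainly looks at its PT_LOAD program headers. each one names a virtual address and a size in memory, and their union, rounded to whole pages, is the range that must be mapped at that fixed address. a PT_INTERP header names the runtime linker to load as well. a real loader also copies filesz bytes from the file and zero-fills the rest up to memsz. the sketch below only computes the page-aligned range, and it uses its own minimal Phdr struct instead of <elf.h>. PT_LOAD = 1 and PT_INTERP = 3 are the values from the elf specification, and 4096 is the i386 page size. loadrange() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	PtLoad = 1,	/* PT_LOAD in the elf specification */
	PtInterp = 3,	/* PT_INTERP: path of the runtime linker */
	Pgsize = 4096,	/* i386 page size */
};

/* minimal subset of an Elf32_Phdr */
typedef struct Phdr Phdr;
struct Phdr {
	uint32_t type;
	uint32_t vaddr;
	uint32_t memsz;
};

/*
 * computes the page-aligned address range covered by all PT_LOAD
 * headers; returns the number of PT_LOAD headers found.
 */
static int
loadrange(const Phdr *ph, size_t n, uint32_t *lo, uint32_t *hi)
{
	uint32_t start, end;
	size_t i;
	int nload = 0;

	assert(lo != NULL && hi != NULL);
	*lo = UINT32_MAX;
	*hi = 0;
	for(i = 0; i < n; i++){
		if(ph[i].type != PtLoad)
			continue;
		start = ph[i].vaddr & ~(uint32_t)(Pgsize-1);
		end = (ph[i].vaddr + ph[i].memsz + Pgsize-1) & ~(uint32_t)(Pgsize-1);
		if(start < *lo)
			*lo = start;
		if(end > *hi)
			*hi = end;
		nload++;
	}
	return nload;
}
```

0x08048000 is the traditional link address of non position independent linux/i386 executables, so a typical binary yields a range starting there.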

the first thing we need is an implementation of mmap that allows us to
copy files into memory at fixed addresses. on plan9, segments are used
to do that.

it is not possible to create a segment for every memory mapping, because
plan9 limits the number of segments per process to a small number.
instead, we create a fixed number of segments and expand or shrink them
on demand. the linux stack area has a fixed size; this relies on the
fact that plan9 doesn't allocate physical memory for pages until they
are touched.

three segments are created for each linux process:

"private" is used for all MAP_PRIVATE mappings and is shared between
processes that run in the same address space. code, data and files are
mapped there.

"shared" is used for shared memory mappings.

"stack" is like "private", but lives just below the plan9 stack segment.
this is needed because glibc grows the stack downward by mmap()ing pages
below the current stack area. we cannot use the plan9 stack segment
because that segment is copied on rfork and is never shared between
processes.
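
the rule for which of the three segments receives a mapping can be summarized in a few lines. MAP_SHARED = 0x01 and MAP_PRIVATE = 0x02 are the linux values. the stack bounds STACKLO and STACKHI and the names pickseg() and Segkind are hypothetical, chosen only to illustrate the rule.

```c
#include <assert.h>
#include <stdint.h>

enum {
	MapShared = 0x01,	/* linux MAP_SHARED */
	MapPrivate = 0x02,	/* linux MAP_PRIVATE */
};

/* hypothetical bounds of the "stack" segment, just below the plan9 stack */
#define STACKLO 0xD0000000u
#define STACKHI 0xDF000000u

typedef enum { Segprivate, Segshared, Segstack } Segkind;

static Segkind
pickseg(uint32_t addr, int flags)
{
	assert((flags & (MapShared|MapPrivate)) != 0);	/* linux requires one of them */
	if(flags & MapShared)
		return Segshared;
	if(addr >= STACKLO && addr < STACKHI)
		return Segstack;	/* e.g. glibc growing the stack downward */
	return Segprivate;
}
```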

the data structures of the emulator itself ("kernel memory") need to be
shared by all processes, even if the linux process runs in its own
private address space. so on startup, the plan9 Bss and Data segments
are made shared: the contents of each original segment are copied into
a temporary file, the segment is segdetach()ed, a new shared segment is
segattach()ed at the same place, and the data is copied back in from
the file.
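
the save, reattach and restore sequence can be sketched as follows. Reattach is a hypothetical hook that stands in for segdetach() followed by segattach(). fakereattach() simulates it by zeroing the region, the way a fresh segment starts out empty. the sketch uses stdio's tmpfile() for portability. the real code cannot do that: between detach and attach it must not touch Data or Bss at all, and on plan9 that includes malloc'd memory, which lives in the bss segment too. so it would be limited to stack variables and plain system calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * hypothetical hook standing in for segdetach() of the original segment
 * followed by segattach() of a shared one at the same address.
 */
typedef int (*Reattach)(void *base, size_t len);

/* simulation for testing: a freshly attached segment starts out zeroed */
static int
fakereattach(void *base, size_t len)
{
	memset(base, 0, len);
	return 0;
}

/* save the contents of [base, base+len), reattach, and restore them */
static int
makeshared(void *base, size_t len, Reattach reattach)
{
	FILE *f;
	int r = -1;

	assert(base != NULL && reattach != NULL);
	if((f = tmpfile()) == NULL)
		return -1;
	if(fwrite(base, 1, len, f) != len)	/* 1. save the original contents */
		goto out;
	if(reattach(base, len) != 0)		/* 2. swap in the shared segment */
		goto out;
	rewind(f);
	if(fread(base, 1, len, f) != len)	/* 3. copy the data back in */
		goto out;
	r = 0;
out:
	fclose(f);
	return r;
}
```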

with this memory layout, it is possible for the linux process to damage
data structures in the emulator. but so far we seem to be lucky :)


USER PROCESSES (UPROCS)

linuxemu does not switch and schedule linux processes itself. every user
process runs as its own plan9 process. memory sharing semantics are
translated to rfork flags on fork/clone.

we have a global process table of Uproc structures to track the state
and resources of all user processes:

fs: filesystem mount table
fdtab: the file descriptor table
mem: memory mappings
signal: signal handlers and queue
trace: debug trace buffer

resources that can be shared are reference counted and get freed when
the last process referencing them exits.
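
reference counting one of these resources might look like this sketch. Fdtab stands in for any shareable Uproc field, and its layout and the function names are hypothetical. the sharing processes are separate plan9 processes running concurrently, so the count has to be updated atomically. C11 atomics stand in here for whatever locking linuxemu actually uses. this only works because the emulator's own memory is shared between those processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* hypothetical shared resource; Fdtab stands for any refcounted Uproc field */
typedef struct Fdtab Fdtab;
struct Fdtab {
	atomic_int ref;	/* number of Uprocs using this table */
	int nfd;	/* placeholder for the actual descriptor slots */
};

static Fdtab *
newfdtab(void)
{
	Fdtab *t;

	if((t = calloc(1, sizeof *t)) == NULL)
		return NULL;
	atomic_init(&t->ref, 1);
	return t;
}

/* clone with sharing: the child points at the same table */
static Fdtab *
getfdtab(Fdtab *t)
{
	atomic_fetch_add(&t->ref, 1);
	return t;
}

/* on exit: drop a reference, free on the last one; returns 1 if freed */
static int
putfdtab(Fdtab *t)
{
	int old;

	old = atomic_fetch_sub(&t->ref, 1);
	assert(old > 0);
	if(old > 1)
		return 0;
	free(t);
	return 1;
}
```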


KERNEL PROCESSES (KPROCS)

if linuxemu needs to defer work or do asynchronous i/o, it can spawn a
kernel process with kprocfork. kernel processes have no Uproc structure
associated with them and have the userspace memory segments detached,
so they cannot access userspace memory.

bufprocs and timers are implemented with kernel processes.
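
bufprocs are only named here. one plausible shape, assumed rather than taken from linuxemu, is a kproc that loops reading from a plan9 file and pushes the bytes into a buffer that the syscall side drains. select/poll can then check readiness without blocking. the sketch below shows only that buffer, single-threaded. Buf, bufput(), bufget() and bufready() are hypothetical. because the kproc and the uproc run concurrently, real code would add a lock and a way to wake up sleeping readers.

```c
#include <assert.h>
#include <stddef.h>

enum { Bufsize = 8 };	/* tiny, to make wraparound easy to see; power of two */

typedef struct Buf Buf;
struct Buf {
	unsigned char data[Bufsize];
	size_t rp;	/* total bytes consumed */
	size_t wp;	/* total bytes produced; wp - rp bytes are buffered */
	int eof;	/* set by the bufproc when the plan9 file hits end of file */
};

/* bufproc side: store up to n bytes, returns how many fit */
static size_t
bufput(Buf *b, const unsigned char *p, size_t n)
{
	size_t i, room;

	room = Bufsize - (b->wp - b->rp);
	if(n > room)
		n = room;
	for(i = 0; i < n; i++)
		b->data[(b->wp + i) % Bufsize] = p[i];
	b->wp += n;
	return n;
}

/* syscall side: would poll/select report the descriptor readable? */
static int
bufready(const Buf *b)
{
	return b->wp != b->rp || b->eof;
}

/* syscall side: take up to n buffered bytes without blocking */
static size_t
bufget(Buf *b, unsigned char *p, size_t n)
{
	size_t i, avail;

	avail = b->wp - b->rp;
	assert(avail <= Bufsize);
	if(n > avail)
		n = avail;
	for(i = 0; i < n; i++)
		p[i] = b->data[(b->rp + i) % Bufsize];
	b->rp += n;
	return n;
}
```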


DEVICES

earlier versions mapped linux files directly to plan9 files. this made
the implementation of ioctls, symlinks, remove on close, and select/poll
hard, and also caused problems with implementing the fork sharing
semantics.

the current linuxemu does it all by itself. there is a global device
table of Udev structures. devices can implement any i/o related syscall
by providing a function pointer in their Udev. when a device has to deal
with asynchronous i/o on real plan9 files, it uses bufprocs.
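
the function pointer scheme can be sketched like this. the Udev and Ufile layouts, the null device and the sys* wrappers are hypothetical and show only three of the i/o calls. a NULL pointer means the device does not support the call, and the generic side then fails the way linux does, for example with -ENOTTY for an unsupported ioctl. the host's <errno.h> constants stand in for linux's numbers, and 0x5401 is TCGETS on linux.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct Ufile Ufile;
typedef struct Udev Udev;

/* hypothetical device switch: NULL means "not supported by this device" */
struct Udev {
	const char *name;
	long (*read)(Ufile *, void *, long);
	long (*write)(Ufile *, const void *, long);
	long (*ioctl)(Ufile *, unsigned long, void *);
};

/* an open file: which device handles it, plus device private data */
struct Ufile {
	Udev *dev;
	void *aux;
};

/* a /dev/null-like device that implements read and write only */
static long
nullread(Ufile *f, void *buf, long n)
{
	(void)f; (void)buf; (void)n;
	return 0;	/* always end of file */
}

static long
nullwrite(Ufile *f, const void *buf, long n)
{
	(void)f; (void)buf;
	return n;	/* discard everything */
}

static Udev nulldev = { "null", nullread, nullwrite, NULL };

/* generic syscall side: dispatch through the Udev or fail like linux */
static long
sysread(Ufile *f, void *buf, long n)
{
	assert(f != NULL && f->dev != NULL);
	return f->dev->read ? f->dev->read(f, buf, n) : -EINVAL;
}

static long
syswrite(Ufile *f, const void *buf, long n)
{
	assert(f != NULL && f->dev != NULL);
	return f->dev->write ? f->dev->write(f, buf, n) : -EINVAL;
}

static long
sysioctl(Ufile *f, unsigned long cmd, void *arg)
{
	assert(f != NULL && f->dev != NULL);
	return f->dev->ioctl ? f->dev->ioctl(f, cmd, arg) : -ENOTTY;
}
```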