linux_emul_base/doc/linuxemu.txt

SYSCALLS

on linux/i386, the machine code puts the syscall number in AX and its
arguments in BX, CX, DX, SI, DI and makes a soft interrupt 0x80.

as the plan9 kernel doesn't care about the interrupt vector 0x80, it
sends a note to the process that trapped and, if the note is not
handled, kills it. in a note handler, it is possible to access the
machine state of the process when the trap/interrupt happened from the
ureg argument.

in linuxemu, we install a note handler that checks if the trap was a
linux syscall and calls our handler function from our systab.

after our syscall handler has returned, we move the program counter
in the machine state structure past the int 0x80 instruction and
continue execution by accepting the note as handled with a call to
noted(NCONT).
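
a minimal sketch of the dispatch described above, written as portable c
so it can be tried outside plan9. the Ureg layout, the two table entries
and dispatch() are made up for illustration and are not the real
linuxemu code. returning -ENOSYS for an unknown number is also an
assumption. the real handler reads the plan9 Ureg and finishes with
noted(NCONT).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for the i386 register state (plan9 Ureg) */
typedef struct Ureg Ureg;
struct Ureg {
	uint32_t ax, bx, cx, dx, si, di;
	uint32_t pc;
};

typedef int32_t (*Sysfn)(Ureg *);

enum {
	Int80len = 2,		/* "int 0x80" is encoded as the two bytes CD 80 */
	LinuxENOSYS = 38,	/* linux errno for an unknown syscall */
};

/* made-up handlers, only here to fill the table */
static int32_t sysadd(Ureg *u) { return (int32_t)(u->bx + u->cx); }
static int32_t sysdouble(Ureg *u) { return (int32_t)(u->bx * 2); }

static Sysfn systab[] = { sysadd, sysdouble };

/*
 * model of the note handler: text is the process memory, u the saved
 * registers.  returns 1 if the trap was a linux syscall (the real
 * handler would now call noted(NCONT)), 0 if the note is not ours.
 */
int
dispatch(Ureg *u, const uint8_t *text, size_t ntext)
{
	assert(u != NULL && text != NULL);

	/* was the trapping instruction really int 0x80? */
	if(ntext < Int80len || u->pc > ntext - Int80len)
		return 0;
	if(text[u->pc] != 0xCD || text[u->pc+1] != 0x80)
		return 0;

	/* AX selects the handler and receives the result */
	if(u->ax < sizeof systab / sizeof systab[0])
		u->ax = (uint32_t)systab[u->ax](u);
	else
		u->ax = (uint32_t)-LinuxENOSYS;	/* assumed behaviour */

	/* continue after the int 0x80 instruction */
	u->pc += Int80len;
	return 1;
}
```

the pc is only advanced when the trap is recognized, so an unrelated
note leaves the machine state untouched.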

to do automatic conversion to a plan9 function call, the number of
arguments and the function name of the handler must be known.  this
information is provided by the linuxcalltab input file that is fed through
linuxcalltab.awk to build the necessary tables.
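
the generated table might look roughly like the sketch below. the
format that linuxcalltab.awk really emits is not shown in this file, so
the Linuxcall struct, the demo handlers and callhandler() are
assumptions for illustration. the point is only that knowing the
argument count lets the register values be turned into an ordinary c
call.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*Anyfn)(void);	/* generic function pointer, cast back before use */

/* hypothetical table entry: what the awk script would need to emit */
typedef struct Linuxcall Linuxcall;
struct Linuxcall {
	const char *name;
	Anyfn func;
	int nargs;
};

/* made-up handlers with ordinary c signatures */
static int32_t demo_zero(void) { return 42; }
static int32_t demo_one(int32_t a) { return a + 1; }
static int32_t demo_three(int32_t a, int32_t b, int32_t c) { return a + b + c; }

const Linuxcall linuxcalltab[] = {
	{ "demo_zero",	(Anyfn)demo_zero,	0 },
	{ "demo_one",	(Anyfn)demo_one,	1 },
	{ "demo_three",	(Anyfn)demo_three,	3 },
};

/*
 * call entry c with the register values in arg[] (BX, CX, DX, ...).
 * each pointer is cast back to the type it was defined with, which is
 * why the argument count has to be known.
 */
int32_t
callhandler(const Linuxcall *c, const int32_t arg[6])
{
	assert(c != 0 && c->nargs >= 0 && c->nargs <= 3);
	switch(c->nargs){
	case 0:
		return ((int32_t (*)(void))c->func)();
	case 1:
		return ((int32_t (*)(int32_t))c->func)(arg[0]);
	case 2:
		return ((int32_t (*)(int32_t, int32_t))c->func)(arg[0], arg[1]);
	default:
		return ((int32_t (*)(int32_t, int32_t, int32_t))c->func)(arg[0], arg[1], arg[2]);
	}
}
```

the handlers themselves never see registers, which is what would let a
different syscall personality reuse them.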

the linux specific syscall handling and argument conversion is done in
linuxcall.c only.  the idea is to later add support for other syscall
personalities like bsd without having to change the handler code.


MEMORY

unlike shared libraries, which are position independent, binaries have to be
loaded to a fixed address location. (elf supports position independent
programs that can be loaded anywhere, but it's not used on i386)

the emulator doesn't need to load and relocate shared libraries itself. this is
done by the runtime linker (/lib/ld-linux.so). it just needs to load
the binary and the runtime linker to their preferred locations and jump into
the entry point. then the runtime linker will parse the elf sections of the
binary and call mmap to load further shared libraries.

the first thing we need is an implementation of mmap that allows us
to copy files to fixed addresses in memory. to do that on plan9,
segments are used.

it is not possible to create a segment for every memory mapping
because plan9 limits the number of segments per process to a small
number.  instead we create a fixed number of segments and
expand/shrink them on demand.  the linux stack area is fixed size and
uses the fact that plan9 doesn't allocate physical memory until pages
are touched.
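
the expand/shrink idea can be modelled without any plan9 calls. in the
sketch below a segment is just a base address plus a per-page use map.
Seg, segmap() and segunmap() are hypothetical names, and a real
implementation would resize the plan9 segment where this code only
updates top.

```c
#include <assert.h>
#include <stdint.h>

enum { Pagesize = 4096, Maxpages = 64 };

/* hypothetical model of one of the fixed segments */
typedef struct Seg Seg;
struct Seg {
	uint32_t base;			/* fixed start address */
	uint32_t top;			/* current end; grows and shrinks */
	unsigned char used[Maxpages];	/* which pages hold a mapping */
};

/* turn [addr, addr+len) into page indices [*first, *last) or fail */
static int
pagerange(Seg *s, uint32_t addr, uint32_t len, uint32_t *first, uint32_t *last)
{
	assert(s != 0);
	if(len == 0 || len > Maxpages * Pagesize)
		return -1;
	if(addr < s->base || addr % Pagesize != 0)
		return -1;
	*first = (addr - s->base) / Pagesize;
	*last = *first + (len + Pagesize - 1) / Pagesize;
	return *last > Maxpages ? -1 : 0;
}

/* map a range inside s, growing the segment if needed */
int
segmap(Seg *s, uint32_t addr, uint32_t len)
{
	uint32_t first, last, i;

	if(pagerange(s, addr, len, &first, &last) < 0)
		return -1;
	for(i = first; i < last; i++)
		s->used[i] = 1;
	if(s->base + last * Pagesize > s->top)
		s->top = s->base + last * Pagesize;	/* expand */
	return 0;
}

/* unmap a range and shrink the segment down to the last used page */
int
segunmap(Seg *s, uint32_t addr, uint32_t len)
{
	uint32_t first, last, i;

	if(pagerange(s, addr, len, &first, &last) < 0)
		return -1;
	for(i = first; i < last; i++)
		s->used[i] = 0;
	for(i = Maxpages; i > 0 && !s->used[i-1]; i--)
		;
	s->top = s->base + i * Pagesize;		/* shrink */
	return 0;
}
```

holes below the highest mapping stay inside the segment; only the end
moves, which is all a single growable segment can express.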

there are 3 segments created for a linux process:

"private" is used for all MAP_PRIVATE mappings and can be shared if
processes run in the same address space. code, data and files are mapped there.

"shared" is used for shared memory mappings.

"stack" is like "private", but lives just below the plan9 stack segment.
this is needed because glibc expands the stack down by mmap()ing pages
below the current stack area. we cannot use the plan9 stack segment
because that segment is copied on rfork and is never shared between
processes.

the data structures of the emulator itself ("kernel memory") need to
be shared by all processes even if the linux process runs in its own
private address space, so the plan9 Bss and Data segments are made
shared on startup: we copy the contents of the original segment into a
temporary file, segdetach() it, segattach() a new shared segment
at the same place and copy the data back in from the file.

with this memory layout, it is possible for the linux process to damage
data structures in the emulator. but we seem to be lucky for now :)


USER PROCESSES (UPROCS)

linuxemu does not switch and schedule linux processes itself. every user
process has its own plan9 process. memory sharing semantics are translated
to rfork flags on fork/clone.
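
as an illustration of that translation, the sketch below maps the linux
CLONE_VM flag to plan9's RFMEM. the constants are written out by hand so
the example compiles anywhere (the plan9 values are quoted from memory
of libc.h and should be checked against it), and cloneflags2rfork() is a
hypothetical helper, not the function linuxemu uses.

```c
enum {
	/* linux clone(2) flag */
	Linux_CLONE_VM = 0x00000100,	/* parent and child share the address space */

	/* plan9 rfork(2) flags, values assumed from libc.h */
	RFPROC = 1<<4,			/* create a new process */
	RFMEM = 1<<5,			/* share data and bss with the parent */
};

/* pick the rfork flags for a linux fork (cloneflags 0) or clone */
int
cloneflags2rfork(unsigned long cloneflags)
{
	int rf;

	rf = RFPROC;			/* fork and clone always make a process */
	if(cloneflags & Linux_CLONE_VM)
		rf |= RFMEM;		/* threads: share memory */
	return rf;
}
```

a plain fork passes no sharing flags and so gets a private copy, while a
thread-style clone ends up in the same address space as its parent.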

we have a global process table of Uproc structures to track states and
resources for all user processes:

fs: filesystem mount table
fdtab: the file descriptor table
mem: memory mappings
signal: signal handler and queue
trace: debug trace buffer

resources that can be shared are reference counted and get freed when
the last process referencing them exits.
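
a small sketch of that reference counting, using a shared file
descriptor table as the example. the Fdtab layout and the helper names
below are assumptions for illustration; only the field name fdtab comes
from the list above. the counter is a plain int, so real code shared
between processes would need a lock or atomic operations around it.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Fdtab Fdtab;
struct Fdtab {
	int ref;	/* number of Uprocs pointing here */
	int nfd;
};

typedef struct Uproc Uproc;
struct Uproc {
	Fdtab *fdtab;
};

int nfreed;	/* counts frees so the example can be checked */

Fdtab*
newfdtab(void)
{
	Fdtab *t;

	t = calloc(1, sizeof *t);
	if(t != NULL)
		t->ref = 1;
	return t;
}

/* clone with shared files: the child takes another reference */
void
sharefdtab(Uproc *child, Uproc *parent)
{
	assert(parent->fdtab != NULL);
	parent->fdtab->ref++;
	child->fdtab = parent->fdtab;
}

/* process exit: drop the reference, free on the last one */
void
exitfdtab(Uproc *p)
{
	assert(p->fdtab != NULL && p->fdtab->ref > 0);
	if(--p->fdtab->ref == 0){
		free(p->fdtab);
		nfreed++;
	}
	p->fdtab = NULL;
}
```

the table outlives whichever process created it and goes away only when
the second process exits as well.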


KERNEL PROCESSES (KPROCS)

if linuxemu needs to defer work or do asynchronous i/o, it can spawn a
kernel process with kprocfork. kernel processes don't have a Uproc
structure associated and have the userspace memory segments detached,
therefore they can't access userspace memory.

bufprocs and timers are implemented with kernel processes.


DEVICES

earlier versions mapped linux files directly to plan9 files.  this made
the implementation of ioctls, symlinks, remove on close, and
select/poll hard and also had problems with implementing fork sharing
semantics.

current linuxemu does it all by itself.  there is a global device table
of Udev structures.  devices can implement all i/o related syscalls by
providing function pointers in their Udev.  when a device has to deal
with asynchronous i/o on real plan9 files it uses bufprocs.
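
the device table could be pictured like this. the Udev fields, the two
demo devices and devread()/devwrite() are hypothetical and much smaller
than whatever linuxemu really uses. the sketch only shows how a missing
function pointer can turn into an error for that syscall (returning
-EINVAL is an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { LinuxEINVAL = 22 };	/* linux errno, returned negated */

/* hypothetical device: a nil pointer means "not supported" */
typedef struct Udev Udev;
struct Udev {
	const char *name;
	long (*read)(void *buf, long n);
	long (*write)(const void *buf, long n);
};

/* demo device functions */
static long zeroread(void *buf, long n) { memset(buf, 0, (size_t)n); return n; }
static long sinkwrite(const void *buf, long n) { (void)buf; return n; }

enum { Devzero, Devsink, Ndev };

const Udev devtab[Ndev] = {
	[Devzero] = { "zero", zeroread, NULL },
	[Devsink] = { "sink", NULL, sinkwrite },
};

/* read syscall path: go through the device's function pointer */
long
devread(int dev, void *buf, long n)
{
	assert(dev >= 0 && dev < Ndev && n >= 0);
	if(devtab[dev].read == NULL)
		return -LinuxEINVAL;
	return devtab[dev].read(buf, n);
}

/* write syscall path, same pattern */
long
devwrite(int dev, const void *buf, long n)
{
	assert(dev >= 0 && dev < Ndev && n >= 0);
	if(devtab[dev].write == NULL)
		return -LinuxEINVAL;
	return devtab[dev].write(buf, n);
}
```

a device only fills in the calls it supports, so adding a new syscall
to the table does not force every existing device to change.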