Raphael S.Carvalho's Programming Blog

raphael.scarv@gmail.com

"A programmer that cannot debug effectively is blind."

Friday, December 27, 2013

ZFS Adaptive Replacement Cache (ARC) Analysis on OSv

Find further info about OSv at: http://osv.io

:: Overview ::
- The ZFS ARC is allowed to grow beyond its target size, but arc_reclaim_thread
must eventually wake up and shrink it back to the target.
I initially thought this wouldn't work, since commit '29cf134' partially disabled that
functionality. However, running 'osv zfs' later shows that the ARC size really is reduced to
conform to the target.

- The initial ARC target should be set to 1/8 of all memory, then reduced under
memory pressure (Tomek has already touched this, and Glommer suggested a similar
approach). arc_init() gets the system memory size through kmemsize(), which currently always returns 0,
so the ARC falls back to its own number when setting the target size (16MB).

- Another important detail is that l2arc (level 2 ARC) is currently disabled, which may mean a performance
penalty depending on the workload.

:: For memory pressure ::
- Knowing that arc_reclaim_thread() works and relies on arc_adjust() to resize the lists,
we can expect arc_shrink() to work on OSv as well.

arc_shrink() reduces the ARC target size as follows:
* First, to_free is calculated: to_free = arc_target_size >> arc_shrink_shift (5)
* Then it checks that subtracting to_free would not push target_size below the
minimum target size.
  If it would not, the target size is reduced by to_free (that is, by
about 3.125%).
  Otherwise, the target size is set to the minimum target size.
* Finally, arc_adjust() is called to do the actual work.

:: ZFS ARC performance on Cassandra ::
- The results below show that the ARC miss ratio is high in both runs (19.29% and 29.89%). The ZFS ARC
performs well on small workloads, but the hit ratio drops as the workload
grows.

[raphaelsc@muninn bin]$ ./cassandra-stress -d 192.168.122.89 -n 10000000
total,interval_op_rate,interval_key_rate,latency/95th/99th,elapsed_time
225940,22594,22594,1.5,3.0,33.2,10
512367,28642,28642,1.5,2.5,69.4,20
762547,25018,25018,1.5,2.6,93.7,30
1029819,26727,26727,1.5,2.5,93.7,40
1269269,23945,23945,1.5,2.7,93.4,50

(gdb) osv zfs
:: ZFS TUNABLES ::
    zil_replay_disable=0
    zfs_nocacheflush=0
    zfs_prefetch_disable=0
:: ARC SIZES ::
    Actual ARC Size: 64839968
    Target size of ARC: 16777216
    Min Target size of ARC: 16777216
    Max Target size of ARC: 16777216
    Target size of MRU: 15728640
:: ARC EFFICIENCY ::
Total ARC accesses: 63962
    ARC hits: 51622 (80.71%)
        ARC MRU hits: 18842 (36.50%)
            Ghost Hits: 1811
        ARC MFU hits: 32306 (62.58%)
            Ghost Hits: 970
    ARC misses: 12340 (19.29%)
:: L2ARC ::
    Actual L2ARC Size: 0
Total L2ARC accesses: 0
    L2ARC hits: 0 (nan%)
    L2ARC misses: 0 (nan%)

[raphaelsc@muninn bin]$ ./cassandra-stress -d 192.168.122.89 -n 10000000
total,interval_op_rate,interval_key_rate,latency/95th/99th,elapsed_time
208736,20873,20873,1.7,3.8,27.0,10
424091,21535,21535,1.7,3.5,102.4,20
624038,19994,19994,1.7,3.6,102.4,30
871778,24774,24774,1.7,3.4,76.9,40
1048259,17648,17648,1.6,3.2,111.4,50
1307851,25959,25959,1.6,3.1,76.9,60
1564253,25640,25640,1.6,3.0,571.1,70
1814642,25038,25038,1.6,2.8,74.7,80
2066720,25207,25207,1.6,2.8,40.1,91
2264887,19816,19816,1.6,2.9,40.0,101

(gdb) osv zfs
:: ZFS TUNABLES ::
    zil_replay_disable=0
    zfs_nocacheflush=0
    zfs_prefetch_disable=0
:: ARC SIZES ::
    Actual ARC Size: 143722352
    Target size of ARC: 16777216
    Min Target size of ARC: 16777216
    Max Target size of ARC: 16777216
    Target size of MRU: 15728640
:: ARC EFFICIENCY ::
Total ARC accesses: 226173
    ARC hits: 158569 (70.11%)
        ARC MRU hits: 54671 (34.48%)
            Ghost Hits: 6017
        ARC MFU hits: 85117 (53.68%)
            Ghost Hits: 3033
    ARC misses: 67604 (29.89%)
:: L2ARC ::
    Actual L2ARC Size: 0
Total L2ARC accesses: 0
    L2ARC hits: 0 (nan%)
    L2ARC misses: 0 (nan%)

Monday, October 21, 2013

Booting and File systems

Arch: x86
Firmware: BIOS
Partitioning scheme: MBR-based
Platform: Linux
Bootloader: Syslinux-based

The BIOS comes to life when the computer is switched on.
It looks for a boot sector in the first sector of the boot disk.
The last two bytes of that sector must match the boot signature.
If they do, the BIOS loads the first sector of the disk into main memory and starts executing it.

The code in the first sector goes through each entry in the partition table, looking for the bootable partition.
If one is found, it loads the first sector of that partition, using the CHS address in the respective entry.

This first sector is the bootloader. The location of the second stage is usually hardcoded in it.
The second stage has file system drivers, modules, etc.
The second stage determines which file system driver should be used for this partition based on the magic number in the superblock.
Each file system driver has such a discovery function.
When a matching driver is found, its file system operations structure is hooked to the mount point (partition).

Bootloaders usually have implicit absolute paths for configuration files, e.g. /boot/config.
Such files give the location of the kernel image and initramfs.

The kernel has its own file system drivers embedded in its image.
When mounting a device, it goes through the list of available drivers and does the following with each entry:
- Check whether the driver supports the file system installed on the device.
- If so, hook this driver to the device.
- If not, go to the next entry.

After that, all operations on that device (mount point) use the operations provided by the driver hooked at mount time.

Saturday, September 28, 2013

Does 'processor's size' limit the size of main memory?

The processor doesn't limit the size of main memory; it limits the addressable space.

"Does it mean something else also?"
It means a lot of things.
- Register size.
- Addressing.
- Data Alignment.

On x86-32, for example, ESP (the stack pointer) is a 32-bit register that points to the top of the stack, so x86 uses it implicitly in stack-related operations (PUSH, POP). Registers such as ESP basically work as an offset into the addressable space.

Nowadays, addresses issued by your program go through a translation process in the MMU (on computers that have one), but I will not discuss this here as it has nothing to do with the main purpose of this topic. It's possible to have an address bus whose width is lower or higher than the word size of the processor (PAE, for instance, relies on a larger number of address lines =]).

Yes, the address bus must at least be compatible with the word size of the processor. Otherwise, how would we send all the bits of an address to the memory controller on load/store operations?

x86 real mode is an interesting example of an addressable space larger than the word size of the processor would suggest.
Back then, you had segment:offset addresses: the segment was multiplied by 16 (shifted left by 4) and the result was added to the offset. The generated address was then sent to the memory controller through the address bus. Even though real-mode processors had registers of at most 16 bits, the address bus had 20 address lines.
Up to 1 megabyte of physical memory could be accessed.

The following sentence will probably help you:
"A microprocessor will typically have a number of addressing lines equal to the base-two logarithm of its physical addressing space" http://en.wikipedia.org/wiki/A20_line

Compilers use an interesting approach when certain operations aren't natively supported by the underlying processor. For example, 32-bit processors don't natively support 64-bit data values, but compilers work around that by emulating 64-bit operations (load/store, arithmetic, branch).

Suppose we run the following snippet of code on a 32-bit processor:

 long long int a, b; // 64-bit values (even on 32-bit processors).
 a += b; // Add b to a; store the result into a.

How is that possible if 32-bit processors cannot operate on data wider than 32 bits?
As I said above, the compiler emulates such operations, using multiple instructions (steps).
Yes, it will be slower, but that's the only way of dealing with data wider than the processor supports.

On a 32-bit processor, adding one 64-bit value to another must be done in parts, since the processor cannot handle 64 bits at once.

The assembly code for the above snippet would look something like the following:

 # eax:ebx holds a (eax = high half, ebx = low half).
 # ecx:edx holds b (ecx = high half, edx = low half).
 add ebx, edx  # add the low halves first (ebx = ebx + edx)
 # adc is used instead of add for the high halves:
 # the previous addition may have left a carry,
 # so the next addition must take it into account.
 adc eax, ecx  # add the high halves plus the carry (eax = eax + ecx + CF)

Yeah, it's expensive (in both resources and performance, since several general-purpose registers are used and multiple steps are required to get the operation done) and boring (personal opinion =P), but nevertheless, how would we do it otherwise?

Hope it will help you,
Raphael S. Carvalho.

Friday, September 6, 2013

x86 security mechanisms

Hi folks,

Unless the kernel of your operating system uses the "follow the bouncing kernel" scheme, the kernel and the user environment share the same address space. So the following question may arise: if the user is sharing its address space with the kernel, which security policies/mechanisms are used to ensure safety? That's what I will be addressing here, so fasten your seat belt :)

The MMU (Memory Management Unit) uses both paging and segmentation as protection mechanisms.
I will focus on paging here, so if you want to understand how paging relates to segmentation, take a look at:

http://pdos.csail.mit.edu/6.828/2011/lec/x86_translation_and_registers.pdf

The MMU performs its checks based on the CPL (Current Privilege Level)[1].

Each page has a corresponding page-table entry that tells the hardware which operations are allowed and who can access it.

When user programs are being executed, the CPL is always 3, so pages marked as system-only aren't accessible. Each page entry stores many flags, but I will present only the two most relevant to the purpose of this post.

[1]: CPL is essential to the protection mechanism of the x86.

It prevents processes running at a less-privileged level from doing things that could halt the entire system.

By the way, CPL is stored in the bottom two bits of the CS register.

Write and Present are flags that determine the current state of a page. If the Write flag is set, write operations are allowed; otherwise, the page is read-only.
The Present flag tells the MMU whether the underlying page is present in physical memory. If it is not, the page was probably swapped out due to some event, e.g. memory running low.

When the system traps into the kernel, e.g. on a syscall, the processor itself switches from user to system mode (from CPL 3 to 0). From that moment on, pages marked as system-only are accessible.

Knowing that, we readily understand that processes running at CPL 3 (user programs) can never access the kernel address space directly. Unless there is some exploitable code in the kernel itself, user space cannot peek into kernel space.

Hope you enjoyed it,

Raphael S. Carvalho.

Saturday, August 31, 2013

Endianness

Look carefully at the following snippet of code:

unsigned int c = 0xFFAABBCC;
printf("%02x\n", ((unsigned char *)&c)[0]); /* unsigned char avoids sign extension */

If you aren't familiar with this, you may be asking yourself: how does it work?

Each hexadecimal digit represents a nibble, that is, 4 bits, so 2 hexadecimal digits = 1 byte. The variable c stores a 32-bit (4-byte) value.
It's also worth mentioning that '0x' prefixes all hexadecimal literals in the C language.

Computer memory is basically a sequence of 8-bit cells{1}, so it's not possible to store all the bytes of that value in a single cell. Why? An integer is a multi-byte value, and so must span several memory cells.
Unfortunately, different architectures store numbers in different ways.

* {1}: This may not be true in the real world! Google about NUMA systems.

x86 is a little-endian architecture, which means that the least-significant bytes are stored first.
Do you understand the meaning of most-significant byte and least-significant byte in the following hexadecimal value: 0xFFAABBCC?
It's just terminology to describe the significance of each byte of a multi-byte value.
0xFF is the most-significant byte, whereas 0xCC is the least-significant one.

So answer me this: which byte of the variable c will be stored first in memory, the most-significant or the least-significant?
If you understood the content above, you know that it depends on the underlying arch.

On a little-endian arch, the bytes of 0xFFAABBCC will be stored in memory as follows (on a big-endian arch, 0xFF (the most-significant byte) would be stored first instead):

[0] = 0xCC
[1] = 0xBB
[2] = 0xAA
[3] = 0xFF

* [0] meaning that it's a lower address than [3].
* An example of big-endian arch is PPC.

- So let's figure out what each piece of the code means:

The first step of the code is the pointer cast applied to &c.
It takes the address of the integer variable, which gives us a pointer to an integer, and then converts that integer pointer into a character pointer with an explicit cast.
From then on, it is a pointer to an 8-bit value.

The second step, the [0] index, reads the value the character pointer points to.
As I told you, the least-significant bytes are stored first on little-endian archs, so 0xCC is the output. If it were [1] instead, 0xBB would be output, since [1] references the second least-significant byte.
You can see how the bytes are individually stored in memory by looking at the layout above.

If you understood this post, you won't have any trouble writing code to check the endianness of your machine. If you're feeling adventurous, take it as an exercise =)

Hope you liked it,
Raphael S. Carvalho.

Saturday, July 20, 2013

* X86 PROTECTION * [ 1 / ??? ]

This is the first tutorial of a series about the protection mechanisms on x86 arch.

* This text was mostly based on the Intel 80386 Programmer's Reference Manual.
NOTE: If something in this article is unclear to you, consult the manual itself.
NOTE: This article explains the protection mechanisms. If you aren't familiar with the address formation cycle (segmentation->paging), you should go back and learn it before reading this article.
NOTE: Sentences surrounded by double quotes are implicitly explanations taken from the Intel Manual.

Today I will talk about the protection criteria on x86. First, it's important to know that protection is essential to confine bugs.
It means that intended or unintended bugs in one procedure shouldn't damage others.
Protection on x86 was built with two important goals in mind: helping to detect and identify bugs. Bugs must be found and eliminated as quickly as possible.

Protection on x86 is made possible through the following mechanisms:
- Check memory accesses.
- Check instruction execution.

It's also important to know that the system developer has the power of deciding which mechanisms will be used (according to the system design objectives).

Protection Overview:
-----
The Intel manual lists all aspects of protection as applied both to segmentation and paging.

Aspects of protection:
1. Type checking
2. Limit checking
3. Restriction of addressable domain
4. Restriction of procedure entry points
5. Restriction of instruction set

It's interesting to know that the protection hardware of x86 is an integral part of the memory management hardware.
When a memory reference is made, this hardware must check that it conforms to the protection criteria defined by the system programmer.
Any memory access that doesn't conform to the criteria results in an exception, which must be handled by the exception mechanism.
An invalid memory access prevents the memory cycle from starting, thus protecting the system. You may ask yourself: doesn't this protection mechanism carry a performance penalty, since there is a check on every memory access?
The answer is no: the address formation and the check on the memory access are performed concurrently.
NOTE: I won't talk about exception handling in this article. If you're interested, you should go ahead and learn it yourself.

It's also important to clarify the meaning of privilege when applied to aspects of protection.

"The concept of "privilege" is central to several aspects of protection (numbers 3, 4, and 5 in the preceeding list)."
"Applied to procedures, privilege is the degree to which the procedure can be trusted not to make a mistake that might affect other procedures or data. Applied to data, privilege is the degree of protection that a data structure should have from less trusted procedures."

This is easy to understand, so an additional note shouldn't be needed. I will finish this protection overview by emphasizing that this concept of privilege applies to both segment and page protection.

Segment-level protection
-----
It's essential to know that checks are performed automatically by the CPU whenever you load a segment descriptor into a segment register, and also with every segment access.
Segment descriptors simply store info about a segment. "Segment registers hold the protection parameters of the currently addressable segments."

Besides, I assume you know the difference between segmentation in real and protected mode. I won't present the differences in this article.

Descriptors and Protection Parameters
-----
All of these fields are set by the system programmer at the time the descriptor is created. As stated in the manual, the application programmer needn't be concerned with these details unless there is a reason to (e.g. better understanding or exploitation).

It's also very interesting that each segment register has an invisible portion for storing several parameters about the segment. The processor loads not only the base address, but also protection information (limit, type, and privilege level).
So the CPU can look at this invisible portion (on subsequent memory checks) instead of retrieving the protection parameters from the descriptor on every segment access.
"Therefore, subsequent protection checks on the same segment do not consume additional clock cycles."

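The idea of the invisible portion can be modeled in C. This is a deliberately simplified sketch: the struct layouts and function names are invented, real descriptors pack their fields differently, and details such as the granularity bit and expand-down segments are ignored. The point is only that the limit check reads the copy cached in the register, not the descriptor in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified descriptor; real descriptors pack these fields differently. */
struct seg_descriptor {
    uint32_t base;
    uint32_t limit;  /* highest valid offset within the segment */
    uint8_t  type;
    uint8_t  dpl;    /* privilege level */
};

/* Segment register: visible selector plus the hidden descriptor copy. */
struct seg_register {
    uint16_t selector;
    struct seg_descriptor hidden;
};

/* Loading the register copies the descriptor once. */
static void load_segment(struct seg_register *reg, uint16_t selector,
                         const struct seg_descriptor *desc)
{
    reg->selector = selector;
    reg->hidden = *desc;
}

/* Limit check that uses only the cached copy. */
static int access_within_limit(const struct seg_register *reg,
                               uint32_t offset, uint32_t size)
{
    if (size == 0 || offset > reg->hidden.limit)
        return 0;
    return size - 1 <= reg->hidden.limit - offset;
}

int main(void)
{
    struct seg_descriptor data = { 0x1000, 0x0fff, 0, 0 };
    struct seg_register ds;

    load_segment(&ds, 0x10, &data);
    printf("4 bytes at 0xffc: %d\n", access_within_limit(&ds, 0xffc, 4));
    printf("4 bytes at 0xffd: %d\n", access_within_limit(&ds, 0xffd, 4));
    return 0;
}
```
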
Hope you liked it,
Raphael S. Carvalho

Wednesday, January 23, 2013

Hacking over and over...

Geek quote: "The size of a word is not a universal standard."

I loved the following papers and I'm sure you'll also like them.
Reading Recommendations...
- (Excellent document) Eric S. Raymond: How to become a hacker: http://www.catb.org/esr/faqs/hacker-howto.html
- (Read carefully) Peter Norvig: Teach Yourself Programming in Ten Years: http://norvig.com/21-days.html

I would also like to share the way I solved exercise 5 of the MIT OS course...
Exercise 5. Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in boot/Makefrag to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!

First, here are some details that may give you a better understanding of this exercise.
The Load address of a section is the memory address at which that section should be loaded into memory.
The Link address of a section is the memory address from which the section expects to execute.

As I said earlier, the BIOS goes through the list of available devices and performs the following action with each entry:
- Checks whether the device is bootable, and if so, loads the bootloader code into memory[1] and starts executing it.

[1]: The code of the bootloader is implicitly loaded at the physical address 0x7c00.
-----

I traced through the bootloader code, wondering whether unconditional/conditional jumps would still work properly.
I found the answer: these instructions use relative addressing (base + offset, where the base is the address of the next instruction) instead of absolute addressing.
So regardless of where the code is positioned, these jump instructions will work silently.

Here are some jump instructions (both of them are conditional jumps):
    7c0e:    75 fa                    jne    7c0a <seta20.1>
    ....
    7c18:    75 fa                    jne    7c14 <seta20.2>

Now it's time to change the link address and see what happens... The link address is specified in the bootloader's makefile, so we only need to replace the current address.

$(V)$(LD) $(LDFLAGS) -N -e start -Ttext 0x8C00 -o $@.out $^

After cleaning and compiling the source code with the new parameter, let's take a look at the first instruction to see what happened.

00008c00 <start>:
    8c00:    fa                       cli
    8c01:    fc                       cld

Wow, it looks like our code will be loaded at another memory address, but that's not true.
The BIOS always loads the boot sector into memory at physical address 0x7c00, so instructions that depend on a specific address will not work as expected. Trust me, that's really bad.

Brace yourself for some bad news: the LGDT[1] instruction uses absolute addressing. Given this, the bootloader won't work properly if the link address is wrong.

lgdt    gdtdesc

LGDT takes as argument the address of a structure containing both the base address and limit of the GDT[2].

gdtdesc:
.word   0x17                            # sizeof(gdt) - 1
.long   gdt                             # address gdt

In protected mode, selector values are interpreted as an index into the GDT. So if the GDT wasn't loaded or defined correctly, the processor will raise an exception (which ends up as a triple fault).

[1]: LGDT is used to load the base address and the limit of the GDT. Basically, it turns the GDT on.
[2]: GDT (Global Descriptor Table) works as a lookup table. It's used by the x86 processors which support protected mode.
It contains entries telling the CPU about memory segments. (Borrowed from http://wiki.osdev.org/)
-----

Att, Raphael S.Carvalho

Monday, January 21, 2013

MIT OS course - 6828 (Lab 1)

Below is a list of solved exercises.
They provide a basic understanding of how the JOS kernel boots.

Exercise 1.
The first exercise was to familiarize ourselves with assembly language.
I found the available materials on the 6.828 reference page. Btw, I'm still reading the PCASM book.

Here is the link to the book if you're interested:
http://www.drpaulcarter.com/pcasm/

-----
Exercise 2.
Use GDB's si (Step Instruction) command to trace into the ROM BIOS for a few more instructions, and try to guess what it might be doing.

; This is the first instruction that the BIOS executes when the PC starts.
; It does a far jump to CS:IP = 0xf000:0xe05b, i.e. physical address 0xfe05b.
{1}: [f000:fff0] 0xffff0:    ljmp   $0xf000,$0xe05b

; Another jump
{2}: [f000:e05b] 0xfe05b:    jmp    0xfc85e

; MOV cr0 content to eax
{3}: [f000:c85e] 0xfc85e:    mov    %cr0,%eax

; Bitwise AND with the content of eax (cr0)
; It clears bits 29 and 30 (the NW and CD cache-control flags of CR0)
{4}: [f000:c861] 0xfc861:    and    $0x9fffffff,%eax

; MOV eax content to cr0
{5}: [f000:c867] 0xfc867:    mov    %eax,%cr0

; Clear the interrupt and direction flags
{6}: [f000:c86a] 0xfc86a:    cli
{7}: [f000:c86b] 0xfc86b:    cld

; MOV 0x8f (1000 1111) to eax
{8}: [f000:c86c] 0xfc86c:    mov    $0x8f,%eax

* [It looks like the following commands set up the CMOS RAM flag]
; 0070    w    CMOS RAM index register port (ISA, EISA)
;         bit 7    = 1 NMI disabled
;                     = 0 NMI enabled
;        bit 6-0      CMOS RAM index (64 bytes, sometimes 128 bytes)
; As far as I can see, we first need to set the address we want to read from
; or write to.
{9}: [f000:c872] 0xfc872:    out    %al,$0x70

; 0071    r/w    CMOS RAM data port (ISA, EISA)
;         any write to 0070 should be followed by an action to 0071
;        or the RTC will be left in an unknown state.
; Read from or write to the address we set in the previous command.
{10}: [f000:c874] 0xfc874:    in     $0x71,%al
======

; Compare what we read to zero.
{11}: [f000:c876] 0xfc876:    cmp    $0x0,%al

; Jump to that address if the value is not equal to zero.
{12}: [f000:c878] 0xfc878:    jne    0xfc88d

* [Setting up the segment registers]
; Clear ax (set it to zero).
{13}: [f000:c87a] 0xfc87a:    xor    %ax,%ax

; Set stack segment to zero.
{14}: [f000:c87c] 0xfc87c:    mov    %ax,%ss

; Set stack pointer to 0x7000
{15}: [f000:c87e] 0xfc87e:    mov    $0x7000,%esp
======

; (gdb) i r
; edx            0xf4b2c    1002284
{16}: [f000:c884] 0xfc884:    mov    $0xf4b2c,%edx

{17}: [f000:c88a] 0xfc88a:    jmp    0xfc719

; Copy eax into ecx (eax is zero at this point, so this clears ecx)
{18}: [f000:c719] 0xfc719:    mov    %eax,%ecx

; Clear the interrupt and direction flags again
{19}: [f000:c71c] 0xfc71c:    cli
{20}: [f000:c71d] 0xfc71d:    cld

Btw, there is no need to figure out all the details - just the general idea of what the BIOS is doing first.
The BIOS is also responsible for:
- checking if there is enough memory to keep going;
- setting up an interrupt descriptor table;
- initializing various devices such as VGA display;
- searching for a bootable device.

When the BIOS finds a bootable device, it loads the boot loader from the disk into the physical address range 0x7c00-0x7dff,
and then uses a jmp instruction to set CS:IP to 0000:7c00, thereby passing control to the boot loader.
If no bootable disk is found, it shows a warning message explaining what happened, probably something like: "No bootable device was found."
If the disk is bootable, its first sector is called the boot sector.

-----
Exercise 3. Take a look at the lab tools guide, especially the section on GDB commands. Even if you're familiar with GDB, this includes some esoteric GDB commands that are useful for OS work.

0x7cf1:    cmp    %edi,%ebx ; while (pa < end_pa) {

0x7cf7:    call   0x7c81 ; readsect function address

(gdb) c
Continuing.
0x7d67:    call   *0x10018 ; Entry point of the kernel

- At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?
; Load a descriptor table that contains information for translating segmented addresses.
; Translation of segmented address into physical address happens differently in protected mode.
lgdt    gdtdesc

; The snippet of code below enables the protected mode flag of the control register, so here is where the switching happens.
movl    %cr0, %eax
orl     $CR0_PE_ON, %eax
movl    %eax, %cr0

- What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?
If everything works fine, then the last instruction would be the following:
0x7d67:    call   *0x10018 ; Entry point of the kernel
This instruction calls the kernel by using the entry point stored in the ELF header.
Here is the first instruction of the kernel: 0x10000c:    movw   $0x1234,0x472

- Where is the first instruction of the kernel?
It was answered in the last question: 0x10000c:    movw   $0x1234,0x472

- How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?
First of all, it loads a fixed number of sectors, specifically five. By doing that, it assumes that the ELF header of the kernel image will be completely loaded into memory. The ELF header is placed at a fixed address: #define ELFHDR        ((struct Elf *) 0x10000).
After loading those sectors, it checks whether the ELF signature is valid. If so, the bootloader loads each segment described by a program header at the memory address recorded in that header (all the required information about a segment is stored in its program header structure).
As a result, the kernel image is loaded by the bootloader, which can then jump into the kernel code through the entry point stored in the ELF header.

Att, Raphael S.C

Thursday, January 17, 2013

The Diary of an Operating System Developer

Day0:
Today, I spent most of my time working on the shell exercise provided by the MIT operating system course.
Before getting started, I read chapter 0 of the xv6 book, which teaches the major concepts needed to implement new features such as pipelines, I/O redirection and background processes.
I didn't need to implement the shell from the ground up because the course I enrolled in provides a skeleton shell.
So my work was to read the code so that I could understand how the shell actually works.
After studying the source code, I realized that the shell uses an incredible algorithm for the parser (based on recursion), and knowing that, I felt motivated to complete this exercise properly.

Day1:
I didn't do much today. However, I started reading LAB1 from MIT and compiled a custom QEMU (it provides new features), which will be used heavily throughout the OS course.
I hope to learn many things while studying it, and as I said a few weeks earlier, I'll be sharing nice stuff as I gain knowledge.

The 2011 MIT OS course provides videos. The image quality isn't good, but I have been learning a lot from watching them.
Finally, I found out what I want to do until the last day of my life, and I wouldn't face this challenge if I weren't going to do my best on the in-class homework.

Today, I was also talking with a friend on #GCC (an IRC channel hosted on Freenode), and the following question came up: I didn't understand the meaning of the sentence below.
"It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation."

After much thinking and banging my head against the wall (I'm kidding =]), I got its actual meaning.
When you clobber a register, you're telling GCC that this register may be changed unexpectedly, so GCC can no longer count on the values it loaded into the clobbered registers.
It also means that if you load an expression into a register, you should not list that register as clobbered unless it's really needed.
For example, if the code refers to that expression later, GCC can optimize it by avoiding another evaluation of the same expression.

That's all for this post. I hope you liked it.
Regards, Raphael S.Carvalho

Tuesday, January 8, 2013

Hard and Exciting Challenge

Dear readers,

I'm going to face the hardest challenge of my career as a computer scientist. I'll write my own operating system, and as time goes on, I'll be posting everything about this wonderful home project.
Its purpose is to teach me how the x86 architecture actually works, besides improving my programming and other related skills.
Over the past few months, I have been learning OS development through several articles written by actual developers. I did gain in-depth knowledge, though there are many things still waiting to be learned.
I'm feeling motivated and excited to get started, and I hope you'll be an active reader of my blog from now on.

Q: You may be asking yourself: what am I going to gain by reading your blog regularly?
A: I'll put all of my effort into the following task: writing tutorials/articles that are as useful and detailed as possible.

That's all for now. You know, we can't lose our time by talking about non-related subject to each other.

Regards,
Raphael S. Carvalho

Introduction to Linux Processes.

One of the fundamental abstractions in Unix-like operating systems.

This article is intended for those who would like a simple understanding of the subject, or who want to learn about it out of plain curiosity.
In this article, when I say Linux, I am referring to the Linux kernel. Strictly speaking, the term Linux refers only to the kernel.

- Definition
------------------------------------------------
A process is a program (object code stored on some medium) in the midst of execution.

It is worth mentioning that processes are more than just the program code being executed (also known as the text section).
Besides the code, processes also have several resources that allow the kernel to manage them transparently and efficiently.

Some of them are:
o Files opened by the process (open files).
o Pending signals.
o Processor state.
o Internal kernel data.
o A memory address space (with one or more mappings).
o One or more threads of execution.
o A data section containing global variables.

In effect, a process can be defined as the end result of running program code.

- Threads of execution
------------------------------------------------
Threads of execution (often shortened to threads) are the objects of activity within a process. Each thread has its own program counter (a register), stack, and set of (virtual) processor registers.

Interestingly, Linux has a unique implementation of threads: it does not differentiate between processes and threads. To Linux, a thread is just a special kind of process.

- Virtualizations
------------------------------------------------
Modern operating systems provide two virtualizations:
a virtualized processor and virtual memory.

The virtual processor gives the process the impression that it alone monopolizes the system. In reality, it shares the physical processor with many other processes.

Virtual memory follows the same idea: it lets the process allocate and manage memory as if it were the only user of the system's (physical) memory.

It is worth noting that the threads of a process share the virtual memory abstraction, whereas each one has its own virtualized processor.

- Note that...
------------------------------------------------
A program is not a process; a process is a program in execution together with its resources (the ones covered above).
In fact, two or more processes can exist that are executing the same program. Processes can also share resources with one another (open files, address space, and so on).

- Process creation
------------------------------------------------
A process begins its life when, not surprisingly, it is created.
Linux does this by means of the fork() system call.

This system call creates a new process by duplicating an existing one, namely the caller.
The process that calls fork() is the parent, whereas the new process is the child.
The parent then resumes execution and the child starts execution at the same place: where the call to fork() returns.

It is interesting to know that the fork() system call returns from the kernel twice: once in the parent and once in the newborn child. In the parent it returns the child's PID, and in the child it returns 0.
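
The two return values of fork() can be observed with a short program. This is only a minimal sketch, assuming a POSIX system; the function name fork_demo is hypothetical and error handling is kept to a minimum.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once and reports what fork() returned on
 * each side. Returns the child's PID to the caller (the parent), or -1
 * if fork() failed. */
pid_t fork_demo(void)
{
    /* Flush pending output so the child does not inherit and reprint it. */
    fflush(stdout);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: fork() returned 0 here. */
        printf("child:  fork() returned 0, my pid is %d\n", (int)getpid());
        fflush(stdout);
        _exit(0);   /* leave without running the parent's cleanup code */
    }

    /* Parent: fork() returned the PID of the new child. */
    printf("parent: fork() returned %d\n", (int)pid);
    waitpid(pid, NULL, 0);   /* collect the child so it does not linger */
    return pid;
}
```

Calling fork_demo() from main() prints one line from each process; the order of the two lines is not guaranteed, since both processes run independently once fork() returns.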

- Process termination
------------------------------------------------
A process terminates via the exit() system call.
Besides terminating the process, it frees all of its resources.

A parent process can inquire about the status of a terminated child via the wait() system call (or waitpid() for a specific child), which makes a process wait for the termination of another process.
When a process exits, it is placed in a special state called the zombie state. Zombies represent terminated processes; they are removed only when the parent calls wait() or, if the parent itself has exited, when the process that adopted them does.
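
The exit()/wait() pairing can be sketched as follows. This is a minimal illustration assuming a POSIX system; the function name child_exit_status is hypothetical. The child exits with a given code, and the parent retrieves that code through waitpid(). Between the child's exit and the parent's waitpid() call, the child is a zombie.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits immediately with `code`
 * and returns the exit status the parent collects, or -1 on error. */
int child_exit_status(int code)
{
    fflush(stdout);              /* keep buffered output out of the child */

    pid_t pid = fork();
    if (pid < 0)
        return -1;               /* fork() failed */

    if (pid == 0)
        _exit(code);             /* child: terminate right away */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;               /* waiting failed */

    /* The child stays a zombie until waitpid() above reaps it. */
    if (!WIFEXITED(status))
        return -1;               /* child did not exit normally */

    return WEXITSTATUS(status);  /* low 8 bits of the child's exit code */
}
```

Only the low 8 bits of the exit code reach the parent, so codes in the range 0 to 255 are the ones that round-trip unchanged.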

Note: Another name for a process is 'task'. Internally, Linux refers to processes this way.

- Conclusion
------------------------------------------------
I have tried to cover, in a simple way, the concepts we need in order to move forward. With this information in hand, you will be able to follow the next article, in which we will use the Linux source tree and write modules to gain a deeper understanding of the subject.

Regards,
Raphael S.Carvalho
