Beej Networking Tutorial

09 Jun 2025

A tutorial for Network Socket programming using C and Linux system calls.

You can find the link to the tutorial here or in the Resources tab.

What is a socket?

Reminder: Everything in Unix is a file.

In short, sockets are a way for programs to communicate with one another using standard UNIX file descriptors. A file descriptor is an integer associated with an open file. Here, a file can be a network connection, a FIFO, a terminal, a pipe, or just about anything else.

For networks, calling the ‘socket()’ system routine returns a file descriptor called a socket descriptor. We can communicate through the socket descriptor using the specialized send() and recv() system calls.
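
The idea that a socket is just an integer descriptor you send() to and recv() from can be sketched as follows. To stay self-contained, this sketch makes an assumption that is not part of these notes: it uses socketpair(), which returns two already-connected local (AF_UNIX) sockets rather than an Internet socket. The helper name send_and_receive() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends msg into one end of a connected socket pair
 * and reads it back from the other end.
 * Returns the number of bytes received, or -1 on error. */
ssize_t send_and_receive(const char *msg, char *out, size_t out_size)
{
    int fds[2];                    /* two socket descriptors: plain integers */
    size_t len;
    ssize_t received = -1;

    assert(msg != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    len = strlen(msg);
    if (len == 0) {                /* nothing to send, so don't block in recv() */
        out[0] = '\0';
        return 0;
    }

    /* Assumption: a local socket pair stands in for a network connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) {
        perror("socketpair");
        return -1;
    }

    if (send(fds[0], msg, len, 0) != -1) {
        received = recv(fds[1], out, out_size - 1, 0);
        if (received >= 0)
            out[received] = '\0';  /* terminate so it can be used as a C string */
    }

    close(fds[0]);                 /* descriptors are closed like any other file */
    close(fds[1]);
    return received;
}
```

Calling send_and_receive("hello", buf, sizeof buf) with a 16-byte buffer should leave "hello" in buf. With a real Internet socket the send() and recv() calls look the same; only the way the descriptor is obtained differs.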

Types of socket

This tutorial focuses on the Internet sockets.

NOTE: There are other types of sockets, and more than the two types of Internet sockets discussed here. Raw sockets are not covered in this tutorial, but they are a powerful tool worth exploring later.

The two types are “Stream Sockets” (SOCK_STREAM) and “Datagram Sockets” (SOCK_DGRAM).
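
As a rough sketch, the socket type is simply the second argument to socket(). The helper names open_inet_socket() and socket_type() are hypothetical, and the use of getsockopt() with SO_TYPE to read the type back is an addition that these notes do not cover.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: asks the kernel for an IPv4 Internet socket of the
 * given type (SOCK_STREAM or SOCK_DGRAM).
 * Returns the socket descriptor, or -1 on error. */
int open_inet_socket(int type)
{
    return socket(AF_INET, type, 0);  /* protocol 0: the default for this type */
}

/* Hypothetical helper: reads back the type of an open socket.
 * Returns SOCK_STREAM, SOCK_DGRAM, ..., or -1 on error. */
int socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof type;

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

For example, open_inet_socket(SOCK_DGRAM) should give a descriptor for which socket_type() reports SOCK_DGRAM. Remember to close() the descriptor when done.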

Stream sockets are reliable two-way communication streams. Items arrive at the other end in the order they were sent, and they arrive error-free. They use TCP, which sets up a connection with a 3-way handshake and acknowledges the data it receives. For more information on TCP, see RFC 793. This type of socket is used when the order of packets must be maintained, e.g. SSH applications (you wouldn’t want your keystrokes to arrive in a different order).

Datagram sockets are called connectionless and use UDP (see RFC 768). Even so, they can still use the ‘connect()’ system call (see the man page for more information). Connectionless means data may or may not arrive, and could arrive out of order.
However, if a packet does arrive, the data within it will be error-free. This type of socket is used when a few dropped packets won’t hurt, e.g. tftp (trivial file transfer protocol).
TFTP transfers binary applications from one host to another, so how does it work over UDP if packets may not arrive? The answer is that it adds its own “ACKs” (acknowledgements) on top of UDP to indicate that a packet has arrived. If the sender does not receive an acknowledgement within a set interval of time, it assumes the packet was not delivered and re-transmits it.
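
The acknowledge-and-retransmit idea can be sketched without any real networking. Everything named here is hypothetical: send_reliably(), the two callback types, and the simulated lossy_channel that drops a fixed number of packets. A real implementation would use sendto() and a timed wait on the socket instead of the fake functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks: one transmits a packet, one waits for an ACK. */
typedef bool (*send_fn)(const void *data, size_t len, void *ctx);
typedef bool (*wait_ack_fn)(int timeout_ms, void *ctx);

/* Sends the packet, waits for an ACK, and re-transmits on timeout.
 * Returns the number of attempts used, or -1 if every attempt failed. */
int send_reliably(const void *data, size_t len,
                  send_fn send_packet, wait_ack_fn wait_for_ack,
                  int timeout_ms, int max_attempts, void *ctx)
{
    assert(send_packet != NULL && wait_for_ack != NULL);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (!send_packet(data, len, ctx))
            return -1;                 /* could not even transmit */
        if (wait_for_ack(timeout_ms, ctx))
            return attempt;            /* receiver confirmed delivery */
        /* no ACK within the interval: assume loss and loop to re-transmit */
    }
    return -1;
}

/* Simulated channel for demonstration only: loses the first N packets. */
struct lossy_channel {
    int drops_remaining;   /* how many upcoming packets will be "lost" */
    int packets_sent;      /* total transmissions so far */
    bool last_delivered;   /* did the most recent packet get through? */
};

bool fake_send(const void *data, size_t len, void *ctx)
{
    struct lossy_channel *ch = ctx;
    (void)data;
    (void)len;
    ch->packets_sent++;
    if (ch->drops_remaining > 0) {
        ch->drops_remaining--;
        ch->last_delivered = false;
    } else {
        ch->last_delivered = true;
    }
    return true;
}

bool fake_wait_for_ack(int timeout_ms, void *ctx)
{
    struct lossy_channel *ch = ctx;
    (void)timeout_ms;                  /* the simulation answers immediately */
    return ch->last_delivered;         /* an ACK only comes back if delivered */
}
```

With a channel that drops the first two packets, send_reliably() should succeed on the third attempt. If max_attempts is smaller than the number of drops, it gives up and returns -1.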

Unreliable protocols are generally used for speed and speed alone, i.e. fire away and forget about the consequences.

Network Theory

[Figure: data encapsulation. Nested layers from outermost to innermost: Ethernet, UDP, TFTP, Data]

Data encapsulation is the process of wrapping data inside other data. In the OSI network model, when a ‘packet’ is born, the application data is wrapped with a header by the first protocol (say, TFTP or HTTP) and the resulting packet is passed down to the next layer. The whole thing (including the previous header) is then encapsulated by the next protocol (say, UDP), and so on until it reaches the final hardware protocol layer (e.g. Ethernet or Wi-Fi). Note that in the OSI model only the Data-Link layer adds a footer (often called a trailer).

When a packet is received by another host, the opposite process happens. The headers are stripped one layer at a time, starting from the physical layer, until you finally get the data. This process is called de-encapsulation.

The ISO/OSI model is used as a reference model to describe network functionality. The 7-layer OSI model is:
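
To make encapsulation concrete, here is an illustrative sketch that wraps some data in a TFTP DATA header and then wraps the result in a UDP header. The function names are hypothetical. The header layouts follow RFC 1350 (2-byte opcode 3, 2-byte block number) and RFC 768 (source port, destination port, length, checksum, 2 bytes each). In practice the kernel builds the UDP header for you; writing it by hand here is only for illustration, and the checksum is left as 0 rather than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 16-bit value most-significant byte first. */
static void put_u16(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value >> 8);
    dst[1] = (uint8_t)(value & 0xff);
}

/* Application layer: prepends a 4-byte TFTP DATA header to the data.
 * Returns the packet length, or 0 if out is too small. */
size_t wrap_tftp_data(uint16_t block, const uint8_t *data, size_t len,
                      uint8_t *out, size_t out_size)
{
    assert(data != NULL && out != NULL);
    if (out_size < 4 || len > out_size - 4)
        return 0;
    put_u16(out, 3);            /* opcode 3 = DATA */
    put_u16(out + 2, block);    /* block number */
    memcpy(out + 4, data, len);
    return len + 4;
}

/* Transport layer: prepends an 8-byte UDP header to the whole TFTP packet,
 * TFTP header included. Returns the datagram length, or 0 on error. */
size_t wrap_udp(uint16_t src_port, uint16_t dst_port,
                const uint8_t *payload, size_t len,
                uint8_t *out, size_t out_size)
{
    assert(payload != NULL && out != NULL);
    if (len > 65535 - 8 || out_size < len + 8)
        return 0;
    put_u16(out, src_port);
    put_u16(out + 2, dst_port);
    put_u16(out + 4, (uint16_t)(len + 8));  /* length covers header + payload */
    put_u16(out + 6, 0);                    /* checksum not computed in this sketch */
    memcpy(out + 8, payload, len);
    return len + 8;
}
```

Wrapping the 5 bytes of "hello" should give a 9-byte TFTP packet and then a 17-byte UDP datagram. De-encapsulation is the reverse: skip 8 bytes to find the TFTP packet, then skip 4 more to find the data.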

Layer 7: Application
Layer 6: Presentation
Layer 5: Session
Layer 4: Transport
Layer 3: Network
Layer 2: Data-Link
Layer 1: Physical

The physical layer is the hardware (signals, Ethernet, electricity) and the application layer is where the user interacts with the network.

A network model that is more consistent with what is actually used, and with Unix, is the following. It looks a lot like the original 4-layer TCP/IP model.

Layer 4: Application Layer
Layer 3: Transport Layer
Layer 2: Internet Layer / Network Layer
Layer 1: Network Access Layer

These layers correspond to the encapsulation of data.

When sending data over a stream socket in Linux, all we have to do is ‘send()’ it out (type ‘man send’ in your terminal for details). For a datagram socket, all we have to do is encapsulate the packet in the method of our choosing and ‘sendto()’ it out. The kernel is responsible for building the Transport layer, the Internet layer, and so on.

IP Addresses, IPv4 and IPv6

An IPv4 address is four bytes and is written in “dots and numbers” form, like 192.0.2.111. It is still the most common IP version. IPv4 is running out of addresses even though a 32-bit address allows 2^32 (~4 billion) of them. This is partly because there were only a few computers when IPv4 was designed, so billions of addresses seemed impossibly large to exhaust, and organisations were allocated millions of IP addresses for their own use.

Several stopgap measures (like the ‘evil’ NAT) have been implemented; otherwise we would have run out a long time ago.

IPv6 was proposed as a way to overcome the shortage of addresses and allows every device to have its own unique IP address. An IPv6 address is 128 bits long, which gives substantially more addresses: 2^128 (about 340 trillion trillion trillion). IPv6 addresses are written in hexadecimal, with each two-byte chunk separated by a colon.
Example: 2001:0db8:c9d2:aee5:73e3:934a:a5ae:9551

Remember: an IPv6 address is 16 bytes. A hexadecimal digit is 4 bits, so 2 hexadecimal digits are 8 bits (a byte), and each chunk separated by a colon is two bytes.

IPv6 addresses can be compressed using the ‘::’ symbol, which means “fill in the gap with zeros”. Leading zeros in each chunk can also be dropped, e.g.

2001:db8:ab00:: == 2001:0db8:ab00:0000:0000:0000:0000:0000
2001:db8:c9d2:12::51 == 2001:0db8:c9d2:0012:0000:0000:0000:0051
::1 == 0000:0000:0000:0000:0000:0000:0000:0001

::1 is the loopback address (“the machine the program is running on now”) and is equivalent to the IPv4 loopback address 127.0.0.1. IPv6 also has an IPv4 compatibility mode, where an IPv4 address such as 192.0.2.33 is represented as an IPv6 address using the notation “::ffff:192.0.2.33”.

A final thing to note is that both IPv4 and IPv6 reserve some addresses. In IPv6 this amounts to trillions and trillions of addresses, but frankly there are still enough left that we should not run out.
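
One way to check these compressed forms is to let the C library parse them and then print all 16 bytes in full. This sketch uses inet_pton(), a standard function that these notes have not introduced yet. The helper name expand_ipv6() is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: parses an IPv6 address in any valid text form and
 * writes the fully expanded form (8 groups of 4 hex digits) into out.
 * out must hold at least 40 bytes. Returns 0 on success, -1 on error. */
int expand_ipv6(const char *text, char *out, size_t out_size)
{
    struct in6_addr addr;   /* 16 raw bytes, in network byte order */

    assert(text != NULL && out != NULL);
    if (out_size < 40)
        return -1;
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;          /* not a valid IPv6 address */

    for (int i = 0; i < 8; i++) {
        /* each group is two bytes, printed as four hex digits */
        snprintf(out + i * 5, out_size - (size_t)(i * 5), "%02x%02x%s",
                 addr.s6_addr[2 * i], addr.s6_addr[2 * i + 1],
                 i < 7 ? ":" : "");
    }
    return 0;
}
```

For example, expanding “::1” should give 0000:0000:0000:0000:0000:0000:0000:0001, and expanding “::ffff:192.0.2.33” should give 0000:0000:0000:0000:0000:ffff:c000:0221, with the IPv4 address occupying the last four bytes.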

Subnets

Subnetting is a way to separate the network portion and the host portion of an IP address. It is often written in slash notation, e.g. /24 or /32.

In the past, we used “Class” networks, i.e. Class A, Class B and Class C:
Class A- 1-byte (8-bit) network and 3-byte (24-bit) host, giving 16 million or so hosts.
…
Class C- 3-byte network and 1-byte host (256 hosts, minus reserved addresses).

The netmask describes the network portion of the IP address. When the netmask and the IP address are bitwise-ANDed, you get the network number. A netmask looks like this: 255.255.255.0.

NOTE: A netmask can have an arbitrary number of network bits, but it is always a series of 1s followed by a series of 0s. Example: 255.255.255.252 (30 network bits and 2 host bits). This notation is not compact, so instead we put a slash after the IP address followed by the number of network bits in decimal, e.g. 192.0.2.12/30.
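
The bitwise-AND step can be sketched in C as follows. The helper names prefix_to_netmask() and network_number() are hypothetical. The sketch relies on inet_pton(), inet_ntop(), ntohl() and htonl(), which are standard functions but have not been introduced at this point in the notes.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Turns a slash-notation prefix length (0..32) into a netmask:
 * prefix_len one-bits followed by zero-bits, as a host-order integer. */
uint32_t prefix_to_netmask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    if (prefix_len == 0)
        return 0;                       /* avoid an undefined 32-bit shift */
    return (uint32_t)(0xFFFFFFFFu << (32 - prefix_len));
}

/* Hypothetical helper: writes the network number of ip_text/prefix_len
 * into out in dotted form. out should hold INET_ADDRSTRLEN bytes.
 * Returns 0 on success, -1 on error. */
int network_number(const char *ip_text, unsigned prefix_len,
                   char *out, socklen_t out_size)
{
    struct in_addr addr;

    assert(ip_text != NULL && out != NULL);
    if (inet_pton(AF_INET, ip_text, &addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    uint32_t ip = ntohl(addr.s_addr);   /* work on the host-order value */
    uint32_t network = ip & prefix_to_netmask(prefix_len);  /* the bitwise AND */

    addr.s_addr = htonl(network);
    return inet_ntop(AF_INET, &addr, out, out_size) != NULL ? 0 : -1;
}
```

For example, prefix_to_netmask(30) is 0xFFFFFFFC (255.255.255.252), network_number() for 192.0.2.111 with a /24 prefix should give 192.0.2.0, and 192.0.2.14 with a /30 prefix should give 192.0.2.12.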

Port Numbers

The port number is another address (alongside the IP address) that the Transport layer protocols TCP and UDP use. It is 16 bits long and acts like the local address for a connection. Different services run on different ports. This way a computer with a single IP address can handle both incoming mail and web services.

There are many well-known port numbers. These can be seen in the ‘/etc/services’ file or in the BIG IANA Port List.

HTTP	port 80
telnet	port 23
SMTP	port 25

Port numbers under 1024 are considered special and usually require OS privileges to use.

Byte Order

Big Endian is where you store the most significant byte first (“the big end first”), e.g. the hexadecimal number b34f is stored in two sequential bytes: b3 followed by 4f.

Little Endian is where you store the least significant byte first (“the small end first”), e.g. the hexadecimal number b34f is stored sequentially as 4f followed by b3, the reverse order.

Remember: two hexadecimal digits are a byte, so 4 hexadecimal digits like b34f are 2 bytes.

The Big-Endian order is also called the Network Byte Order, as it’s the order network types like.

Computers store numbers in Host Byte Order, which may be Big or Little Endian. For an Intel 80x86, it is Little-Endian. For a Motorola 68k, it is Big-Endian. For PowerPC, … and so on.

When building packets or filling out data structures, you need to make sure the two- and four-byte numbers are in Network Byte Order. There are functions that convert from Host Byte Order to Network Byte Order where needed. Because we always run values through these functions, we don’t have to worry about whether the native Host Byte Order already matches. This way our code is portable to machines of different endianness.

The functions convert two kinds of number: ‘short’ (two bytes) and ‘long’ (four bytes). They work for the unsigned variations as well. To convert between Host Byte Order and Network Byte Order we use:
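
A small sketch for seeing which order your own machine uses: copy the bytes of b34f out of memory and look at which one comes first. The helper names bytes_in_memory() and host_is_big_endian() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the two bytes of value exactly as they sit in memory. */
void bytes_in_memory(uint16_t value, unsigned char out[2])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if this machine's Host Byte Order is Big Endian, 0 if Little. */
int host_is_big_endian(void)
{
    unsigned char bytes[2];
    bytes_in_memory(0xb34f, bytes);
    return bytes[0] == 0xb3;    /* "the big end first" */
}
```

On a Little-Endian machine such as an Intel 80x86, bytes_in_memory(0xb34f, ...) yields 4f then b3, and host_is_big_endian() returns 0.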

Key for table: h = host, n = network, s = short, l = long

function	- description
htons()	- ‘h’ost ‘to’ ‘n’etwork ‘s’hort
htonl()	- ‘h’ost ‘to’ ‘n’etwork ‘l’ong
ntohs()	- ‘n’etwork ‘to’ ‘h’ost ‘s’hort
ntohl()	- ‘n’etwork ‘to’ ‘h’ost ‘l’ong

Using these functions allows us to convert numbers to Network Byte Order before they go out on the wire and back to Host Byte Order as they come in off the wire. For floating point numbers, you have to do something called serialisation (discussed later in the tutorial).

Structs
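
A minimal sketch of that round trip, using a port number as the two-byte value. The helper names pack_port() and unpack_port() are hypothetical, and the 2-byte buffer stands in for “the wire”.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Converts a port from Host Byte Order to Network Byte Order and
 * writes it into a 2-byte wire buffer. */
void pack_port(uint16_t host_port, unsigned char wire[2])
{
    assert(wire != NULL);
    uint16_t net_port = htons(host_port);       /* host to network short */
    memcpy(wire, &net_port, sizeof net_port);
}

/* Reads a 2-byte port off the wire and converts it back to
 * Host Byte Order. */
uint16_t unpack_port(const unsigned char wire[2])
{
    uint16_t net_port;
    assert(wire != NULL);
    memcpy(&net_port, wire, sizeof net_port);
    return ntohs(net_port);                     /* network to host short */
}
```

Whatever the host’s own byte order, pack_port(0xb34f, wire) leaves b3 followed by 4f in the buffer, and unpack_port(wire) returns 0xb34f again. The four-byte functions behave the same way: ntohl(htonl(x)) gives back x.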

[to be continued]