Practical Sockets in Python

Presenter Notes

Contents

  • Basic Sockets
    • UDP
    • TCP
  • Name Resolution
  • Non-Blocking Sockets
  • Handling Stream Data
  • Libraries and tools

Presenter Notes

What I'll be talking about

What do I know about this?

  • Not much
  • Huge and complicated topic
  • Apologies for any mistakes!
  • Been writing networking code for ~5 years

Presenter Notes

Prequel

  • Python 3.2 and 2.7
    • Sockets work with bytes
    • Assume from socket import * for examples
  • IP version agnostic (IPv4 & IPv6)
  • No protocol design
net_prog_covers.png

Presenter Notes

Basic Sockets

Presenter Notes

Internet Protocol

  • Packet switched network
    • Packets travel between different endpoints
  • Not reliable
    • Fragmentation
    • Maximum Transmission Unit (MTU)
  • IPv4 is running out of addresses
    • IPv6 is a different network with new addresses
  • Carrier for other Protocols: TCP & UDP

Presenter Notes

UDP and TCP

  • Layered on top of IP
  • UDP: packets
    • datagrams, SOCK_DGRAM
    • unordered
    • unreliable
    • unconnected
  • TCP: stream
    • stream of bytes, SOCK_STREAM
    • ordered
    • reliable
    • connected endpoints
  • .recvfrom(), .recv(), .sendto(), .send() & Co behave differently depending on socket type!

Presenter Notes

  • SOCK_DGRAM methods return whole packets
  • SOCK_STREAM methods return any number of bytes from the stream

UDP Basics

Server:

>>> sock = socket(AF_INET6, SOCK_DGRAM)
>>> sock.bind(('', 1055))
>>> data, clientaddr = sock.recvfrom(4096)
>>> data
'request'
>>> clientaddr
('::ffff:127.0.0.1', 39241, 0, 0)
>>> sock.sendto(b'response', clientaddr)
8
>>> sock.close()
>>> del sock

Client:

>>> sock = socket(AF_INET, SOCK_DGRAM)
>>> sock.sendto(b'request', ('127.0.0.1', 1055))
7
>>> sock.recvfrom(4096)
('response', ('127.0.0.1', 1055))
>>> sock.close()
>>> del sock

Presenter Notes

UDP Basics

  • Binding to '' is the wildcard.

  • Use .sendto(), .recvfrom()

    • 4096 is the "pagesize"
    • entire packet is returned
    • tail of packet is lost if buffersize too small
  • udp6 can receive IPv4 connections: ::ffff:n.n.n.n v4 mapped address.

  • Netstat shows the listening socket after binding:

    $ netstat -l -n -A inet,inet6
    Active Internet connections (only servers)
    Proto  Recv-Q  Send-Q  Local Address  Foreign Address  State
    udp6        0       0  :::1055        :::*
    
  • Sockets not cleaned up when closed

Presenter Notes

  • A server socket must bind before it can accept connections.
  • Python translates '' to the appropriate wildcard notation for your OS.
  • Note that UDP is connectionless: a packet could be received from anywhere.
  • Netstat command is GNU/Linux, options vary
  • Remember: whole packets
  • 4096 is the page size
  • Binding to a specific address restricts the connections, no more V4MAPPED addresses.
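
The server and client sessions above can be combined into one runnable sketch. It assumes IPv4 loopback only and an OS-chosen port (port 0) instead of 1055 so that it does not collide with anything already running; the function name udp_round_trip is made up for this example.

```python
import socket


def udp_round_trip(payload=b'request'):
    """Send one datagram to a local UDP server and return its reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
        server.settimeout(5)
        client.settimeout(5)

        # No connection setup: the first packet is simply sent.
        client.sendto(payload, server.getsockname())

        # recvfrom() returns one whole datagram plus the sender's address.
        data, clientaddr = server.recvfrom(4096)
        server.sendto(data.upper(), clientaddr)

        reply, serveraddr = client.recvfrom(4096)
        return reply
    finally:
        client.close()
        server.close()


print(udp_round_trip())
```

The timeouts are only there so the sketch cannot hang if a packet is lost; UDP itself gives no delivery guarantee.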

UDP Connected Sockets

  • No protocol-level state
  • Essentially a kernel source/destination address filter

Client:

>>> sock = socket(AF_INET, SOCK_DGRAM)
>>> sock.connect(('127.0.0.1', 1055))
>>> sock.send(b'message')
7
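
A rough sketch of the "address filter" behaviour, assuming Linux or BSD-style semantics on the loopback interface. The socket names peer and stranger are invented for the example: after .connect(), datagrams from any other source address should be dropped by the kernel before they reach .recv().

```python
import socket


def connected_udp_demo():
    """Return the first datagram seen by a connected UDP socket."""
    peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    stranger = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for s in (peer, stranger, sock):
            s.bind(('127.0.0.1', 0))       # OS-chosen ports
        sock.settimeout(5)

        # No packet is sent here; the kernel only records the remote address.
        sock.connect(peer.getsockname())

        # Sent first, but from the wrong source: filtered out.
        stranger.sendto(b'from-stranger', sock.getsockname())
        # Sent from the connected address: delivered.
        peer.sendto(b'from-peer', sock.getsockname())

        return sock.recv(4096)
    finally:
        for s in (peer, stranger, sock):
            s.close()


print(connected_udp_demo())
```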

Presenter Notes

TCP Basics

Server:

>>> sock = socket(AF_INET6, SOCK_STREAM)
>>> sock.bind(('', 1055))
>>> sock.listen(SOMAXCONN)
>>> client, addr = sock.accept()
>>> addr
('::ffff:127.0.0.1', 54584, 0, 0)
>>> client.recv(4096)
'message'
>>> client.close()
>>> sock.close()

Client:

>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('127.0.0.1', 1055))
>>> sock.sendall(b'message')
>>> sock.close()
$ netstat -n -A inet,inet6
Active Internet connections (w/o servers)
Proto  Recv-Q  Send-Q  Local Address    Foreign Address  State
tcp         0       0  127.0.0.1:54584  127.0.0.1:1055   ESTABLISHED
tcp6        0       0  127.0.0.1:1055   127.0.0.1:54584  ESTABLISHED

Presenter Notes

  • SOCK_STREAM is the default
  • .listen() is when it shows up in netstat -l
  • Server must be listening before .connect()
  • Blocking calls: .accept(), .recv() and .sendall()

TCP Basics

  • .listen(n): maximum number of clients waiting to be accepted
  • Server socket does not transfer data
  • .accept() returns client socket
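
A minimal end-to-end sketch of the listening socket / client socket split, assuming IPv4 loopback, an OS-chosen port and a helper thread for the server side (the thread and the function names are additions for this example, not part of the slides):

```python
import socket
import threading


def serve_once(listener):
    """Accept one client and echo its data back until it signals EOF."""
    client, addr = listener.accept()       # new socket for this one client
    try:
        while True:
            chunk = client.recv(4096)      # any number of bytes, not a "packet"
            if not chunk:                  # b'' means the peer closed its end
                break
            client.sendall(chunk)
    finally:
        client.close()


def tcp_echo(message=b'message'):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(5)                     # must listen before clients connect
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    sock = socket.create_connection(listener.getsockname(), timeout=5)
    try:
        sock.sendall(message)
        sock.shutdown(socket.SHUT_WR)      # tell the server we are done sending
        received = b''
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        sock.close()
        server.join(5)
        listener.close()


print(tcp_echo())
```

The listening socket never carries payload data; all reads and writes go through the socket returned by .accept().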

Presenter Notes

TCP State Diagram

tcp_state.gif

Presenter Notes

Socket Options

  • sock.setsockopt(level, optname, value)
    • level: SOL_SOCKET, SOL_IP, SOL_UDP, SOL_TCP
    • optname: SO_*, IP_*, IPV6_*, PACKET_*, TCP_*, ...
    • value: number or struct
  • Most used: SO_REUSEADDR (TCP server)
sock = socket()
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
sock.bind(...)
...

Presenter Notes

  • UDP broadcast needs a socket option
  • Read the manpage, you'll know if you need them
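
As a small illustration, options can be set and read back with .setsockopt() and .getsockopt(). This sketch only uses SO_REUSEADDR and SO_BROADCAST with integer values and never binds or sends anything:

```python
import socket

# TCP server socket: allow quick restarts on the same port.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
reuse = tcp.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)

# UDP socket: sending to a broadcast address requires SO_BROADCAST.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
broadcast = udp.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST)

# The kernel reports "enabled" as a non-zero integer.
print(bool(reuse), bool(broadcast))

tcp.close()
udp.close()
```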

Name Resolution

Presenter Notes

Hostname Resolution

  • Query the DNS system
    • Pretty large library behind your back
    • Usually uses the network
  • Don't use these:
    • socket.getfqdn()
    • socket.gethostbyname() and socket.gethostbyname_ex()
    • socket.gethostname()
    • socket.gethostbyaddr()
  • Instead use IPv6 compatible functions:
    • socket.getaddrinfo()
    • socket.getnameinfo()

Presenter Notes

Getaddrinfo

getaddrinfo(host, port, family=AF_UNSPEC, socktype=0, proto=0, flags=0)

--> [(family, socktype, proto, canonname, sockaddr), ...]
host
Hostname, IP address or None
port
Service name, port number or None
family
Address family: AF_INET, AF_INET6, AF_UNSPEC, ...
socktype
Socket type: SOCK_STREAM, SOCK_DGRAM, ...
proto
Protocol: IPPROTO_TCP, IPPROTO_UDP, ...
flags

AI_CANONNAME: return canonical name

AI_NUMERICHOST, AI_NUMERICSERV: no lookups

AI_PASSIVE: suitable for listening

AI_ADDRCONFIG: AF has configured interface

AI_V4MAPPED, AI_ALL: return v4 addrs when asking AF_INET6

Presenter Notes

  • Provide as many arguments as possible
    • Port may change depending on protocol
    • Usually leave out proto
  • Implementations are buggy, very hard to write
  • Set port to None if you don't care, otherwise funny effects
  • Don't trust results too much, just try the socket it gives you
  • Do not rely too much on service names; leave it to the user to decide whether to use a name rather than a port number.
  • hostname=None & AI_PASSIVE: wildcard; hostname=None: localhost
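
The last note can be checked without touching the network. This sketch restricts the family to AF_INET purely to keep the output predictable (an assumption of the example, not a recommendation), and uses AI_NUMERICHOST so no DNS lookup happens; first_sockaddr is a made-up helper name.

```python
import socket


def first_sockaddr(host, port, flags=0):
    """Return the sockaddr of the first getaddrinfo() result."""
    results = socket.getaddrinfo(host, port, socket.AF_INET,
                                 socket.SOCK_STREAM, 0, flags)
    family, socktype, proto, canonname, sockaddr = results[0]
    return sockaddr


# Numeric host: parsed locally, no resolver involved.
numeric = first_sockaddr('127.0.0.1', 1055, socket.AI_NUMERICHOST)
# host=None with AI_PASSIVE: wildcard address, suitable for bind().
wildcard = first_sockaddr(None, 1055, socket.AI_PASSIVE)
# host=None without AI_PASSIVE: loopback address, suitable for connect().
loopback = first_sockaddr(None, 1055)

print(numeric, wildcard, loopback)
```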

Getaddrinfo • example

Be as explicit as possible:

getaddrinfo('example.com', 80, AF_UNSPEC,
            SOCK_STREAM, IPPROTO_TCP, AI_ADDRCONFIG)

In practice just see what works (client):

ai_list = getaddrinfo('example.com', 80, 0, SOCK_STREAM)
if not ai_list:
    # Check first: the for/else below would otherwise raise None
    raise error('getaddrinfo returns empty list')
err = None
for af, socktype, proto, cn, sockaddr in ai_list:
    sock = None
    try:
        sock = socket(af, socktype, proto)
        sock.connect(sockaddr)
    except Exception as e:
        # Remember the failure and try the next address
        err = e
        if sock:
            sock.close()
    else:
        break
else:
    # Every address failed: re-raise the last error
    raise err

Presenter Notes

  • For servers, try to avoid using getaddrinfo
    • Usually hard to configure for admins
    • If you do, use AI_PASSIVE|AI_ADDRCONFIG

Getaddrinfo • The Tools

  • create_connection((host, port), [timeout[, source_address]])
  • IPv4 and IPv6
  • SOCK_STREAM only
try:
    sock = create_connection(('example.com', 80))
except error as e:
    logging.warning('Failed to create connection: %s', e.strerror)

strerror:

Connection refused
Name or service not known
...
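
A self-contained variant that can be run without network access, assuming a throwaway listener on the loopback interface stands in for example.com. The try_connect wrapper is hypothetical; only create_connection() itself comes from the socket module.

```python
import socket


def try_connect(host, port, timeout=2.0):
    """Return a connected socket, or None if the connection failed."""
    try:
        return socket.create_connection((host, port), timeout)
    except socket.error as e:
        print('Failed to create connection: %s' % e)
        return None


# Throwaway local listener so there is something to connect to.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
host, port = listener.getsockname()

sock = try_connect(host, port)
connected = sock is not None
if sock is not None:
    sock.close()
listener.close()

# With the listener gone this will normally fail with "Connection refused".
refused = try_connect(host, port)
if refused is not None:
    refused.close()
```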

Presenter Notes

Non-Blocking Sockets

Presenter Notes

Non-Blocking Socket

  • Call .setblocking(False)

  • Operations might raise EWOULDBLOCK:

    sock = socket()
    sock.setblocking(False)
    sock.connect(...)
    try:
        sock.recv(4096)
    except IOError as e:
        if e.errno == errno.EWOULDBLOCK:
            pass
    
  • Managing this by hand is messy
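
A runnable version of the same idea, assuming a platform with socket.socketpair() so that no connect step is needed. The helper name recv_nowait is invented here; it turns the "would block" error into a None return value.

```python
import errno
import socket


def recv_nowait(sock, n=4096):
    """Return available bytes, or None if recv() would have blocked."""
    try:
        return sock.recv(n)
    except socket.error as e:
        # EAGAIN and EWOULDBLOCK are the same value on most platforms.
        if e.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return None
        raise


a, b = socket.socketpair()
a.setblocking(False)

empty = recv_nowait(a)        # nothing has been sent yet
b.sendall(b'message')
ready = recv_nowait(a)        # data is now waiting in the buffer

print(empty, ready)
a.close()
b.close()
```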

Presenter Notes

  • .send() would have worked due to buffers

Introducing Select

  • Basic I/O multiplexing API
    • Not very fast
    • Portable
select(rlist, wlist, xlist[, timeout])

--> (ready_rlist, ready_wlist, ready_xlist)
rlist
Sockets you might want to read from
wlist
Sockets you might want to write to
xlist
Sockets you might want to know about "exceptional conditions" (hint: you don't care about these)
timeout
How long to wait

Presenter Notes

  • If select is not good enough you probably shouldn't be doing this by hand.
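
A tiny sketch of what select() reports, again assuming socket.socketpair() is available. A timeout of 0 makes the first call a pure poll.

```python
import select
import socket

a, b = socket.socketpair()

# Nothing to read yet, but the send buffer is empty so writing is possible.
r0, w0, _ = select.select([a], [a], [], 0)

b.sendall(b'ping')

# Now the socket shows up in the readable list.
r1, _, _ = select.select([a], [], [], 5)
data = a.recv(4096) if a in r1 else None

print(r0 == [], w0 == [a], data)
a.close()
b.close()
```

Only put a socket in wlist while there is data waiting to be written; an idle socket is almost always writable and would make select() return immediately.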

Introducing Select

Gracefully interrupting I/O operations:

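import os
import select

# Create the wake-up pipe once; writing to wpipe interrupts select()
rpipe, wpipe = os.pipe()
while keep_running:
    rlist = get_rlist()
    rlist.append(rpipe)
    wlist = get_wlist()
    readable, writable, _ = select.select(rlist, wlist, [], 60)
    if rpipe in readable:
        # Someone asked us to stop: drain the pipe and leave the loop
        try:
            os.read(rpipe, 4096)
        except Exception:
            pass
        break
    if readable:
        pass  # handle reads
    if writable:
        pass  # handle writes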
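
To see the wake-up pipe in action, here is a stripped-down sketch with a helper thread blocked in select() and the main thread writing one byte to wake it. It assumes a Unix-like platform, since select() on pipes does not work on Windows; the function and variable names are made up.

```python
import os
import select
import threading


def wait_for_wakeup(rpipe, results):
    """Block in select() until the pipe becomes readable or 60s pass."""
    readable, _, _ = select.select([rpipe], [], [], 60)
    if rpipe in readable:
        os.read(rpipe, 4096)          # drain the wake-up byte(s)
        results.append('woken')
    else:
        results.append('timeout')


rpipe, wpipe = os.pipe()
results = []
worker = threading.Thread(target=wait_for_wakeup, args=(rpipe, results))
worker.start()

os.write(wpipe, b'x')                 # interrupt the select() call
worker.join(5)

print(results)
os.close(rpipe)
os.close(wpipe)
```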

Presenter Notes

Handling Signals

  • System calls raise EINTR
    • Not a bad thing, hands control back
  • Must handle these
def recv(n):
    while True:
        try:
            return sock.recv(n)
        except socket.error as e:
            if e.errno == errno.EINTR:
                continue
            else:
                raise
  • This loop is too tight for blocking code
  • As a library, make sure this is possible:
    • Provide a way to request restart
    • Ensure your API is restartable
  • signal.siginterrupt(signum, flag)

Presenter Notes

  • Deciding when and how to restart your read loop is the tricky part.
    • The signal handler must have a chance to run

Handling (Stream) Data

Presenter Notes

Handling Stream Data

  • No packet boundaries in streams

  • Example of protocol header (AgentX, RFC 2741):

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   h.version   |    h.type     |    h.flags    |  <reserved>   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          h.sessionID                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        h.transactionID                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          h.packetID                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        h.payload_length                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
  • "Protocol" layer which wants to send and receive "frames"
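
The 20-byte header in the diagram maps directly onto a struct format string. This sketch assumes network byte order throughout, as the later slides do (the real AgentX protocol selects byte order with a flag bit, which is ignored here); the function names are illustrative only.

```python
import struct

# 3 one-byte fields, 1 reserved pad byte, then 4 unsigned 32-bit integers.
HEADER = struct.Struct('!BBBxLLLL')


def pack_header(version, type_, flags, session_id,
                transaction_id, packet_id, payload_length):
    """Build the 20-byte header shown in the diagram."""
    return HEADER.pack(version, type_, flags, session_id,
                       transaction_id, packet_id, payload_length)


def unpack_header(data):
    """Return the seven header fields as a tuple."""
    return HEADER.unpack(data[:HEADER.size])


hdr = pack_header(1, 5, 0, 42, 7, 1, 12)
print(HEADER.size, unpack_header(hdr))
```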

Presenter Notes

Buffering of Stream Data

  • Minimal copying of data

  • Several approaches:

    • io.BytesIO

      C implementation, hard to use in a thread-safe way

    • list of strings

      More fragmented memory

    • bytearray (for fixed size data only)

    • ...

Presenter Notes

Handling Stream Data

Receiving data, using BytesIO:

class Protocol(object):

    def __init__(self):
        self._rbuf = io.BytesIO()

    def data_received(self, data):
        self._rbuf.seek(0, io.SEEK_END)
        self._rbuf.write(data)
        if self._rbuf.tell() >= 20:
            self._rbuf.seek(0, io.SEEK_SET)
            hdr = self._rbuf.read(20)
            payload_length, = struct.unpack('!L', hdr[16:20])
            payload = self._rbuf.read(payload_length)
            if len(payload) == payload_length:
                tail = self._rbuf.read()
                self._rbuf.seek(0, io.SEEK_SET)
                self._rbuf.truncate()
                self._rbuf.write(tail)
                self.process_frame(hdr + payload)
  • .data_received() based on PEP 3153 (async-pep, draft)
  • Struct module great for parsing binary data
    • ! means network byte order
    • Avoid socket.ntoh[sl], socket.hton[sl]

Presenter Notes

Handling Stream Data

Sending data:

import errno
import os
import socket

class Transport(object):
    def __init__(self):
        self._wbuf = []
        self._rpipe, self._wpipe = os.pipe()

    def write(self, data):
        self._wbuf.append(data)
        try:
            # Wake up the select loop so it calls ._send()
            os.write(self._wpipe, b'a')
        except Exception:
            pass

    def _send(self, sock):
        buf = self._wbuf
        self._wbuf = []
        data = b''.join(buf)
        try:
            n = sock.send(data)
        except socket.error as e:
            if e.errno not in [errno.EWOULDBLOCK, errno.EINTR]:
                raise
            # Nothing was sent: put everything back at the front
            self._wbuf.insert(0, data)
        else:
            if len(data) > n:
                # Partial send: keep the unsent tail for the next call
                self._wbuf.insert(0, data[n:])

Presenter Notes

  • You usually have whole frames to send, memory fragmentation less of an issue.
  • ._send() needs to be called from a mainloop somehow.
  • Normal socket errors need to be caught higher up the stack.

Using sock.makefile()

  • Useful for request-response protocols
    • Client sends request
    • Client shuts down its write end
    • Client waits for response, until EOF
    • Server reads request until EOF
    • Server writes response
    • Server closes the socket
  • Does all the buffering for you
    • Probably more efficient and bug free than writing this yourself.
  • sock.shutdown(flag)
    • SHUT_RD: no longer read
    • SHUT_WR: no longer write
    • SHUT_RDWR: might as well .close()
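
The request-response sequence listed above, as one runnable sketch. It assumes IPv4 loopback, an OS-chosen port and a helper thread for the server; the function names are invented for the example.

```python
import socket
import threading


def serve_one_request(listener):
    """Read a request until EOF, write a response, close the socket."""
    client, addr = listener.accept()
    fp = client.makefile('rwb')
    try:
        request = fp.read()               # returns once the client did SHUT_WR
        fp.write(b'response to ' + request)
        fp.flush()                        # push the buffered response out
    finally:
        fp.close()
        client.close()                    # the client sees EOF after this


def request_response(request=b'request'):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    server = threading.Thread(target=serve_one_request, args=(listener,))
    server.start()

    sock = socket.create_connection(listener.getsockname(), timeout=5)
    try:
        sock.sendall(request)
        sock.shutdown(socket.SHUT_WR)     # we will not write any more
        fp = sock.makefile('rb')
        try:
            return fp.read()              # read the response until EOF
        finally:
            fp.close()
    finally:
        sock.close()
        server.join(5)
        listener.close()


print(request_response())
```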

Presenter Notes

Using sock.makefile()

Client:

>>> sock = socket()
>>> sock.connect(('127.0.0.1', 1055))
>>> sock.send(b'request')
7
>>> sock.shutdown(SHUT_WR)
>>> sock.recv(4096)  # blocks
'response'

Server:

>>> sock = socket(AF_INET6)
>>> sock.bind(('', 1055))
>>> sock.listen(SOMAXCONN)
>>> client, addr = sock.accept()  # blocks
>>> fp = client.makefile('rwb')   # binary mode, readable and writable
>>> fp.read(4096)                 # blocks until EOF
'request'
>>> fp.write(b'response')
8
>>> fp.close()                    # flushes the buffered response
>>> client.close()
>>> del client

Presenter Notes

Advanced sock.makefile()

Can be used without shutting down the socket ends:

  • .read(n) reads n bytes or until EOF.
    • Fixed-length headers
    • Header specified payload length
  • Little chance of recovering
    • Best close entire socket

Can be used in non-blocking mode:

  • Call sock.setblocking(False) as normal
  • Calls to .read() and .flush() can raise EWOULDBLOCK
    • Can't know how much data in the internal buffer
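
A sketch of the "header specifies payload length" case without shutting down either end first. It assumes a toy protocol with a 4-byte length prefix (not AgentX) and uses socket.socketpair() in blocking mode; read_message is a hypothetical helper.

```python
import socket
import struct


def read_message(fp):
    """Read one length-prefixed message; return None on a clean EOF."""
    header = fp.read(4)                   # exactly 4 bytes, unless EOF
    if not header:
        return None
    if len(header) < 4:
        raise EOFError('connection closed inside a header')
    length, = struct.unpack('!L', header)
    payload = fp.read(length)
    if len(payload) < length:
        # Little chance of recovering: the caller should close the socket.
        raise EOFError('connection closed inside a payload')
    return payload


a, b = socket.socketpair()
fp = a.makefile('rb')
b.sendall(struct.pack('!L', 5) + b'hello' + struct.pack('!L', 2) + b'ok')

first = read_message(fp)
second = read_message(fp)
b.close()                                 # peer goes away
third = read_message(fp)                  # clean EOF between messages

print(first, second, third)
fp.close()
a.close()
```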

Presenter Notes

Libraries

Presenter Notes

DRY

  • Multiplexing I/O is non-trivial
  • Someone already wrote an I/O loop
  • They come under many names:
    • Asynchronous I/O
    • Event driven I/O
    • Non-blocking I/O

(Yes, these all have a particular meaning)

Presenter Notes

Greenlet-based

  • Greenlet:
    • Lightweight stack switching
    • User-mode threads
    • Explicit switches
    • 1 kernel/OS thread
  • Run the I/O loop in the hub
    • This is hidden from you
    • Smart select loop with timeouts (or poll, epoll, kqueue, ...)

Presenter Notes

Greenlet-based

  • Eventlet:
    • Pure-python
    • Multiple I/O loops
    • Monkeypatching or selective imports
  • Gevent:
    • C extension
    • libev (libevent for 1.x)
    • Monkeypatches the world
  • Concurrence
    • Explicit dispatch loop
    • Convenient stream wrappers

Presenter Notes

Eventlet Example

  • Network methods yield control to the hub:

    from eventlet.green.socket import *
    sock = socket()
    sock.connect(('127.0.0.1', 1055))  # switch to hub
    sock.send(b'request')              # switch to hub
    
  • Spawning tasks:

    def server():
        sock = socket(AF_INET6)
        sock.bind(('', 1055))
        sock.listen(SOMAXCONN)
        while True:
            client, addr = sock.accept()                # switch to hub
            eventlet.spawn(handle_client, client, addr)
    
    def handle_client(sock, addr):
        print('connection from {}'.format(addr))
        req = sock.recv(4096)                           # switch to hub
        sock.send(b'response')                          # switch to hub
    
  • No need for a complicated write buffer

Presenter Notes

Twisted

  • Framework, not library
    • (no one likes frameworks)
  • Callback based
    • (no one likes callbacks either)
  • Mainloop is called "Reactor"
    • Many exist which work with GTK+, Qt, ...
  • Very complete
    • Many protocols already implemented

Presenter Notes

Summary

  • Don't be afraid to use sockets
  • But catch exceptions
  • Use a library for I/O loop where applicable

Presenter Notes

Questions?

Presenter Notes