Hacked 10 Bits

Saturday, September 14, 2013

Modifying software for fun and music, part 2

Author's Note: Whew! I've been on a long hiatus from posting. This post in particular is a long-overdue follow-up to an experiment where I documented my foray into deciphering and modifying a particular piece of open source software as I went along. Unlike that post, the following is compiled from notes made during the process. Enjoy!

Last time we left off with having added the ability to specify multiple songs from the command line. However, the last song in the list was getting truncated several seconds early. Also, m4a (AAC) files were not playing at all. Finally, after further testing it came to light that mono MP3 files were not playing correctly. Read on for my notes on treating each of these issues, and finally the output of diff on the original code and my changes, if you're so inclined to use them!

Using ld to manually build an executable

Author's note: further work on raop_play has been sporadic due to other happenings, though I have some new results addressing song truncation, playing other file types, and playback of mono files. I will (hopefully) get this posted shortly!

About three days ago, I finally bit the bullet and decided to give Kubuntu a try. While I have been using Debian happily for the past few years, I am discovering an increasing desire for simplicity and to have things work out of the box, rather than spend large amounts of time configuring everything from scratch. One of my side-goals was to try out the Pulse audio system to see if I could successfully integrate my Linux sound with my Airport Express. As it turns out, this does not work as smoothly as I'd anticipated. While getting it up and running was mostly straightforward, there are some playback issues that appear to be unresolved within the community. Perhaps a project for another day, but with a freshly-installed OS I wanted to play some tunes NOW!

So instead I went back to my previous work on raop_play and simply recompiled with the new libraries on this installation... but I ran into a problem. For some reason, the OpenSSL library wasn't being found by the linker, giving me the following error:

  ...
  gcc -o raop_play raop_play.o raop_client.o rtsp_client.o aexcl_lib.o base64.o aes.o m4a_stream.o audio_stream.o wav_stream.o mp3_stream.o flac_stream.o ogg_stream.o aac_stream.o pls_stream.o pcm_stream.o -lssl -lsamplerate -lid3tag
  raop_client.o: In function `rsa_encrypt':
  raop_client.c:(.text+0x131): undefined reference to `RSA_new'
  raop_client.c:(.text+0x175): undefined reference to `BN_bin2bn'
  raop_client.c:(.text+0x1b3): undefined reference to `BN_bin2bn'
  ...

... and so on for another dozen lines. Actually, it was an even worse problem at first, until I discovered that gcc is sensitive to the ordering of files and library references, and fixed the Makefile to move all "-l" options to the end of the command, as shown above.

I racked my brain, wondering if I had an incompatible version of the OpenSSL library, or if the system's library path was broken, but every workaround I tried yielded the same result.

"It's all just bit-flipping and timing"

This was stated to me by a coworker at my first job. It was at a formative moment in my life: getting ready to start college, working at a then-startup doing data entry and computer repair, and learning to program in C++ in my spare time. My coworker said this while I was shoulder-surfing his efforts to program an embedded computer to display text on a small serial-driven LCD screen. He was effectively instructing the computer to set particular bits on the display (bit-flipping) at specific times (timing). It's all just bit-flipping and timing...

Four years later I was taking CSSE 380, Organization of Programming Languages, learning Scheme and writing continuation-passing interpreters with garbage collection and static type checking. Which is to say, bending my mind on a nightly basis. (I recall discussing with my roommates - also in the class - starting a band called Dr. Scheme and writing heavy metal songs that always ended in "cond, lambda, define!" But I digress...)

Looking back on it now, that course wasn't really about learning Scheme, nor was it about coding garbage collection or type checking. It was about arriving at the fundamental realization that under the hood of the language compilers and interpreters we use on a daily basis lives an amazing paradox. Each one is a transformation on some language, turning it into yet another language (e.g., assembler) or into a sequence of operations performed immediately within that very transformation, producing a final result. And what is amazing is that these transformations are often built using the very language they transform! One can craft a Scheme interpreter in Scheme, using Scheme lists to represent Scheme code. Likewise (with a bit of parsing), one can write a C compiler in C. It was recognizing the duality of a program represented as code (text) and textual strings that can be manipulated in a programmatic way that opened my eyes to a part of what really goes on inside a computer.

Suddenly, the magic of a programming language, hand-crafted by wizards in underground dwellings, was transmuted into a very real, if not easy, undertaking that even us wee burgeoning Computer Scientists could approach. All of the abstractions that we took for granted when hitting the Compile button boiled down into the "bit-flipping and timing" of compilation and interpretation. I would call this a transformative moment in my Computer Science education, where my understanding of the discipline underwent a fundamental shift. And over time, I have come to recognize other moments in a CS education that bear this same mark.

A quick word on Bufferbloat

I've recently been working on a conference paper and some coding in support of my dissertation research, hence the lack of updates. In the meantime, I found this very interesting and important to share...

Jim Gettys (one of the primary people behind the X Window System most users of Unix and Linux rely upon for their graphical environment) has been busy investigating "Bufferbloat." Apparently this issue is cropping up more and more on the Internet, and causing problems for services such as streaming video (think YouTube or Netflix) and telephony (think Skype or Vonage). Here is his very good introduction to the problem; read on for my two-paragraph synopsis and links, followed by a few thoughts of my own.

In short, network routers (and in many cases switches) use buffers to queue packets when more are arriving than can be sent out a particular network interface. The purpose of buffering is to avoid dropping packets when they can't be sent out fast enough. This almost always happens when one interface of a router operates at a much faster rate than another interface. One place this is common is at the border where your ISP's fast network connection meets your home's (relatively) slower network connection.

Buffers are generally good then, right? They keep data from being dropped. Well, when used in excess, buffers end up defeating one of the principal mechanisms built into TCP to avoid congestion on networks. TCP actually needs a few packets to be dropped (don't worry, it's good about resending them as soon as it realizes they didn't make it through) in order to determine how much traffic it can safely place on a network. Without this, TCP keeps pushing more and more data, assuming there is room to spare. This can lead to very high latency and jitter, two primary enemies of all streaming media (video and audio, for example).
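To make that feedback loop concrete, here is a minimal sketch of the additive-increase/multiplicative-decrease idea behind TCP congestion control. It is a deliberate simplification (real TCP stacks add slow start, timeouts, and more), and the function name aimd_step and the one-segment floor are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: one round trip of a simplified AIMD rule.
 * cwnd is the congestion window in segments; loss says whether a
 * packet drop was detected during this round trip. */
double aimd_step(double cwnd, bool loss)
{
    if (loss) {
        double halved = cwnd / 2.0;          /* multiplicative decrease */
        return halved < 1.0 ? 1.0 : halved;  /* never go below one segment */
    }
    return cwnd + 1.0;                       /* additive increase */
}
```

In this model the window only shrinks when a drop is reported. An oversized buffer delays that report, so the sender keeps growing the window while packets pile up in the queue.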

You can read a great discussion between Gettys, Vint Cerf, Van Jacobson, and Nick Weaver here. Gettys has also created a website and project around understanding and addressing Bufferbloat.

An addendum - personal perspective

To put the importance of latency and jitter in perspective, I once was engineering a network that had two paths out to the Internet, a cellular connection (lower bandwidth, low to moderate latency) and a satellite link (higher bandwidth, very high latency). One person was using the cellular path to download imagery data. The apparent performance was less than stellar, and so I was asked to re-route their traffic over the satellite link. After making this change, the apparent performance became much worse, yet the person did not understand why.

When I asked about the nature of the downloaded data, I was told each transfer was a small amount of imagery, but transfers were made often and needed to be completed quickly. It turned out to be map tiles like those used by Google Earth. While the satellite link offered greater bandwidth (number of bits it could transfer per unit time), it took a long time to get the first bits all the way across the link. For small transfers, this time dominated over the time to move the remainder of the data after the first bits arrived, so the apparent performance was worse than a lower-bandwidth, lower-latency connection.
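The trade-off can be sketched with a back-of-the-envelope model: total time is roughly the delay before the first bits arrive plus the time to push the remaining bits through the link. The function name and the link figures in the note below are made up for illustration; they are not measurements from the network described above.

```c
/* Hypothetical model: seconds needed to finish one transfer.
 * delay_s stands in for the time before the first bits arrive. */
double transfer_seconds(double size_bytes, double bandwidth_bps, double delay_s)
{
    return delay_s + (size_bytes * 8.0) / bandwidth_bps;
}
```

Assuming 1 Mbit/s with a 0.1 s delay for cellular and 5 Mbit/s with a 0.6 s delay for satellite, a 20,000-byte tile takes about 0.26 s over cellular but about 0.63 s over satellite. For a 100,000,000-byte file the ranking flips: about 800 s over cellular versus about 161 s over satellite.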

Most interactive applications such as voice over IP, video teleconferencing, and network gaming are more sensitive to latency and jitter than they are to occasional packet loss. The problem of bloated buffers is not only a result of past trends in device manufacture and configuration but also of changes in how people use the Internet. What was an acceptable, perhaps appropriate, solution when the main kinds of transactions were time-insensitive file and webpage transfers is no longer appropriate in the age of time-sensitive multimedia streaming.

Parts of the solution are out there, and parts of it are yet to be developed. At this stage, raising everyone's awareness (not just device manufacturers and ISPs but also end-users and application developers) is the best action we can take toward understanding and ultimately addressing the problem.

Wednesday, January 18, 2012

Modifying software for fun and music, part 1

Author's Note: This is the first post in an experiment wherein I document my foray into deciphering and modifying a particular piece of open source software as I do it. My interest lies in whether the resulting posts a) are digestible, and b) provide additional insight into the "how" of the process. As such, these will undergo only cursory editing before being posted. Expect typos!

Update 9/14/2013: The second part of this post is finally available here!

A few months ago, I purchased one of these newfangled Internet-enabled televisions so I could stream movies from Netflix without having to plug my laptop into the TV every time. Since I didn't spring for a model with built-in wireless, I subsequently bought a nifty device from some big-name manufacturer, which lets me plug in an Ethernet device and acts as a wireless client on its behalf. This device happens to also let me stream music from said manufacturer's music application to my stereo via an 1/8" audio plug on the device. Pretty nifty stuff.

My main home computer is a desktop running Linux, and I don't want to boot my laptop every time I want to play some music (the whole point of the TV upgrade, right?). So I want an easy way to stream music from Linux to said device. Well, if you're familiar with audio under Linux, there are something like six different subsystems you can run: OSS, ESD, ALSA, Pulse, et cetera. Someone made a nice module for the Pulse audio subsystem that lets these devices act like virtual sound cards, which is great if you're running Pulse. But after an entire afternoon spent breaking and fixing my sound in an effort to shift from ALSA to Pulse, I decided this wasn't the solution for me.

Fortunately, someone else had the same idea and created a utility called raop_play a while back. This is a command line client that takes the IP address of the device we want to stream to and the filename of the audio file (e.g., MP3) to play. After a quick download and compile (okay, a moderately quick compile after installing a few dependencies and subverting build errors), it worked right out of the box. But it lacked a couple of things I wanted:

1. The command line only takes a single filename, even though there is an interactive mode with support for playing additional files. I'd like to specify an entire album up front.
2. Although the documentation claimed support for M4A files (which I happen to have a lot of by virtue of using said manufacturer's music store), I only got errors trying to play them. Playback of MP3 files also seems a bit buggy (playback sometimes stops prematurely). I'm thinking of incorporating a different decoding engine.

For today's post, I will focus just on the first item: playing multiple files. Armed with nothing but a compiler and an innate desire to make this software do what I want, this post is my log of trying to get this to work.

Signals and sockets for querying a process

Hello again, dear reader!

Another part of the research I'm doing entails capturing, processing, and storing network packet attributes. This is done in a nifty application that invol... oh, but that is a post on its own! What I'd like to share today is an interesting little way of sharing data between the packet capture process and another running process.

So here's the skinny: my application uses libpcap to do packet capture. Pcap has a couple ways to process the packets it grabs off the wire, both of which are blocking. My code also has to answer queries from a single other process on the same machine. But, even if my while loop (if using pcap_next()) or callback (if using pcap_loop() or pcap_dispatch()) checks somehow for pending queries, the querying process has to wait until the pcap process gets another packet for that check to occur. The question arises: how can this application respond immediately to a query, regardless of whether packets are currently being captured?

Shared memory and multithreading are options, as is pushing data to a separate database. But we want simple (my entire application is under 300 lines of code, counting the solution I describe here), and the machines I want to run this code on may not be able to support a database server. Besides, what's the fun in doing this if there isn't an opportunity for a bit of hackery?

It turns out that a combination of sockets and signals does just the trick. We're going to give the pcap process a listening Unix socket and a function to handle signals, and let the OS do the rest of the work for us.

Before we jump into the code, let's make life simpler and take all this packet capture business out of the picture - that's complicated enough on its own, and may be the subject of another post in the future. Instead, let's say we have a table (2-D array) of students and the classes they must take. Each spot in the table is a struct with the quarter in which they took the class and the grade they received. That way we get a struct for the query (student and class) and another for the response (quarter and grade). And to keep things easy on ourselves, we'll make everything a number except for the grade, which will be a single character ('A', 'B', 'C', and so on).
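As a rough sketch of what that table might look like in C, consider the following. The field names follow the description above, but the table dimensions, the lookup() helper, and its bounds check are assumptions added for illustration and are not the post's actual definitions.

```c
#include <stddef.h>

#define NUM_STUDENTS 4   /* hypothetical table dimensions */
#define NUM_CLASSES  3

struct query  { int student; int class; };   /* what the client asks */
struct record { int quarter; char grade; };  /* what the server answers */

/* The table itself: one record per (student, class) pair. */
static struct record records[NUM_STUDENTS][NUM_CLASSES];

/* Return the matching record, or NULL if either index is out of range. */
const struct record *lookup(const struct query *q)
{
    if (q->student < 0 || q->student >= NUM_STUDENTS ||
        q->class < 0 || q->class >= NUM_CLASSES)
        return NULL;
    return &records[q->student][q->class];
}
```

Because both structs hold only plain numbers and a character, they can be copied across a local socket as raw bytes between two processes built from the same source on the same machine.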

Let's look at the code that processes a query (all error-checking has been removed for simplicity):

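  /* Signal handler: answers one pending query on the listening socket. */
  void handle_query(int sig) {
    char buffer[BUF_SIZE];

    /* Accept the waiting connection and read the query bytes. */
    int sd = accept(sock, NULL, NULL);
    int len = recv(sd, buffer, BUF_SIZE, 0);
    struct query *q = (struct query *)buffer;

    /* Look up the matching record in the global table. */
    struct record *r = &records[q->student][q->class];

    /* Send the record back as raw bytes, then hang up. */
    send(sd, (char *)r, sizeof(struct record), 0);
    close(sd);
  }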

Wow, that was easy! Looks a lot like the standard TCP server from a network programming 101 class, doesn't it? Accept a connection from a listening socket, receive a query, typecast it into a struct, do a lookup, send the result typecast as a byte array, and close the connection. If you haven't seen something similar before, check this out or do a quick Google search for "Linux TCP server in C". I'll provide the definitions of struct query and struct record at the end; for now, just know that sock and records are global variables.

So what's with this funky-looking function declaration? It's a void; that's okay, but what's this int sig that never gets used in the function body? Well, this function isn't actually called by any code in the program per se; it's a signal handler. "A signal what?" you ask...
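The excerpt stops before the explanation, so what follows is only a guess at the wiring: one common way to have the OS call a handler when a client connects is signal-driven I/O, where the listening socket is marked O_ASYNC and the kernel raises SIGIO. The helper name make_signal_driven_listener and its parameters are hypothetical, and this sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE          /* exposes O_ASYNC and F_SETOWN on glibc */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a listening Unix socket at `path` and ask
 * the kernel to raise SIGIO, handled by `handler`, when it has activity.
 * Returns the socket descriptor, or -1 on failure. */
int make_signal_driven_listener(const char *path, void (*handler)(int))
{
    struct sockaddr_un addr;
    struct sigaction sa;
    int fd, flags;

    /* Install the handler first so no SIGIO arrives unhandled. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* ask for interrupted calls to be restarted */
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    unlink(path);               /* remove a stale socket file, if any */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }

    /* Route SIGIO for this socket to our process and switch it on. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 ||
        fcntl(fd, F_SETOWN, getpid()) < 0 ||
        fcntl(fd, F_SETFL, flags | O_ASYNC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A program would store the returned descriptor in the global sock and pass handle_query as the handler, so the handler's accept() call finds the pending connection.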

Sending raw Ethernet frames in 6 easy steps

Part of my work entails building a protocol stack in C that lives alongside TCP/IP and which is constructed entirely in user space, save for a few standard system calls. Hence, I need the ability to craft raw Ethernet frames and send them using only the facilities the operating system provides.

Fortunately, I have two things going for me. First, the implementation is all being done under Linux (in particular, a 2.6.32 kernel, though I believe any 2.6.x kernel will do). Second, a few other authors have gone before me in deciphering the man pages and system calls and have put together some great example code. Most notable is Andreas Schaufler, whose writeup inspired my implementation as well as the code in this post.

So why write more on the subject? Because it takes time to understand all the moving parts and pieces, and most of us are operating under deadlines. What I hope to contribute is a one-stop, soup-to-nuts explanation with example code to save time for the next person. I'm going to assume you are somewhat comfortable with the C programming language and have seen or done some network programming before. However, my goal is to make this as painless as possible.

Starting point

First, let's try to make things easy for whoever is using this code. If they want to send a raw Ethernet frame, what things can we reasonably ask them to provide?

The destination MAC address would be nice, for sure. If they only know the IP, you can either have them look up the MAC with the arp command or code that lookup into your software. Let's assume you've done this if needed.
Since the frame needs to go out some particular Ethernet interface, we'll assume they know this already and can specify it by its "friendly" name, e.g., "eth0". The source MAC is needed as well, but we can look that up from the interface, assuming they're not trying to fake it. If they are, I will point out where to do this.
An Ethernet protocol number would also be good to know. This is the value that tells the receiving system which protocol is contained inside the frame. IP uses 0x0800, but perhaps you, like me, want to craft your own protocol that isn't processed by the TCP/IP stack.
Finally, the bytes you want to put into the message. This part is entirely up to you. Maybe you want to inject the contents of a captured packet, or some payload of your own creation. All we care about is that the bytes are stored somewhere (we'll say a C byte or "char" array) and you know the number of bytes to be sent. (We'll also assume you're staying within the number of bytes that fit in a frame, a.k.a. the Maximum Transmission Unit or MTU.)

Given these four pieces of information, let's walk through the code to send a frame. It will take six steps, listed below. Depending on your starting point, you may omit some of these steps.

1. Defining our input and output data
2. Building a raw "packet" socket
3. Looking up the interface index and MAC address
4. Filling in the packet contents
5. Filling in the link layer socket address structure
6. Sending the packet
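The excerpt ends before the walkthrough, so here is a small sketch of just the "filling in the packet contents" step, assuming the standard Ethernet layout of destination MAC, source MAC, two-byte protocol number, then payload. The function name build_frame, the constants, and the return convention are illustrative assumptions, not the post's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN      6     /* bytes in a MAC address */
#define HEADER_LEN   14    /* destination + source + protocol number */
#define MAX_PAYLOAD  1500  /* standard Ethernet MTU, assumed here */

/* Hypothetical helper: lay out one Ethernet frame in `frame`.
 * Returns the total frame length, or 0 if the payload does not fit. */
size_t build_frame(uint8_t *frame, size_t capacity,
                   const uint8_t dst[MAC_LEN], const uint8_t src[MAC_LEN],
                   uint16_t protocol,
                   const uint8_t *payload, size_t payload_len)
{
    if (payload_len > MAX_PAYLOAD || capacity < HEADER_LEN + payload_len)
        return 0;

    memcpy(frame, dst, MAC_LEN);               /* bytes 0-5: destination */
    memcpy(frame + MAC_LEN, src, MAC_LEN);     /* bytes 6-11: source */
    frame[12] = (uint8_t)(protocol >> 8);      /* protocol number goes on */
    frame[13] = (uint8_t)(protocol & 0xff);    /* the wire high byte first */
    memcpy(frame + HEADER_LEN, payload, payload_len);
    return HEADER_LEN + payload_len;
}
```

On Linux, the finished buffer would then be handed to sendto() on an AF_PACKET socket together with a struct sockaddr_ll that names the outgoing interface. For a home-grown protocol, 0x88B5 is one of the EtherType values IEEE sets aside for local experimental use.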

Saturday, September 14, 2013

Modifying software for fun and music, part 2

Wednesday, May 16, 2012

Using ld to manually build an executable

Wednesday, March 21, 2012

"It's all just bit-flipping and timing"

Wednesday, February 22, 2012

A quick word on Bufferbloat

Wednesday, January 18, 2012

Modifying software for fun and music, part 1

Wednesday, December 14, 2011

Signals and sockets for querying a process

Sunday, December 4, 2011

Sending raw Ethernet frames in 6 easy steps