Lab 2 - Bits and Bytes

Introduction

In this lab, we’ll focus on ordered collections of bits to help us understand:

  • How bits are combined to represent numbers and the kinds of bitwise logic we can apply to these numbers
  • How data is stored as a series of bits, grouped together as bytes

Bits and Bitwise Operations

First, work with the people at your table to start getting familiar with the system for submitting the bit-fiddling homework. For this task, please discuss together possible solutions to the isequal task listed below.

isequal
Given x and y, set z to 1 if x == y; otherwise set z to 0 without using == or multi-bit constants.

For full credit, use ≤ 5 operations from {!, ~, +, <<, >>, &, ^, |} and up to 1-bit constants.

Work out some example code with your table, then a few examples. The bit fiddling program uses 32-bit integers by default, but that’s a lot of bits all at once! So, start small. Work through an example or two that uses 4 bits, or 8 bits. For example, what does your program do with:

  1. x = 0b0000 and y = 0b0000
  2. x = 0b1000 and y = 0b1000
  3. x = 0b0001 and y = 0b0001
  4. x = 0b1111 and y = 0b0111
  5. Try some more. Its best to work out your logic before coding it up and submitting.

Once you’re ready, SSH to the portal nodes, write up a new file isequal.bits with your code, then submit using cso1bits isequal isequal.bits. How did you do?

Note: Everyone at the table should submit. You’ll need to do this for homework 1, but you won’t be able to work with as many people for the rest of the assignment.

Bytes and Digital Data

Now we’ll look at larger collections of bits! Whether punch cards, paper tape, magnetic tape, magnetic disk, optical disc, NAND-flash, or technologies yet unknown, large quantities of digital information have always been, and likely will always remain, giant streams of sequential bits. These are stored physically in something, so they have limited flexibility: you can change a bit to 0 or 1, but you can’t remove a bit entirely without moving every other bit over to fill in the gap.

When interacting with such “binary files,” it is typical to use a tool known as a “hex editor.” This shows each byte (set of eight bits) as a two-digit hexadecimal value and allows users to edit that information in place. Because, really, who wants to edit the binary directly?

Getting a hex editor

For ease of viewing images, we’ll use an online hex editor today:

https://hexed.it/

Although it is an online tool, you need to open files from your local drive; your basic process is

  1. Download a file to edit
  2. Go to hexed.it
  3. Use the “Open file” option on the top to load the file
  4. Edit the file as desired
  5. Use “Export” to save the file

ASCII

Many files contain textual information, generally encoded using ASCII or a compatible superset of that such as UTF-8, ISO-8859-1, Windows-1252, Mac-Roman, etc. In these, each character is mapped to a single byte between 0 and 127; bytes larger than 127 are often parts of multi-byte character encodings and vary by encoding variant.

Hex Char
09 TAB (\t)
0A LF (\n)
0D CR (\r)
20 Space
21 !
22 "
23 #
24 $
25 %
26 &
27 '
28 (
29 )
2A *
2B +
2C ,
2D -
2E .
2F /
30 0
31 1
32 2
33 3
34 4
35 5
36 6
37 7
38 8
39 9
3A :
3B ;
3C <
3D =
3E >
3F ?
Hex Char
40 @
41 A
42 B
43 C
44 D
45 E
46 F
47 G
48 H
49 I
4A J
4B K
4C L
4D M
4E N
4F O
50 P
51 Q
52 R
53 S
54 T
55 U
56 V
57 W
58 X
59 Y
5A Z
5B [
5C \
5D ]
5E ^
5F _
Hex Char
60 \`
61 a
62 b
63 c
64 d
65 e
66 f
67 g
68 h
69 i
6A j
6B k
6C l
6D m
6E n
6F o
70 p
71 q
72 r
73 s
74 t
75 u
76 v
77 w
78 x
79 y
7A z
7B {
7C |
7D }
7E ~

Download ritchie.txt and open it in your hex editor. You should see both a hex edit space and a character edit space, along with some numerical interpretations.

To check off this lab, you’ll need to demonstrate to your TA that you can do the following in the hex editing space:

  1. Put the first line into all upper-case.
  2. Replace all line breaks with spaces.

Images

There are many image formats, and many of them use esoteric math to encode large regions of color succinctly. However, some are uncompressed and represent colors directly; these generally include a header of some kind, followed by three bytes per pixel: a red byte (0 for no red light, 255 for right red light), a green byte (also 0–255), and a blue byte (ditto).

One such uncompressed format is the BMP file format. The BMP file format has several variants, but the most common has

  • Width of image as a little-endian 32-bit number in bytes 18–21
  • Height of image as a little-endian 32-bit number in bytes 22–25
  • Image data starting at a byte whose index is given in a 4-byte little-endian number in bytes 10–13

Endianness

We’ll discuss endianness more fully later on in the semester, but we’ll need to know the basics to effectively edit our image file.

Let us first consider an example not in binary. Think about two long numbers, such as 867,530 and 123,456. When you read that sentence, you read the 8 of the first number first, then the 6, then the 7 and so on; that is, you started with the high-order digit first and read (left-to-right) down to the low-order digit (0) last. Effectively, that is what’s known as “big-endian”.

On the other hand, if you were to add those two numbers together, you’d start by adding the low-order digits first: 0 + 6, working your way right-to-left as you add the digits and handle the carry digit (as needed). In this case, you would be interacting with the numbers in a “little-endian” way (i.e., starting with the low-order digit.

Now that we have the premise, how does it actually work in binary? Our machines store numbers as an ordered sequence of bytes (each byte is 8 bits long). For little-endian, the first byte written is actually the low-order byte! So if we’re writing down the following number (with it’s 32-bit binary representation):

42 = 0x2a = 00000000 00000000 00000000 00101010

Then that number would be stored in the BMP file little-endian, with the low-order byte first! It would look like:

00101010 00000000 00000000 00000000, which viewed as hexadecimal would be 2a 00 00 00.

low-res Rotunda image
Low-res photo of the Rotunda

Download rotunda.bmp and do the following:

  1. Swap the width and height values and save the result as swap.bmp. This should look odd and horizontally streaky.

  2. Steganography is the art of hiding information such that people seeing it don’t know it is even there. One simple form of steganography hides information in the low-order bits of each pixel in an image.

    Starting from the original rotunda.bmp, encode “win” in the low-order bit of the first 24 bytes of image data. In particular,

    1. convert “win” to 3 bytes using ASCII
    2. Encode those 3 bytes as 24 big-endian bits
    3. For each of the first 24 image data bytes
      • set the low-order bit of the byte to 0 if the corresponding bit of “win” is 0
      • set the low-order bit of the byte to 1 if the corresponding bit of “win” is 1

      Suppose we want to encode “x” in the low-order bits of data consisting of bytes 01 23 54 76 89 a4 cd 5f. Since “x” is ASCII \(78_{16}\), or \(01111000_2\), we encode this by setting the lower-order bits as follows:

      01 set bit to 0 00 (changed bit)
      23 set bit to 1 23 (unchanged, as it was already 1)
      54 set bit to 1 55 (changed bit)
      76 set bit to 1 77 (changed bit)
      89 set bit to 1 89 (unchanged, as it was already 1)
      a4 set bit to 0 a4 (unchanged, as it was already 0)
      cd set bit to 0 cc (changed bit)
      5f set bit to 0 5e (changed bit)
    4. Save the resulting image as hide.bmp and verify that it does not look odd when viewed as an image

Later in the semester, we’ll walk through getting set up with git and hopefully some other methods for copying files to and from the CS department portal nodes. On the command line, you can use programs such as xxd to view binary files as hexadecimal and edit those files directly. If you want to know more about xxd, SSH into the portal and run man xxd.

To use xxd to help in editing a file named rotunda.bmp stored on the portal, we would:

  1. Convert binary to hexadecimal as a new file: xxd rotunda.bmp > temp-hex-file
  2. Edit the file temp-hex-file in the editor of your choice (vim, nano), just as we’ve done above
  3. Convert the hexadecimal back to binary: xxd -r temp-hex-file > rotunda-edited.bmp

Check-off

To check-off this lab, show a TA

  • your progress on the isequal bit-fiddling task by running cso1bits status
  • that you can replace newlines and capitalize words on the hex side of the editor, and describe the pattern you found doing so
  • the encoded data in the hex editor for hide.bmp

For most students, this should happen in lab; if you have completed the lab exercise before lab occurs, you are welcome to do it in their office hours.


Copyright © 2025 John Hott, portions Luther Tychonievich.
Released under the CC-BY-NC-SA 4.0 license.
Creative Commons License