Lab 2 - Bits and Bytes
Introduction
In this lab, we’ll focus on ordered collections of bits to help us understand:
- How bits are combined to represent numbers and the kinds of bitwise logic we can apply to these numbers
- How data is stored as a series of bits, grouped together as bytes
Bits and Bitwise Operations
First, work with the people at your table to start getting familiar with the system for submitting the bit-fiddling homework. For this task, please discuss together possible solutions to the isequal
task listed below.
- isequal
- Given
x
andy
, setz
to1
ifx == y
; otherwise setz
to0
without using==
or multi-bit constants.For full credit, use ≤ 5 operations from {
!
,~
,+
,<<
,>>
,&
,^
,|
} and up to 1-bit constants.
Work out some example code with your table, then a few examples. The bit fiddling program uses 32-bit integers by default, but that’s a lot of bits all at once! So, start small. Work through an example or two that uses 4 bits, or 8 bits. For example, what does your program do with:
x = 0b0000
andy = 0b0000
x = 0b1000
andy = 0b1000
x = 0b0001
andy = 0b0001
x = 0b1111
andy = 0b0111
- Try some more. Its best to work out your logic before coding it up and submitting.
Once you’re ready, SSH to the portal nodes, write up a new file isequal.bits
with your code, then submit using cso1bits isequal isequal.bits
. How did you do?
Note: Everyone at the table should submit. You’ll need to do this for homework 1, but you won’t be able to work with as many people for the rest of the assignment.
Bytes and Digital Data
Now we’ll look at larger collections of bits! Whether punch cards, paper tape, magnetic tape, magnetic disk, optical disc, NAND-flash, or technologies yet unknown, large quantities of digital information have always been, and likely will always remain, giant streams of sequential bits. These are stored physically in something, so they have limited flexibility: you can change a bit to 0 or 1, but you can’t remove a bit entirely without moving every other bit over to fill in the gap.
When interacting with such “binary files,” it is typical to use a tool known as a “hex editor.” This shows each byte (set of eight bits) as a two-digit hexadecimal value and allows users to edit that information in place. Because, really, who wants to edit the binary directly?
Getting a hex editor
For ease of viewing images, we’ll use an online hex editor today:
Although it is an online tool, you need to open files from your local drive; your basic process is
- Download a file to edit
- Go to hexed.it
- Use the “Open file” option on the top to load the file
- Edit the file as desired
- Use “Export” to save the file
ASCII
Many files contain textual information, generally encoded using ASCII or a compatible superset of that such as UTF-8, ISO-8859-1, Windows-1252, Mac-Roman, etc. In these, each character is mapped to a single byte between 0 and 127; bytes larger than 127 are often parts of multi-byte character encodings and vary by encoding variant.
Hex | Char |
---|---|
09 | TAB (\t ) |
0A | LF (\n ) |
0D | CR (\r ) |
20 | Space |
21 | ! |
22 | " |
23 | # |
24 | $ |
25 | % |
26 | & |
27 | ' |
28 | ( |
29 | ) |
2A | * |
2B | + |
2C | , |
2D | - |
2E | . |
2F | / |
30 | 0 |
31 | 1 |
32 | 2 |
33 | 3 |
34 | 4 |
35 | 5 |
36 | 6 |
37 | 7 |
38 | 8 |
39 | 9 |
3A | : |
3B | ; |
3C | < |
3D | = |
3E | > |
3F | ? |
Hex | Char |
---|---|
40 | @ |
41 | A |
42 | B |
43 | C |
44 | D |
45 | E |
46 | F |
47 | G |
48 | H |
49 | I |
4A | J |
4B | K |
4C | L |
4D | M |
4E | N |
4F | O |
50 | P |
51 | Q |
52 | R |
53 | S |
54 | T |
55 | U |
56 | V |
57 | W |
58 | X |
59 | Y |
5A | Z |
5B | [ |
5C | \ |
5D | ] |
5E | ^ |
5F | _ |
Hex | Char |
---|---|
60 | \` |
61 | a |
62 | b |
63 | c |
64 | d |
65 | e |
66 | f |
67 | g |
68 | h |
69 | i |
6A | j |
6B | k |
6C | l |
6D | m |
6E | n |
6F | o |
70 | p |
71 | q |
72 | r |
73 | s |
74 | t |
75 | u |
76 | v |
77 | w |
78 | x |
79 | y |
7A | z |
7B | { |
7C | | |
7D | } |
7E | ~ |
Download ritchie.txt and open it in your hex editor. You should see both a hex edit space and a character edit space, along with some numerical interpretations.
To check off this lab, you’ll need to demonstrate to your TA that you can do the following in the hex editing space:
- Put the first line into all upper-case.
- Replace all line breaks with spaces.
Images
There are many image formats, and many of them use esoteric math to encode large regions of color succinctly. However, some are uncompressed and represent colors directly; these generally include a header of some kind, followed by three bytes per pixel: a red byte (0 for no red light, 255 for right red light), a green byte (also 0–255), and a blue byte (ditto).
One such uncompressed format is the BMP file format. The BMP file format has several variants, but the most common has
- Width of image as a little-endian 32-bit number in bytes 18–21
- Height of image as a little-endian 32-bit number in bytes 22–25
- Image data starting at a byte whose index is given in a 4-byte little-endian number in bytes 10–13
Endianness
We’ll discuss endianness more fully later on in the semester, but we’ll need to know the basics to effectively edit our image file.
Let us first consider an example not in binary. Think about two long numbers, such as 867,530 and 123,456. When you read that sentence, you read the 8 of the first number first, then the 6, then the 7 and so on; that is, you started with the high-order digit first and read (left-to-right) down to the low-order digit (0) last. Effectively, that is what’s known as “big-endian”.
On the other hand, if you were to add those two numbers together, you’d start by adding the low-order digits first: 0 + 6, working your way right-to-left as you add the digits and handle the carry digit (as needed). In this case, you would be interacting with the numbers in a “little-endian” way (i.e., starting with the low-order digit.
Now that we have the premise, how does it actually work in binary? Our machines store numbers as an ordered sequence of bytes (each byte is 8 bits long). For little-endian, the first byte written is actually the low-order byte! So if we’re writing down the following number (with it’s 32-bit binary representation):
42 = 0x2a = 00000000 00000000 00000000 00101010
Then that number would be stored in the BMP file little-endian, with the low-order byte first! It would look like:
00101010 00000000 00000000 00000000
, which viewed as hexadecimal would be2a 00 00 00
.
Low-res photo of the Rotunda
Download rotunda.bmp and do the following:
-
Swap the width and height values and save the result as
swap.bmp
. This should look odd and horizontally streaky. -
Steganography is the art of hiding information such that people seeing it don’t know it is even there. One simple form of steganography hides information in the low-order bits of each pixel in an image.
Starting from the original rotunda.bmp, encode “win” in the low-order bit of the first 24 bytes of image data. In particular,
- convert “win” to 3 bytes using ASCII
- Encode those 3 bytes as 24 big-endian bits
- For each of the first 24 image data bytes
- set the low-order bit of the byte to 0 if the corresponding bit of “win” is 0
- set the low-order bit of the byte to 1 if the corresponding bit of “win” is 1
Suppose we want to encode “x” in the low-order bits of data consisting of bytes
01 23 54 76 89 a4 cd 5f
. Since “x” is ASCII \(78_{16}\), or \(01111000_2\), we encode this by setting the lower-order bits as follows:01
set bit to 0 00
(changed bit)23
set bit to 1 23
(unchanged, as it was already 1)54
set bit to 1 55
(changed bit)76
set bit to 1 77
(changed bit)89
set bit to 1 89
(unchanged, as it was already 1)a4
set bit to 0 a4
(unchanged, as it was already 0)cd
set bit to 0 cc
(changed bit)5f
set bit to 0 5e
(changed bit) - Save the resulting image as
hide.bmp
and verify that it does not look odd when viewed as an image
Later in the semester, we’ll walk through getting set up with
git
and hopefully some other methods for copying files to and from the CS department portal nodes. On the command line, you can use programs such asxxd
to view binary files as hexadecimal and edit those files directly. If you want to know more aboutxxd
, SSH into the portal and runman xxd
.To use
xxd
to help in editing a file namedrotunda.bmp
stored on the portal, we would:
- Convert binary to hexadecimal as a new file:
xxd rotunda.bmp > temp-hex-file
- Edit the file
temp-hex-file
in the editor of your choice (vim, nano), just as we’ve done above- Convert the hexadecimal back to binary:
xxd -r temp-hex-file > rotunda-edited.bmp
Check-off
To check-off this lab, show a TA
- your progress on the
isequal
bit-fiddling task by runningcso1bits status
- that you can replace newlines and capitalize words on the hex side of the editor, and describe the pattern you found doing so
- the encoded data in the hex editor for
hide.bmp
For most students, this should happen in lab; if you have completed the lab exercise before lab occurs, you are welcome to do it in their office hours.