June 18, 2026 · Yusuf Abdul-Mateen · ~10 min read
At some point, the OS-level APIs for reading disks start to feel like a black box. You call open(), you call read(), you get bytes. But what's actually happening? I wanted to peel back every layer — to talk directly to a USB drive, send it SCSI commands by hand, and parse the raw response bytes without the OS filesystem layer helping me at all.
This is the story of building PartitionSchemeDetector: a C++ program that uses libusb to speak the USB Bulk-Only Transport protocol, send raw SCSI commands to a mass storage device, and figure out whether it uses MBR or GPT partitioning — all without touching /dev/sda.
Full source on GitHub.
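A USB drive is, at the hardware level, a SCSI device wrapped in USB. The flash memory controller speaks SCSI command sets internally. USB mass storage is just a transport layer that carries those SCSI commands over USB bulk endpoints. That gives three layers, like an onion: SCSI commands at the core, the Bulk-Only Transport (BOT) wrapper around them, and USB bulk transfers on the outside.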
Most people never go deeper than "plug it in and mount it." I wanted to operate at the BOT layer — to pack a SCSI command into a CBW structure, send it over a USB bulk endpoint, and unpack the CSW response myself.
This was a research-heavy project. I spent more time reading specs and tutorials than writing code. Here's the trail:
The USB in a Nutshell guide explained USB descriptors — how a device tells the host what it is. Every USB device has a device descriptor, configuration descriptors, interface descriptors, and endpoint descriptors. I needed to walk this tree to find a mass storage interface with bulk endpoints.
The USB Mass Storage Bulk-Only spec defines the wire format. Every command starts with a 31-byte Command Block Wrapper (CBW) and the device replies with a 13-byte Command Status Wrapper (CSW). The CBW contains a signature ("USBC"), a tag, data direction, the command length, and the SCSI command bytes. If the CBW/CSW signatures don't match, the device rejects the command.
The libusb tutorial showed how to enumerate devices, open handles, and claim interfaces. libusb abstracts the kernel's USB subsystem — you don't need to write kernel modules, just link a library.
The MBR explanation covered the boot sector layout: bytes 0x1FE–0x1FF are the boot signature (must be 0x55 0xAA), and byte 0x1C2 in the first partition entry tells you the partition type, where 0xEE means GPT.
And then there's the T10 SCSI operation codes reference. It's a terse table of hex codes and command names. I used it to find the right opcodes: 0x00 for TEST UNIT READY, 0x25 for READ CAPACITY, 0x28 for READ 10. It's cryptic, but once you know what to look for, it's the definitive source.
The first thing the program does is enumerate all USB devices on the bus, print them with bus/port numbers and vendor/product IDs, and ask which one to use:
Which of these device do you want to use? [1 - 7]
1) BUS: 001 PORT: 001 ID 1d6b:0002
2) BUS: 001 PORT: 002 ID 0424:2512
3) BUS: 001 PORT: 003 ID 0424:2740
4) BUS: 002 PORT: 001 ID 0424:2744
5) BUS: 003 PORT: 002 ID 0781:5583
6) BUS: 003 PORT: 003 ID 0bda:5401
7) BUS: 003 PORT: 004 ID 13d3:5652
Your choice:
I learned that libusb gives you a device list through libusb_get_device_list(). Each device has a descriptor with idVendor and idProduct. The user picks a number, and the program opens that device by VID/PID.
Once the device is open, the real work begins: finding the mass storage interface. The USB mass storage class is defined by three numbers:
if (inter_desc.bInterfaceClass == 0x08 // Mass Storage
&& inter_desc.bInterfaceSubClass == 0x06 // SCSI transparent
&& inter_desc.bInterfaceProtocol == 0x50)// Bulk-Only Transport
I had to walk through every configuration, every interface, every alternate setting, and every endpoint. The nested loops are four levels deep:
for each configuration:
    for each interface:
        for each alternate setting:
            for each endpoint:
                check if it's a bulk endpoint
                check if it's IN or OUT
A flag_check byte tracks whether both IN and OUT bulk endpoints have been found. Once both are located, it records the interface number, alternate setting, and configuration value, then breaks out.
The USB spec encodes endpoint direction in the high bit of the endpoint address: bit 7 set (0x80) means IN (device to host), bit 7 clear means OUT (host to device). Bulk endpoints have transfer type 0x02 in the low two bits of bmAttributes.
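The two bit tests and the flag byte can be sketched without any USB hardware. This is a minimal illustration, not the program's actual code: `EndpointInfo`, `BulkPair`, `find_bulk_pair`, and the flag constants are hypothetical names, and the struct only mirrors the two descriptor fields that matter here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two endpoint descriptor fields used below.
struct EndpointInfo {
    std::uint8_t bEndpointAddress;  // bit 7 = direction, low bits = endpoint number
    std::uint8_t bmAttributes;      // bits 0-1 = transfer type
};

constexpr std::uint8_t kDirectionMask = 0x80;  // set = IN (device to host)
constexpr std::uint8_t kTypeMask = 0x03;       // transfer type bits
constexpr std::uint8_t kTypeBulk = 0x02;

// Hypothetical flag bits, playing the role of the flag_check byte.
constexpr std::uint8_t kFoundIn = 0x01;
constexpr std::uint8_t kFoundOut = 0x02;

inline bool is_bulk(const EndpointInfo& ep) {
    return (ep.bmAttributes & kTypeMask) == kTypeBulk;
}

inline bool is_in(const EndpointInfo& ep) {
    return (ep.bEndpointAddress & kDirectionMask) != 0;
}

struct BulkPair {
    bool complete;        // true once both directions were found
    std::uint8_t in_ep;   // address of the bulk IN endpoint
    std::uint8_t out_ep;  // address of the bulk OUT endpoint
};

// Scans one alternate setting's endpoints and stops as soon as both
// a bulk IN and a bulk OUT endpoint have been seen.
BulkPair find_bulk_pair(const std::vector<EndpointInfo>& endpoints) {
    BulkPair result{false, 0, 0};
    std::uint8_t flags = 0;
    for (const EndpointInfo& ep : endpoints) {
        if (!is_bulk(ep)) continue;  // skip interrupt, isochronous, control
        if (is_in(ep)) {
            result.in_ep = ep.bEndpointAddress;
            flags |= kFoundIn;
        } else {
            result.out_ep = ep.bEndpointAddress;
            flags |= kFoundOut;
        }
        if (flags == (kFoundIn | kFoundOut)) {
            result.complete = true;
            break;
        }
    }
    return result;
}
```

For a typical thumb drive the list would hold something like 0x81 (bulk IN) and 0x02 (bulk OUT); an interrupt endpoint with attributes 0x03 is skipped.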
Before anything else, I needed to make sure the device was alive. The SCSI TEST UNIT READY command (opcode 0x00) does exactly that — it returns success if the device can accept commands, or an error status if not.
Wrapping a SCSI command in a CBW means packing a 31-byte buffer:
unsigned char data_CBW[31] = {0};
data_CBW[0] = 0x55; // 'U'
data_CBW[1] = 0x53; // 'S'
data_CBW[2] = 0x42; // 'B'
data_CBW[3] = 0x43; // 'C' — signature "USBC"
data_CBW[4] = (tag >> 24) & 0xFF;
data_CBW[5] = (tag >> 16) & 0xFF;
data_CBW[6] = (tag >> 8) & 0xFF;
data_CBW[7] = (tag >> 0) & 0xFF;
data_CBW[8] = (data_len >> 0) & 0xFF;
data_CBW[9] = (data_len >> 8) & 0xFF;
data_CBW[10] = (data_len >> 16) & 0xFF;
data_CBW[11] = (data_len >> 24) & 0xFF;
data_CBW[12] = direction; // bit 7: 0 = host-to-dev, 1 = dev-to-host
data_CBW[14] = command_length; // 6 or 10 bytes
data_CBW[15] = scsi_opcode; // first byte of SCSI command
I got the signature wrong on my first attempt — wrote the first four bytes as 0x43 0x42 0x53 0x55 ("CBSU") instead of 0x55 0x53 0x42 0x43 ("USBC"). The hex values looked plausible, so it took me a while to notice. The device just never sent back a CSW with matching signatures, and I spent an hour checking everything else before realising the bytes were backwards.
For TEST UNIT READY, the command length is 6 bytes (put in byte 14), and the opcode byte (offset 15) is 0x00. The remaining 5 bytes are zero.
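The packing above can be folded into one helper that works for any command. This is a sketch under assumptions: `build_cbw` and the direction constants are hypothetical names, the LUN byte is left at 0, and the tag is written high byte first only to match the snippet above (the device echoes it back verbatim).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCbwSize = 31;
constexpr std::uint8_t kDataIn = 0x80;   // bit 7 set: device to host
constexpr std::uint8_t kDataOut = 0x00;  // bit 7 clear: host to device

// Packs a 31-byte Command Block Wrapper around a SCSI command (hypothetical helper).
std::array<std::uint8_t, kCbwSize> build_cbw(std::uint32_t tag,
                                             std::uint32_t data_len,
                                             std::uint8_t direction,
                                             const std::uint8_t* cdb,
                                             std::size_t cdb_len) {
    // The CBW has room for at most 16 command bytes (offsets 15..30).
    assert(cdb != nullptr && cdb_len >= 1 && cdb_len <= 16);

    std::array<std::uint8_t, kCbwSize> cbw{};  // zero-filled
    cbw[0] = 0x55;  // 'U'
    cbw[1] = 0x53;  // 'S'
    cbw[2] = 0x42;  // 'B'
    cbw[3] = 0x43;  // 'C'

    // Tag: echoed back in the CSW, packed high byte first as in the article.
    cbw[4] = static_cast<std::uint8_t>(tag >> 24);
    cbw[5] = static_cast<std::uint8_t>(tag >> 16);
    cbw[6] = static_cast<std::uint8_t>(tag >> 8);
    cbw[7] = static_cast<std::uint8_t>(tag);

    // Expected data length: low byte first.
    cbw[8] = static_cast<std::uint8_t>(data_len);
    cbw[9] = static_cast<std::uint8_t>(data_len >> 8);
    cbw[10] = static_cast<std::uint8_t>(data_len >> 16);
    cbw[11] = static_cast<std::uint8_t>(data_len >> 24);

    cbw[12] = direction;                           // flags: only bit 7 is used
    cbw[13] = 0;                                   // LUN 0 (assumption)
    cbw[14] = static_cast<std::uint8_t>(cdb_len);  // 6 or 10 in this program

    for (std::size_t i = 0; i < cdb_len; ++i) {
        cbw[15 + i] = cdb[i];
    }
    return cbw;
}
```

TEST UNIT READY is then a six-byte all-zero command with no data phase: `build_cbw(tag, 0, kDataOut, tur, 6)` where `tur` is six zero bytes.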
After sending the CBW on the OUT endpoint, I read the CSW back from the IN endpoint:
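unsigned char data_CSW[13];
int t = libusb_bulk_transfer(devHandle, in_Endpoint, data_CSW, 13,
                             &amTransfered, 1000);
// Bail out if the transfer failed or came back short.
// (Returning false assumes the enclosing function returns bool.)
if (t != 0 || amTransfered != 13) return false;
// Verify signature: must be "USBS"
if (data_CSW[0] != 0x55 || data_CSW[1] != 0x53 ||
    data_CSW[2] != 0x42 || data_CSW[3] != 0x53) return false;
// Verify the tag matches the one sent in CBW bytes 4..7 (memcmp needs <cstring>)
if (std::memcmp(&data_CSW[4], &data_CBW[4], 4) != 0) return false;
// Check status: 0 = passed, 1 = failed, 2 = phase error
if (data_CSW[12] != 0x00) return false;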
Getting "Ping Successful" on the first try was a satisfying moment. It meant the CBW/CSW handshake was correct — the device understood my packets.
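The three CSW checks can also be written as a pure function over the 13 received bytes, which makes them easy to test without a device attached. This is only a sketch: `CswResult` and `check_csw` are hypothetical names, and the tag is compared byte for byte against whatever was written into the CBW.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class CswResult {
    Passed,
    Failed,
    PhaseError,
    BadSignature,
    TagMismatch,
    UnknownStatus
};

// csw points at the 13 bytes read from the IN endpoint;
// sent_tag points at the 4 tag bytes that were written into the CBW.
CswResult check_csw(const std::uint8_t* csw, const std::uint8_t* sent_tag) {
    static const std::uint8_t kSignature[4] = {0x55, 0x53, 0x42, 0x53};  // "USBS"

    if (std::memcmp(csw, kSignature, 4) != 0) {
        return CswResult::BadSignature;
    }
    if (std::memcmp(csw + 4, sent_tag, 4) != 0) {
        return CswResult::TagMismatch;  // response belongs to another command
    }
    switch (csw[12]) {  // status byte
        case 0x00: return CswResult::Passed;
        case 0x01: return CswResult::Failed;
        case 0x02: return CswResult::PhaseError;
        default:   return CswResult::UnknownStatus;
    }
}
```

Returning a distinct value for each failure makes it clearer whether a problem is a stale response (tag mismatch) or a device-side error (failed or phase error).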
The READ CAPACITY command (0x25) returns the last Logical Block Address and the block size. The response is just 8 bytes:
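// Cast each byte before shifting: an unsigned char is promoted to a signed int,
// and shifting a byte >= 0x80 left by 24 would overflow it.
uint64_t last_lba = ((uint64_t)data_PL[0] << 24) | ((uint64_t)data_PL[1] << 16) |
                    ((uint64_t)data_PL[2] << 8) | ((uint64_t)data_PL[3] << 0);
uint64_t block_size = ((uint64_t)data_PL[4] << 24) | ((uint64_t)data_PL[5] << 16) |
                      ((uint64_t)data_PL[6] << 8) | ((uint64_t)data_PL[7] << 0);
// Integer division: the result is truncated to a whole number before it is stored.
double size_gb = (last_lba + 1) * block_size / (1024 * 1024 * 1024);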
The data transfer direction bit is set to 1 (device-to-host). The data length in the CBW is 8 bytes. After sending the CBW, I read the 8-byte response from the IN endpoint, then read the CSW to confirm success.
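A worked example helps check the arithmetic. The sketch below parses the 8-byte response into two 32-bit fields and computes the total size in bytes; `Capacity`, `load_be32`, `parse_read_capacity10`, and `total_bytes` are hypothetical names, not functions from the program.

```cpp
#include <cassert>
#include <cstdint>

struct Capacity {
    std::uint32_t last_lba;    // address of the last block, so count = last_lba + 1
    std::uint32_t block_size;  // bytes per block
};

// Reads a 32-bit big-endian value from four consecutive bytes.
inline std::uint32_t load_be32(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// data points at the 8-byte READ CAPACITY (10) response.
inline Capacity parse_read_capacity10(const std::uint8_t* data) {
    return Capacity{load_be32(data), load_be32(data + 4)};
}

// Widen to 64 bits before multiplying so large drives do not overflow.
inline std::uint64_t total_bytes(const Capacity& c) {
    return (static_cast<std::uint64_t>(c.last_lba) + 1) * c.block_size;
}
```

With the response bytes `00 00 00 FF 00 00 02 00`, the last LBA is 255 and the block size is 512, so the device holds 256 × 512 = 131072 bytes.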
One thing that caught me: after every command, I call reset_recovery() which sends a Bulk-Only Mass Storage Reset and clears both endpoint halts. If the previous command left the endpoints in a bad state, the next command would fail silently. This pattern — reset, send CBW, transfer data, read CSW — repeats for every command in the program.
void reset_recovery(libusb_device_handle *devHandle, int interface,
                    int in_Endpoint, int out_Endpoint) {
    bulk_only_reset(devHandle, interface); // control transfer 0xFF
    libusb_clear_halt(devHandle, in_Endpoint);
    libusb_clear_halt(devHandle, out_Endpoint);
}
The READ 10 command (0x28) reads one or more logical blocks from the device; the block size is whatever READ CAPACITY reported, 512 bytes here. The command byte layout:
data_CBW[15]     = 0x28   // READ 10 opcode (command byte 0)
data_CBW[17..20] = LBA    // 4-byte logical block address, big-endian (command bytes 2-5)
data_CBW[22..23] = count  // number of blocks to read, big-endian (command bytes 7-8)
data_CBW[23]     = 0x01   // low byte of the count: read one block
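The offsets are easier to get right when the 10-byte command is built on its own and then copied into the CBW starting at offset 15. A minimal sketch, with `build_read10_cdb` as a hypothetical helper name:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the 10-byte READ (10) command block.
// Byte 0: opcode, bytes 2-5: LBA, bytes 7-8: block count, all big-endian.
// Bytes 1, 6 and 9 (flags, group number, control) stay zero.
std::array<std::uint8_t, 10> build_read10_cdb(std::uint32_t lba,
                                              std::uint16_t blocks) {
    std::array<std::uint8_t, 10> cdb{};  // zero-filled
    cdb[0] = 0x28;                       // READ (10) opcode

    cdb[2] = static_cast<std::uint8_t>(lba >> 24);
    cdb[3] = static_cast<std::uint8_t>(lba >> 16);
    cdb[4] = static_cast<std::uint8_t>(lba >> 8);
    cdb[5] = static_cast<std::uint8_t>(lba);

    cdb[7] = static_cast<std::uint8_t>(blocks >> 8);
    cdb[8] = static_cast<std::uint8_t>(blocks);
    return cdb;
}
```

Command byte N lands at CBW offset 15 + N, so the LBA occupies CBW bytes 17 to 20 and the count occupies bytes 22 and 23. Reading the first sector is `build_read10_cdb(0, 1)`, with the CBW data length set to one block.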
I set the LBA to 0 (first block), read 512 bytes, and check the boot signature at the end of the block:
if ((data_PL[0x1fe] != 0x55) || (data_PL[0x1ff] != 0xaa))
    std::cout << "Partition Scheme: Unknown\n";
else if (data_PL[0x1c2] == 0xee)
    std::cout << "Partition Scheme: GPT\n";
else
    std::cout << "Partition Scheme: MBR\n";
Bytes 0x1FE–0x1FF are the MBR boot signature — always 0x55 0xAA for a valid MBR. Byte 0x1C2 is the partition type field in the first partition entry. If it's 0xEE, it's a protective MBR for GPT. Otherwise, it's regular MBR.
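Pulling the check into a function that takes the sector and returns a value lets it be exercised against synthetic sectors, with no drive plugged in. This is a sketch of the same logic; `PartitionScheme` and `detect_scheme` are hypothetical names, and like the code above it only looks at the first partition entry.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class PartitionScheme { Unknown, MBR, GPT };

constexpr std::size_t kSectorSize = 512;
constexpr std::size_t kSignatureOffset = 0x1FE;  // boot signature, 2 bytes
constexpr std::size_t kFirstTypeOffset = 0x1C2;  // type byte of partition entry 1

PartitionScheme detect_scheme(const std::array<std::uint8_t, kSectorSize>& sector) {
    // Without the 0x55 0xAA signature this is not a valid MBR at all.
    if (sector[kSignatureOffset] != 0x55 || sector[kSignatureOffset + 1] != 0xAA) {
        return PartitionScheme::Unknown;
    }
    // Type 0xEE marks a protective MBR placed in front of a GPT.
    if (sector[kFirstTypeOffset] == 0xEE) {
        return PartitionScheme::GPT;
    }
    return PartitionScheme::MBR;
}
```

An all-zero sector comes back as Unknown; setting the two signature bytes turns it into MBR, and additionally setting byte 0x1C2 to 0xEE turns it into GPT.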
The direction bit in the CBW is inverted from what I expected. Bit 7 of byte 12: 1 means device-to-host (data IN), 0 means host-to-device (data OUT). I got this backwards on my first attempt and the device kept returning phase errors.
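If a command fails midway, the endpoints get stuck. Without reset_recovery() between commands, every subsequent command fails with a timeout. The BOT spec calls this sequence Reset Recovery: a Bulk-Only Mass Storage Reset, which is a class-specific control request on the default control endpoint, followed by clearing the halt condition on both bulk endpoints. Together they restore the device to a known state.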
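SCSI commands and responses use big-endian byte order: the LBA and block count inside the command, and the last logical block address and block size that READ CAPACITY returns. The BOT wrapper around them is little-endian, which is why data_len is packed low byte first in the CBW. The tag is opaque to the device, which simply echoes it back in the CSW, so its byte order only has to match on both sides; I packed it high byte first. On a little-endian x86 machine, that means explicit byte manipulation: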
data_CBW[4] = (tag >> 24) & 0xFF;
data_CBW[5] = (tag >> 16) & 0xFF;
data_CBW[6] = (tag >> 8) & 0xFF;
data_CBW[7] = (tag >> 0) & 0xFF;
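The shifts can be wrapped in a pair of small helpers so each field states its byte order at the call site. A minimal sketch with hypothetical names `store_be32` and `store_le32`; the little-endian variant is included because the CBW data length in the packing code earlier is written low byte first.

```cpp
#include <cassert>
#include <cstdint>

// Writes v into out[0..3], most significant byte first.
inline void store_be32(std::uint8_t* out, std::uint32_t v) {
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Writes v into out[0..3], least significant byte first.
inline void store_le32(std::uint8_t* out, std::uint32_t v) {
    out[0] = static_cast<std::uint8_t>(v);
    out[1] = static_cast<std::uint8_t>(v >> 8);
    out[2] = static_cast<std::uint8_t>(v >> 16);
    out[3] = static_cast<std::uint8_t>(v >> 24);
}
```

Because the helpers work on shifted values rather than on memory layout, they give the same bytes on any host, little-endian or big-endian. For the value 0x11223344, `store_be32` writes 11 22 33 44 and `store_le32` writes 44 33 22 11.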
I initially tried using READ CAPACITY 16 (0x9E) because it sounded more standard, but my device only responded to READ CAPACITY 10 (0x25). The older 10-byte variant is the safest bet for compatibility. Stick to the well-tested commands.
$ sudo ./psd
-------------------
STARTING
-------------------
Which of these device do you want to use? [1 - 7]
...
Your choice: 5
Ping Successful
DISK SIZE: 119 GB
Partition Scheme: MBR
Three commands, three CBW/CSW handshakes, and a 512-byte block read — that's all it takes to identify a disk and its partition scheme at the raw USB level.
The USB BOT spec is 65 pages. The SCSI command spec is a terse hex table. Both look cryptic at first, but once you understand the structure (CBW has a signature + tag + direction + command; CSW has a signature + tag + status), the rest falls into place. You don't need to memorise everything — just find the command you need and pack the bytes correctly.
libusb handles USB device enumeration, descriptor parsing, control transfers, and bulk transfers. It's not particularly well-documented outside the API reference, but the concepts map directly to the USB spec. If you understand USB descriptors and transfer types, libusb is just a thin wrapper.
I did my testing on an old 16 GB thumb drive. To claim the interface, the program first has to detach the kernel's mass storage driver, so the device disappears from the filesystem while the program runs. On my first few attempts, I had to unplug and replug the drive to reset it after the program crashed without releasing the interface.
Matching the CBW and CSW tags caught several bugs where I sent a command but read back the response from a previous command. The tag ensures you're looking at the right response. If the tags don't match, something went wrong with the synchronisation.
git clone https://github.com/ookaay/PartitionSchemeDetector.git
cd PartitionSchemeDetector
cmake -B build && cmake --build build
sudo ./psd
Pick your USB drive from the list. The program will ping it, read its capacity, and tell you its partition scheme. No mounting, no filesystem drivers, just raw SCSI over USB.