| BPF(4) | NetBSD Kernel Interfaces Manual | BPF(4) |
The packet filter appears as a character special device, /dev/bpf. After opening the device, the file descriptor must be bound to a specific network interface with the BIOCSETIF ioctl. A given interface can be shared by multiple listeners, and the filter underlying each descriptor will see an identical packet stream.
Associated with each open instance of a bpf file is a user-settable packet filter. Whenever a packet is received by an interface, all file descriptors listening on that interface apply their filter. Each descriptor that accepts the packet receives its own copy.
Reads from these files return the next group of packets that have matched the filter. To improve performance, the buffer passed to read must be the same size as the buffers used internally by bpf. This size is returned by the BIOCGBLEN ioctl (see below), and under BSD, can be set with BIOCSBLEN. Note that an individual packet larger than this size is necessarily truncated.
The packet filter will support any link level protocol that has fixed length headers. Currently, only Ethernet, SLIP and PPP drivers have been modified to interact with bpf.
Since packet data is in network byte order, applications should use the byteorder(3) macros to extract multi-byte values.
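As a minimal sketch of this point, the helper below extracts the 16-bit Ethernet type field from a captured frame. The name frame_ethertype and the fixed offset 12 (the type field of an untagged Ethernet header) are assumptions made for illustration.

```c
#include <arpa/inet.h>   /* ntohs() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>      /* memcpy() */

/*
 * Hypothetical helper: return the Ethernet type field in host byte order.
 * 'frame' points at the first byte of the link-level header and 'len' is
 * the number of captured bytes. Returns 0 if the frame is too short.
 */
uint16_t
frame_ethertype(const unsigned char *frame, size_t len)
{
    uint16_t type_net;

    assert(frame != NULL);
    if (len < 14)
        return 0;
    /* memcpy avoids an unaligned 16-bit load on strict-alignment CPUs. */
    memcpy(&type_net, frame + 12, sizeof(type_net));
    return ntohs(type_net);
}
```

Copying the field out with memcpy before converting it keeps the access safe even when the link-level header is not aligned for 16-bit loads.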
A packet can be sent out on the network by writing to a bpf file descriptor. The writes are unbuffered, meaning only one packet can be processed per write. Currently, only writes to Ethernets and SLIP links are supported.
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
Additionally, BIOCGETIF and BIOCSETIF require <net/if.h>.
The third argument to ioctl(2) should be a pointer to the type indicated.
struct bpf_dltlist {
u_int bfl_len;
u_int *bfl_list;
};
The available types are returned in the array pointed to by the bfl_list field, and the capacity of that array in u_int elements is supplied in the bfl_len field. ENOMEM is returned if the array is too small. On return, bfl_len is set to the number of u_int elements actually stored. If bfl_list is NULL, bfl_len is set to the number of elements the array would need.
The interface remains in promiscuous mode until all files listening promiscuously are closed.
struct bpf_stat {
uint64_t bs_recv;
uint64_t bs_drop;
uint64_t bs_capt;
uint64_t bs_padding[13];
};
The fields are:
struct bpf_program {
u_int bf_len;
struct bpf_insn *bf_insns;
};
The filter program is pointed to by the bf_insns field, and its length in units of 'struct bpf_insn' is given by the bf_len field. Setting a filter also performs the actions of BIOCFLUSH.
See section FILTER MACHINE for an explanation of the filter language.
struct bpf_version {
u_short bv_major;
u_short bv_minor;
};
The current version numbers are given by BPF_MAJOR_VERSION and BPF_MINOR_VERSION from <net/bpf.h>. An incompatible filter may result in undefined behavior (most likely, an error returned by ioctl(2) or haphazard packet matching).
struct bpf_hdr {
struct bpf_timeval bh_tstamp;
uint32_t bh_caplen;
uint32_t bh_datalen;
uint16_t bh_hdrlen;
};
The fields, whose values are stored in host order, are:
The bh_hdrlen field exists to account for padding between the header and the link level protocol. The purpose here is to guarantee proper alignment of the packet data structures, which is required on alignment sensitive architectures and improves performance on many other architectures. The packet filter ensures that the bpf_hdr and the network layer header will be word aligned. Suitable precautions must be taken when accessing the link layer protocol fields on alignment restricted machines. (This isn't a problem on an Ethernet, since the type field is a short falling on an even offset, and the addresses are probably accessed in a bytewise fashion).
Additionally, individual packets are padded so that each starts on a word boundary. This requires that an application has some knowledge of how to get from packet to packet. The macro BPF_WORDALIGN is defined in <net/bpf.h> to facilitate this process. It rounds up its argument to the nearest word aligned value (where a word is BPF_ALIGNMENT bytes wide).
For example, if 'p' is a 'struct bpf_hdr *' pointing to the start of a packet, this statement advances it to the next packet:
p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));
For the alignment mechanisms to work properly, the buffer passed to read(2) must itself be word aligned. malloc(3) will always return an aligned buffer.
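The packet-to-packet stepping described above can be sketched as a loop over the bytes returned by one read(2). So that the sketch compiles without <net/bpf.h>, it declares stand-ins: struct demo_hdr models only the length fields of struct bpf_hdr, and DEMO_ALIGNMENT and DEMO_WORDALIGN stand in for BPF_ALIGNMENT and BPF_WORDALIGN. All of these names, and the choice of sizeof(long) as the word size, are assumptions; real code would use the definitions from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins (assumptions) for BPF_ALIGNMENT and BPF_WORDALIGN. */
#define DEMO_ALIGNMENT    sizeof(long)
#define DEMO_WORDALIGN(x) (((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

/* Stand-in for struct bpf_hdr; only the length fields are modelled. */
struct demo_hdr {
    uint32_t bh_caplen;   /* bytes of packet data captured */
    uint32_t bh_datalen;  /* original length of the packet */
    uint16_t bh_hdrlen;   /* header length, including padding */
};

/* Callback invoked once per packet with the captured bytes. */
typedef void (*demo_packet_fn)(const unsigned char *data, uint32_t caplen,
    void *arg);

/*
 * Walk 'nread' bytes returned by one read and call 'fn' for each packet.
 * Returns the number of packets visited.
 */
size_t
demo_walk(const unsigned char *buf, size_t nread, demo_packet_fn fn, void *arg)
{
    size_t off = 0, count = 0;

    assert(buf != NULL);
    while (off + sizeof(struct demo_hdr) <= nread) {
        struct demo_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.bh_hdrlen < sizeof(h) ||
            off + h.bh_hdrlen + h.bh_caplen > nread)
            break;  /* malformed or truncated record */
        if (fn != NULL)
            fn(buf + off + h.bh_hdrlen, h.bh_caplen, arg);
        count++;
        /* Each record is padded so the next one starts word aligned. */
        off += DEMO_WORDALIGN((size_t)h.bh_hdrlen + h.bh_caplen);
    }
    return count;
}
```

The loop stops at the first record that does not fit in the bytes read, so a short or corrupted buffer ends the walk instead of running past the end.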
The following structure defines the instruction format:
struct bpf_insn {
uint16_t code;
u_char jt;
u_char jf;
int32_t k;
};
The k field is used in different ways by different instructions, and the jt and jf fields are used as offsets by the branch instructions. The opcodes are encoded in a semi-hierarchical fashion. There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX, BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC. Various other mode and operator bits are or'd into the class to give the actual instructions. The classes and modes are defined in <net/bpf.h>.
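To illustrate the semi-hierarchical encoding, the sketch below splits an opcode into its class, size and mode bits. The DEMO_ masks and opcode values are assumed to mirror BPF_CLASS, BPF_SIZE, BPF_MODE and the corresponding constants in <net/bpf.h>; the header remains the authoritative source, and demo_class_name is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field masks, assumed to match BPF_CLASS, BPF_SIZE and BPF_MODE. */
#define DEMO_CLASS(code) ((code) & 0x07)
#define DEMO_SIZE(code)  ((code) & 0x18)
#define DEMO_MODE(code)  ((code) & 0xe0)

/* A few opcode bits, again assumed to match <net/bpf.h>. */
#define DEMO_LD   0x00   /* class: load into A */
#define DEMO_RET  0x06   /* class: return */
#define DEMO_H    0x08   /* size: halfword */
#define DEMO_ABS  0x20   /* mode: absolute packet offset */

/* Return the name of the instruction class encoded in 'code'. */
const char *
demo_class_name(uint16_t code)
{
    static const char *const names[] = {
        "BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
        "BPF_ALU", "BPF_JMP", "BPF_RET", "BPF_MISC"
    };

    assert(DEMO_CLASS(code) < sizeof(names) / sizeof(names[0]));
    return names[DEMO_CLASS(code)];
}
```

With these values, the opcode written as BPF_LD+BPF_H+BPF_ABS decomposes into the load class, halfword size and absolute mode.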
Below are the semantics for each defined BPF instruction. We use the convention that A is the accumulator, X is the index register, P[] packet data, and M[] scratch memory store. P[i:n] gives the data at byte offset “i” in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or unsigned byte (n=1). M[i] gives the i'th word in the scratch memory store, which is only addressed in word units. The memory store is indexed from 0 to BPF_MEMWORDS-1. k, jt, and jf are the corresponding fields in the instruction definition. “len” refers to the length of the packet.
| BPF_LD+BPF_W+BPF_ABS | A <- P[k:4] |
| BPF_LD+BPF_H+BPF_ABS | A <- P[k:2] |
| BPF_LD+BPF_B+BPF_ABS | A <- P[k:1] |
| BPF_LD+BPF_W+BPF_IND | A <- P[X+k:4] |
| BPF_LD+BPF_H+BPF_IND | A <- P[X+k:2] |
| BPF_LD+BPF_B+BPF_IND | A <- P[X+k:1] |
| BPF_LD+BPF_W+BPF_LEN | A <- len |
| BPF_LD+BPF_IMM | A <- k |
| BPF_LD+BPF_MEM | A <- M[k] |
| BPF_LDX+BPF_W+BPF_IMM | X <- k |
| BPF_LDX+BPF_W+BPF_MEM | X <- M[k] |
| BPF_LDX+BPF_W+BPF_LEN | X <- len |
| BPF_LDX+BPF_B+BPF_MSH | X <- 4*(P[k:1]&0xf) |
| BPF_ST | M[k] <- A |
| BPF_STX | M[k] <- X |
| BPF_ALU+BPF_ADD+BPF_K | A <- A + k |
| BPF_ALU+BPF_SUB+BPF_K | A <- A - k |
| BPF_ALU+BPF_MUL+BPF_K | A <- A * k |
| BPF_ALU+BPF_DIV+BPF_K | A <- A / k |
| BPF_ALU+BPF_AND+BPF_K | A <- A & k |
| BPF_ALU+BPF_OR+BPF_K | A <- A | k |
| BPF_ALU+BPF_LSH+BPF_K | A <- A << k |
| BPF_ALU+BPF_RSH+BPF_K | A <- A >> k |
| BPF_ALU+BPF_ADD+BPF_X | A <- A + X |
| BPF_ALU+BPF_SUB+BPF_X | A <- A - X |
| BPF_ALU+BPF_MUL+BPF_X | A <- A * X |
| BPF_ALU+BPF_DIV+BPF_X | A <- A / X |
| BPF_ALU+BPF_AND+BPF_X | A <- A & X |
| BPF_ALU+BPF_OR+BPF_X | A <- A | X |
| BPF_ALU+BPF_LSH+BPF_X | A <- A << X |
| BPF_ALU+BPF_RSH+BPF_X | A <- A >> X |
| BPF_ALU+BPF_NEG | A <- -A |
| BPF_JMP+BPF_JA | pc += k |
| BPF_JMP+BPF_JGT+BPF_K | pc += (A > k) ? jt : jf |
| BPF_JMP+BPF_JGE+BPF_K | pc += (A ≥ k) ? jt : jf |
| BPF_JMP+BPF_JEQ+BPF_K | pc += (A == k) ? jt : jf |
| BPF_JMP+BPF_JSET+BPF_K | pc += (A & k) ? jt : jf |
| BPF_JMP+BPF_JGT+BPF_X | pc += (A > X) ? jt : jf |
| BPF_JMP+BPF_JGE+BPF_X | pc += (A ≥ X) ? jt : jf |
| BPF_JMP+BPF_JEQ+BPF_X | pc += (A == X) ? jt : jf |
| BPF_JMP+BPF_JSET+BPF_X | pc += (A & X) ? jt : jf |
| BPF_RET+BPF_A | accept A bytes |
| BPF_RET+BPF_K | accept k bytes |
| BPF_MISC+BPF_TAX | X <- A |
| BPF_MISC+BPF_TXA | A <- X |
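The table can be read as the specification of a small interpreter. The sketch below models only a subset (absolute loads, BPF_JEQ and BPF_JSET against k, and BPF_RET+BPF_K). It is not the kernel implementation: struct demo_insn, demo_filter and the DEMO_ opcode values are stand-ins assumed to match <net/bpf.h>, and rejecting the packet on an out-of-range load or an unsupported opcode is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode bits for the modelled subset (values assumed). */
#define DEMO_LD   0x00
#define DEMO_JMP  0x05
#define DEMO_RET  0x06
#define DEMO_W    0x00
#define DEMO_H    0x08
#define DEMO_B    0x10
#define DEMO_ABS  0x20
#define DEMO_JEQ  0x10
#define DEMO_JSET 0x40
#define DEMO_K    0x00

/* Same layout as the instruction format described in the text. */
struct demo_insn {
    uint16_t code;
    unsigned char jt;
    unsigned char jf;
    int32_t k;
};

/* Load n bytes at offset k in network order; clear *ok if out of range. */
static uint32_t
demo_load(const unsigned char *p, size_t len, uint32_t k, size_t n, int *ok)
{
    uint32_t v = 0;
    size_t i;

    if (k > len || n > len - k) {
        *ok = 0;
        return 0;
    }
    for (i = 0; i < n; i++)
        v = (v << 8) | p[k + i];
    return v;
}

/* Run 'prog' over the packet; return the byte count to accept, 0 to reject. */
uint32_t
demo_filter(const struct demo_insn *prog, size_t ninsn,
    const unsigned char *p, size_t len)
{
    uint32_t A = 0;
    size_t pc = 0;

    assert(prog != NULL && p != NULL);
    while (pc < ninsn) {
        const struct demo_insn *in = &prog[pc++];
        uint32_t k = (uint32_t)in->k;
        int ok = 1;

        /* pc already points at the next instruction, so jt and jf
         * are offsets relative to it. */
        switch (in->code) {
        case DEMO_LD + DEMO_W + DEMO_ABS:
            A = demo_load(p, len, k, 4, &ok);
            break;
        case DEMO_LD + DEMO_H + DEMO_ABS:
            A = demo_load(p, len, k, 2, &ok);
            break;
        case DEMO_LD + DEMO_B + DEMO_ABS:
            A = demo_load(p, len, k, 1, &ok);
            break;
        case DEMO_JMP + DEMO_JEQ + DEMO_K:
            pc += (A == k) ? in->jt : in->jf;
            break;
        case DEMO_JMP + DEMO_JSET + DEMO_K:
            pc += (A & k) ? in->jt : in->jf;
            break;
        case DEMO_RET + DEMO_K:
            return k;
        default:
            return 0;   /* opcode outside the modelled subset */
        }
        if (!ok)
            return 0;   /* load past the end of the packet */
    }
    return 0;           /* ran off the end without a return */
}
```

A four-instruction program that loads the halfword at offset 12, compares it with 0x0800, and returns either a length or 0 is enough to exercise every path in this subset.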
The BPF interface provides the following macros to facilitate array initializers:
BPF_STMT(opcode, operand)
BPF_JUMP(opcode, operand, true_offset, false_offset)
The following filter accepts only Reverse ARP requests.
struct bpf_insn insns[] = {
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
sizeof(struct ether_header)),
BPF_STMT(BPF_RET+BPF_K, 0),
};
This filter accepts only IP packets between hosts 128.3.112.15 and 128.3.112.35.
struct bpf_insn insns[] = {
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
BPF_STMT(BPF_RET+BPF_K, 0),
};
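The constants 0x8003700f and 0x80037023 in the filter above are the two host addresses written as 32-bit values, and offsets 26 and 30 are the IP source and destination address fields behind a 14-byte Ethernet header. A hypothetical helper, demo_ipv4_k, shows how such a constant can be derived from a dotted quad.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack the dotted quad a.b.c.d into the 32-bit value
 * that a BPF_LD+BPF_W+BPF_ABS load yields for an IPv4 address field.
 */
uint32_t
demo_ipv4_k(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
        ((uint32_t)c << 8) | (uint32_t)d;
}
```

For example, demo_ipv4_k(128, 3, 112, 15) yields 0x8003700f, the first constant compared in the filter.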
Finally, this filter returns only TCP finger packets. We must parse the IP header to reach the TCP header. The BPF_JSET instruction checks that the IP fragment offset is 0 so we are sure that we have a TCP header.
struct bpf_insn insns[] = {
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
BPF_STMT(BPF_RET+BPF_K, 0),
};
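For comparison, the same checks can be written as ordinary C over a captured Ethernet frame. This is a sketch for reading the filter, not a replacement for it: demo_is_finger and demo_be16 are hypothetical names, and the literal values 0x0800 and 6 are used in place of ETHERTYPE_IP and IPPROTO_TCP so that the sketch needs no network headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit value stored in network byte order at offset 'off'. */
static uint16_t
demo_be16(const unsigned char *p, size_t off)
{
    return (uint16_t)((p[off] << 8) | p[off + 1]);
}

/*
 * Plain-C reading of the finger filter for a frame of 'len' captured
 * bytes. Returns 1 to accept the packet and 0 to reject it.
 */
int
demo_is_finger(const unsigned char *p, size_t len)
{
    size_t x;

    assert(p != NULL);
    if (len < 24)                       /* bytes 12..23 are read below */
        return 0;
    if (demo_be16(p, 12) != 0x0800)     /* Ethernet type is not IP */
        return 0;
    if (p[23] != 6)                     /* IP protocol is not TCP */
        return 0;
    if (demo_be16(p, 20) & 0x1fff)      /* fragment offset is not 0 */
        return 0;
    x = 4 * (size_t)(p[14] & 0x0f);     /* BPF_MSH: IP header length */
    if (len < 14 + x + 4)               /* both TCP ports must be present */
        return 0;
    /* Source port at x+14, destination port at x+16. */
    return demo_be16(p, x + 14) == 79 || demo_be16(p, x + 16) == 79;
}
```

The variable x plays the role of the index register: the BPF_LDX+BPF_B+BPF_MSH instruction computes the IP header length so that the two indirect loads land on the TCP port fields.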
S. McCanne and V. Jacobson, The BSD Packet Filter: A New Architecture for User-level Packet Capture, Proceedings of the 1993 Winter USENIX Technical Conference, San Diego, CA.
A file that does not request promiscuous mode may receive promiscuously received packets as a side effect of another file requesting this mode on the same hardware interface. This could be fixed in the kernel with additional processing overhead. However, we favor the model where all files must assume that the interface is promiscuous, and if so desired, must use a filter to reject foreign packets.
Data link protocols with variable length headers are not currently supported.
Under SunOS, if a BPF application reads more than 2^31 bytes of data, read(2) will fail with EINVAL. You can either fix the bug in SunOS, or lseek(2) to 0 when read fails for this reason.
“Immediate mode” and the “read timeout” are misguided features. This functionality can be emulated with non-blocking mode and select(2).
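A minimal sketch of that emulation follows. The helper demo_read_timeout is hypothetical, works on any readable descriptor, and assumes the caller has already put the descriptor into non-blocking mode if a blocking read(2) must be ruled out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to 'ms' milliseconds for 'fd' to become
 * readable, then read once. Returns the number of bytes read, 0 on
 * timeout, or -1 on error.
 */
ssize_t
demo_read_timeout(int fd, void *buf, size_t len, long ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;   /* 0: timed out, -1: select failed */
    return read(fd, buf, len);
}
```

A return value of 0 here means the timeout expired, so a caller that also needs to detect end of file on other descriptor types would have to report the two cases separately.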
| June 8, 2010 | NetBSD 5.99 |