Basic Bit Manipulation
Transacting on Ethereum is extremely expensive, so sometimes, it makes sense to aim for more granular control over your code. One way to achieve this is to utilize bitwise operations, like in the bit twiddling techniques I've posted on X @fiveoutofnine:
- quick/small mapping
- bitpacking consecutive data to reduce storage costs
if
blocks with 3 bit operations- basic operations
- loop through some numbers without an array +extra
These operations/tricks are pretty different from other types of programming, so it may be fairly esoteric. Although I tried to give an application example with each post, it's still hard to see where/how to apply them yourself. A good example of a project with more extensive applications is my on-chain chess engine, but the contracts are probably a bit hard to follow for beginners.
In this post, I attempt to explain some of these tricks and explain my thought process when applying them with a much simpler game: Tic-tac-toe.
Bitwise operators
Feel free to skip this section if you are already familiar with bitwise operators.
First, here are the basic bitwise operators used in the following Tic-tac-toe example: >>
, <<
, &
, |
, and ^
.
>>
(SHR)
Digits are moved, or shifted, to the right. Zeros replace the discarded bits.
Shifting an integer right by bits results in .
<<
(SHL)
Digits are moved, or shifted, to the left. Zeros replace the discarded bits.
Shifting an integer left by bits results in .
&
(AND)
For each pair of digits, compute 1 if both are 1; otherwise, compute 0.
Since &
only "accepts" the bit if both numbers' bits are 1
, you can construct a bitmask to select which bits of a number you want to read. For example, for some number , if we want to read every other bit, we can compute m & 1010...1010
.
|
(OR)
For each pair of digits, compute 1 if either is 1; otherwise, compute 0.
Since |
"accepts" the bit as long as either numbers' bits are 1
, you can write a number's bits into another number, as long as the first number's bits in the same position are all 0
.
^
(XOR)
For each pair of digits, compute 1 if they differ. Otherwise, compute 0.
^
can be helpful in conjunction with &
and |
when overwriting a number's bits with another set of bits.
Try working through the examples and notes to gain some foundational intuition.
Numbers in this section with only 0s and 1s are in binary representation unless otherwise specified.
Overview of Tic-tac-toe
Before writing the contract, let's lay out the basic things required for a game of Tic-tac-toe. It's a 2-player game, and each player only has 1 type of move they can play (i.e. placing their respective symbol into an unoccupied square). When a move is inputted by a player, we have to validate a few things:
- The player is 1 of the 2 players in the game.
- It's the player's turn to play a move.
Next, we have to check that the move is legal. If it is, we apply the move.
Finally, we must check if the game has ended (i.e. either player has formed 3 consecutive dots).
Writing this out as pseudocode in a function called playMove
, we have:
Board/game state representation
Now that we have an idea of all the information to store, let's come up with a representation in code. Keep in mind that the design should allow for efficient computation: the code will read the data and compute with it, so operations such as accessing, decoding, and storing should be cheap. Once we achieve both, since the game has to be stored fully on-chain and storage costs are (generally) the most expensive operation on the EVM, we can look towards using as few bits as possible.
Base representation
A naive approach might look something like:
If we were satisfied with this, we would make sure struct Game
was tightly packed, then proceed.
However, uint8[9] board
uses a lot of unnecessary reads/writes, and we don't need 8 bits to store data for a single position on the board.
Since there are only 3 valid states, 2 bits are enough ().
Next, bool turn
can easily be 1 bit: 0 if false, 1 if true.
Lastly, because bool hasGameEnded
, bool playerZeroWon
, and bool playerOneWon
work in conjunction to express the state of the game (only 4 possibilities), we can combine them into 2 bits:
- the game is ongoing (
00
), - player zero has won (
10
), - player one has won (
11
), - and game is a draw (
01
).
Altogether, we use just 21 bits (down from when using the struct above) to represent everything:
Bitpacking games into uint256
s instead of using structs
If we were to use a struct to store the game states, we'd have to use uint24
because it's the smallest unit that fits 21 bits, which would waste 3 bits.
Instead, let's store the players' addresses in 2 other mappings and have a third mapping exclusively to store game states:
This allows us to bitpack consecutive games' data together and store it slightly cheaper.
The idea is to reduce as many zero slot writes as possible because writing to zero slots costs 20k gas, whereas writing to nonzero slots only costs 5k gas.
We achieve this by bitpacking as many games' data into a uint256
(in this case, , so we can fit up to 12 games in 1 uint256
).
This way, only every game will cost 20k gas to store, and every other game will only cost 5k gas.
Reading and writing remain efficient:
Final representation
Now, we have 3 mappings to keep track of all games' data and players. With this set-up, storing each new game to storage costs roughly
on average (2 zero slot writes for each of the players' addresses, and gas for the game's state).
Since addresses are only 160 bits, we can remove a write to storage by bitpacking the game's state with one of the players' addresses. Let's arbitrarily pick player zero's address for this. Combined, we have:
Again, reading and writing remain efficient:
In this particular case, since we need a minimum of 2 storage slots per new game anyway (2 player addresses already exceed 2 32-byte words), cutting down on the number of bits used to represent the game is not necessary—all else equal, using 96 bits () is just as efficient. However, I thought I'd describe the process here anyway, for when it might be applicable.
Fine-tuning the representation
For Tic-tac-toe, the representation above is good enough. But if it were a different application, there might be extra considerations to take. These are highly dependent on the situation, so here are some common patterns:
Frequently accessed data
The ordering of the first/last bits might help save on computation: they only require one of >>
or &
to access.
Thus, if some information needs to be accessed more frequently than the others, it'd probably be wise to place it at the start/end.
Structuring data to remove expensive computations
The main example here is more precisely described as a "branchless optimization," but the more general pattern of using the value itself to compute the result can be quite useful for other situations too.
An expensive computation on the EVM (and most computers) are branch instructions.
Keeping this in mind, it may be wise to structure your representation to remove them.
For example, suppose a 3-bit value portion
maps to 8 different outputs as follows:
Instead, we can use portion
as a value itself as part of the computation!
The following code block is equivalent to the series of if/else statements above:
For a real-world example, see this post.
On a related note, you can often structure your representation and assign values in a way that allows for efficient group conditional checks (i.e. form it in a way that makes it easy to categorize). Some simple examples:
- if the value is even, perform
x
, otherwise performy
; - if the value is greater than m, perform
x
, otherwise performy
; - if the value is a multiple of 8, perform
x
, otherwise performy
; - if the value's 2nd to 5th bits are
010
, performx
, otherwise performy
.
You can also combine one or more of these to form composite checks.
For example, if the value is even and greater than m
, perform x
, otherwise perform y
.
For heavy usage of this concept in a live deployment, see generateMoves
from Chess.sol
.
Pedantic bit packing: is 21 bits good enough?
You may have realized that we use 2 bits ( total possibilities) per position for the Tic-tac-toe example when there are only 3 possible states (empty, occupied by player 0, or occupied by player 1). Why not save 3 more bits ( is 15 bits long) by using a base 3 representation for the board instead? We can then use just 18 bits ()—isn't this better?
Not really.
Besides the point that we bitpack the game data with player 0's address, consider the following: 18 bits allows us to store 14 () games per uint256
, which means it costs gas on average to store a game's data.
Comparatively, 21 bits allows us to store 12 games per uint256
, or gas on average.
Although 449 gas is "saved" on the storage side, decoding and encoding base 3 is considerably more expensive with many division and mod operations (vs almost entirely using bit operations)1.
Valid reasons (maybe) to be that pedantic about the number of bits is that (1) it results in major gas savings, or (2) computation complexity for encoding/decoding is not an issue (e.g. it's only encoded/decoded in a view function). On-chain art is a category of projects/communities that are open to excessive efforts to compress storage, as well as a realm where it often makes sense to.
1 We also later end up using the extra slot to make a slight optimization in checking whether a player won. See subsection Checking if a player won.
User validation
Recall that our player 0 is stored in mapping(uint256 => uint256) public playerZerosAndGames
, where the first 160 bits represent player 0, and player 1 is stored in mapping(uint256 => address) public playerOnes
.
Thus, to check that msg.sender
is either player 0 or player 1, we have:
Next, to check whether it's msg.sender
's turn to play, recall that the last bit of the board denotes whose turn it is to play: 0 if it's player 0's turn, and 1 if it's player 1's turn.
In bit operations: if board & 1 == 0
, it is player 0's turn, and if board & 1 == 1
, it is player 1's turn.
Move validation
After we have verified that msg.sender
is the correct address to play, we must verify whether the move they inputted is valid.
There are 2 necessary checks for this:
- the game is not over,
- and the position they inputted must be empty on the board.
Recall that if the 3rd to 2nd last bits are 00
, the game is still ongoing.
Therefore, checking whether the game is not over is as simple as reading those 2 bits and checking if they equal 0:
Next, let _index
denote the user's input, where they correspond to the following positions of the board:
Then, accessing the position corresponding to _index
is just a matter of shifting to the correct position and, once again, masking 2 bits with 3 (0b11
).
This value must equal 0 (i.e. the square is empty) for the move to be valid:
Check game ending state
Intuition
There are 4 possible ways for a player to win:
- form a horizontal line of 3,
- form a vertical line of 3,
- form a diagonal line of 3 from the bottom right corner to the top left corner,
- and form a diagonal line of 3 from the bottom left corner to the top right corner.
If either player has satisfied one of those, the game is over. How do we determine that?
Well, for each winning pattern, we can select the bits that correspond to the squares in that pattern. Then, if these bits all represent squares played entirely by player 0 or entirely by player 1, the game is over.
Winning pattern bitmasks
Recall from earlier that we can select bits with the &
operator and a mask.
Let's denote
HORIZONTAL
for rows of 3,VERTICAL
for columns of 3,BR_TO_TL_DIAGONAL
for the diagonal from the bottom right corner of the board to the top left corner,- and
BL_TO_TR_DIAGONAL
for the diagonal from the bottom left corner of the board to the top right corner.
The following code block shows how they're computed (remember each square is represented by 2 bits):
Applying the bitmasks
The position we apply the mask on is also important. For example, if we apply the horizontal bit mask at the square of index 0
, we (correctly) select the following squares marked by x
:
Now, suppose we apply it at the square of index 1
:
This is incorrect. The correct positions to apply them for each mask are shown below:
Bitmask | Applied at | Selected bits |
---|---|---|
HORIZONTAL | ||
VERTICAL | ||
BR_TO_TL_DIAGONAL | ||
BL_TO_TR_DIAGONAL |
Try working through them on paper!
Checking if a player won
For each selected square, the corresponding pair of bits in the mask are 0b11
, or 3 in decimal representation.
All other bits (i.e. those that aren't included by the mask) are 0
.
Then, if the selected portion denotes squares entirely played by either player, it should be a scalar multiple of the mask that was used:
- If
0b01
(1 in decimal) represents a square played by player 0, and all the selected squares were played by player 0, the selected portion should equal . - Similarly, if
0b10
(2 in decimal) represents a square played by player 1, and all the selected squares were played by player 1, the selected portion should equal .
This is fairly efficient to check, but we can do slightly better.
If we use 0b00
for empty squares, 0b01
for player 0, and 0b10
for player 1, 0b11
represents nothing.
What happens if we use 0b11
instead of 0b10
for player 1's moves?
The selected portion should equal , instead of !
This is more efficient to check, so let's go with this.
Implementation
x
denotes the position the code is at, and -
denotes bits that have been bitshifted off (and thus no longer relevant).
Check game ending state w/ while loop (bonus)
This section is just for fun (and has some cool bit twiddling patterns).
The code above looks kinda ugly/redundant.
Let's use a while
loop instead, so we don't have checkGameStateHelper
so many times.
We can do this by bitpacking the win-pattern-checking masks consecutively into a uint256 masks
(in the order they are used), as well as bitpacking the number of bits the board is shifted by after each check into a uint256 boardShifts
.
Then, starting at index 0, we can iterate through the board masks and apply the board shifts after each check to make our way through the entire board as follows:
Board | Bitmask | Check | Shift |
---|---|---|---|
HORIZONTAL | 0 | ||
VERTICAL | 0 | ||
BR_TO_TL_DIAGONAL | 2 | ||
VERTICAL | 2 | ||
VERTICAL | 0 | ||
BL_TO_TR_DIAGONAL | 2 | ||
HORIZONTAL | 6 | ||
HORIZONTAL | 0 |
Computing masks
and boardShifts
Since the largest mask is 18 bits, and , we can safely reserve 18 bits for each "word" within uint256 masks
:
Next, since the largest shift is 6, we can safely use 3 bits (0b111 > 6
) for each "word" within uint256 boardShifts
:
Implementation
Conclusion
So many people see low-level solidity optimizations from people like @transmissions11 and think that it's the key to optimizing their project
but what they don't realize is that the real savings come from efficient and well-designed mechanisms and architectures
It's easy to get caught up in tactical tricks to make near-obsolete optimizations to your smart contracts. Most of the techniques I shared on X were never even intended to be "optimization tips," (bit twiddling can be quite useful outside of optimization), but I sort of ended up falling down that hole.
Of course, it's always fun to discover and share tricks like this! But many of them (especially bit twiddling) have extremely niche applications, so ask yourself whether it really makes sense to obfuscate your code for small optimizations before applying them.
Extra reading/resources
If you want to see some more advanced stuff, check out:
- Lists
- Sean Anderson's Bit Twiddling Hacks
- The Aggregate Magic Algorithms
- HAKHEM
- Assembly Language Lab
- Books
- Hacker's Delight 2nd Edition (Amazon, IPFS URI)
- Chapter 1: "Bit Wizardry" of "Matters Computational"
- Other
fiveoutofnine/on-chain-chess
- Fast Inverse Square Root — A Quake III Algorithm (a lot of people's first exposure to bit twiddling)
- Reciprocal Multiplication, a tutorial
If you found this helpful, consider following me on X @fiveoutofnine.