One Million Emojis, Only 2 MB!

Recently, while watching one of my favorite YouTubers, ThePrimeagen, I came across a video featuring onemillioncheckboxes.com. Although the idea was simple, I found it incredibly cool: just a bunch of people checking a bunch of boxes.

I had an itch to build another side project, and I wanted to create something fun in a similar collaborative style using WebSockets. After long and deep thought (about 5 minutes), I came up with what I consider a genius idea: What if I did something similar, but with one million emojis?

One Million Emojis

onemillionemojis.com

The Idea

The website is simple: a 1000x1000 grid of emoji inputs. Once you choose an emoji, it's locked in and can't be changed. The grid updates in real time for everyone connected to the site.

The Technical Details

When dealing with a million things, efficiency matters, because, well, a million things is a lot of things. I followed the same approach as the creator of One Million Checkboxes and used a Redis bitfield. Redis's BITFIELD command lets you get, set, and increment integers of almost any width (up to 64 bits signed or 63 bits unsigned) at any bit offset within a single string, which makes it a great fit for this project.
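To make the storage layout concrete, here is a hypothetical in-memory stand-in for a Redis string used with BITFIELD, written in Go. It follows Redis's documented bit order, where bit 0 is the most significant bit of the first byte, and it grows the buffer with zero bytes when a write lands past the end, as Redis does. The `bitfield` type and its `set`/`get` methods are illustrative names only; they are not part of Redis or any Redis client, and unlike the real SET subcommand, `set` does not return the old value.

```go
package main

import "fmt"

// bitfield is a hypothetical in-memory stand-in for a Redis string used
// with BITFIELD. Bit 0 is the most significant bit of byte 0, and the
// buffer grows with zero bytes when a write lands past its end.
type bitfield struct {
	buf []byte
}

// set writes the low `width` bits of value at bit offset off,
// similar to: BITFIELD key SET u<width> off value.
func (b *bitfield) set(off, width int, value uint64) {
	need := (off + width + 7) / 8
	if len(b.buf) < need {
		b.buf = append(b.buf, make([]byte, need-len(b.buf))...)
	}
	for i := 0; i < width; i++ {
		bit := (value >> (width - 1 - i)) & 1 // most significant bit first
		pos := off + i
		mask := byte(1) << (7 - pos%8)
		if bit == 1 {
			b.buf[pos/8] |= mask
		} else {
			b.buf[pos/8] &^= mask
		}
	}
}

// get reads `width` bits at bit offset off as an unsigned integer,
// similar to: BITFIELD key GET u<width> off. Bits past the end read as 0.
func (b *bitfield) get(off, width int) uint64 {
	var v uint64
	for i := 0; i < width; i++ {
		pos := off + i
		v <<= 1
		if pos/8 < len(b.buf) && b.buf[pos/8]&(byte(1)<<(7-pos%8)) != 0 {
			v |= 1
		}
	}
	return v
}

func main() {
	var grid bitfield
	const gridSize = 1000
	row, col := 3, 7
	off := (row*gridSize + col) * 16 // cell index times 16 bits per cell
	grid.set(off, 16, 1234)
	fmt.Println(grid.get(off, 16), grid.get(off+16, 16), len(grid.buf)) // 1234 0 6016
}
```

Writing cell (3, 7) only allocates the first 6,016 bytes; untouched cells beyond the end read back as 0, which is why an empty grid costs almost nothing until people start filling it in.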

First, I needed a comprehensive list of emojis. After some online searching, I found this list. I wrote a Node.js script using Cheerio to parse it and store it as a JSON file:

// Parses this list of emojis: https://www.prosettings.com/emoji-list/
const fs = require('fs');
const cheerio = require('cheerio');

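// Turn an emoji name such as "grinning face with big eyes" into unique,
// lowercase search tags, dropping filler words.
function processTags(ename) {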
  const ignoredWords = ['the', 'with', 'on', 'over', 'and', 'or', 'in'];
  const tagsSet = new Set(
    ename.split(' ')
      .map(word => word.replace(/[^a-zA-Z]/g, '').toLowerCase())
      .filter(word => !ignoredWords.includes(word) && word.length > 0)
  );
  return Array.from(tagsSet);
}

function parseHTMLTable(html) {
  const $ = cheerio.load(html);
  const results = [];
  $('table tr').each((index, element) => {
    const $row = $(element);
    const $echars = $row.find('td.echars');
    const $ename = $row.find('td.ename');
    const $eno = $row.find('td.eno');
    // Only keep rows that contain all three emoji cells.
    if ($echars.length && $ename.length && $eno.length) {
      results.push({
        id: $eno.text().trim(),
        emoji: $echars.text().trim(),
        tags: processTags($ename.text().trim())
      });
    }
  });
  return results;
}

function main() {
  const inputFile = 'source.html';
  const outputFile = 'emojis.json';
  const htmlContent = fs.readFileSync(inputFile, 'utf8');
  const parsedData = parseHTMLTable(htmlContent);
  fs.writeFileSync(outputFile, JSON.stringify(parsedData, null, 2));
  console.log(`Parsed data has been written to ${outputFile}`);
}
main();

This script gave me a list of nearly 2,000 emojis, each with tags to power search.
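On the Go side, the JSON file can be loaded with the standard library. The sketch below mirrors the fields the Node.js script writes (`id`, `emoji`, `tags`). The mapping from stored cell values to emojis is an assumption: since the backend treats 0 as an empty cell, value `v` is taken to mean the `v`-th emoji, i.e. `list[v-1]`. The sample entries and the `emojiForValue` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Emoji mirrors one entry of emojis.json as written by the Node.js script.
type Emoji struct {
	ID    string   `json:"id"`
	Emoji string   `json:"emoji"`
	Tags  []string `json:"tags"`
}

// emojiForValue maps a stored cell value to its emoji.
// Assumption: 0 means "empty" and value v (1..len(list)) is list[v-1].
func emojiForValue(list []Emoji, v int) (string, bool) {
	if v < 1 || v > len(list) {
		return "", false
	}
	return list[v-1].Emoji, true
}

func main() {
	// Hypothetical sample entries in the same shape as emojis.json.
	data := []byte(`[
	  {"id": "1", "emoji": "😀", "tags": ["grinning", "face"]},
	  {"id": "2", "emoji": "😃", "tags": ["grinning", "face", "big", "eyes"]}
	]`)
	var list []Emoji
	if err := json.Unmarshal(data, &list); err != nil {
		panic(err)
	}
	e, ok := emojiForValue(list, 2)
	fmt.Println(e, ok) // 😃 true
	_, ok = emojiForValue(list, 0)
	fmt.Println(ok) // false
}
```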

After determining the number of emojis to support, I considered how to store this information efficiently. The checkboxes site uses a bitfield where each individual bit represents a checkbox. However, this approach wouldn't work for our case, as we have far more than just two states per box.

Instead, I decided to use 16 bits to represent each emoji. With 2^16 = 65,536, we have more than enough space to represent our ~2,000 emojis.
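The minimum width follows from the number of distinct values a cell must hold. Assuming 0 is reserved for an empty cell, a list of `n` emojis needs the values 0..n, and Go's `math/bits.Len` returns exactly the number of bits required to represent `n`. The `bitsNeeded` helper is a hypothetical name for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitsNeeded returns the smallest width that can store every value in
// 0..numEmojis, assuming 0 is reserved for an empty cell.
func bitsNeeded(numEmojis int) int {
	return bits.Len(uint(numEmojis))
}

func main() {
	fmt.Println(bitsNeeded(2000)) // 11
	fmt.Println(1<<16 - 1)        // 65535: the largest value a 16-bit cell can hold
}
```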

You might think that 16 bits is overkill, and you'd be right. I initially attempted to use just 11 bits (2^11 = 2,048, which fits nicely). Unfortunately, the code I wrote to encode and decode this wasn't playing nice, so for the sake of shipping the project quickly (I started and finished this in one evening), I decided to use 16 bits.
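For anyone tempted to retry the 11-bit version: Redis handles odd widths natively. BITFIELD accepts types such as `u11`, and prefixing the offset with `#` multiplies it by the type's width, so `BITFIELD <key> SET u11 #5 42` writes the sixth 11-bit cell with no manual offset math. Custom code is only needed when the raw bytes are decoded outside Redis. Below is a hypothetical Go sketch of packing and unpacking 11-bit values in the same most-significant-bit-first order Redis uses; `pack11` and `unpack11` are illustrative names, not code from the project.

```go
package main

import "fmt"

const width = 11 // bits per cell

// pack11 stores each value in 11 bits, most significant bit first, the
// same bit order Redis uses for BITFIELD u11. Values must be below 2048.
func pack11(values []uint16) []byte {
	buf := make([]byte, (len(values)*width+7)/8)
	for i, v := range values {
		for b := 0; b < width; b++ {
			if v>>(width-1-b)&1 == 1 {
				pos := i*width + b
				buf[pos/8] |= byte(0x80) >> (pos % 8)
			}
		}
	}
	return buf
}

// unpack11 reads the 11-bit value for cell i; bits past the end read as 0.
func unpack11(buf []byte, i int) uint16 {
	var v uint16
	for b := 0; b < width; b++ {
		pos := i*width + b
		v <<= 1
		if pos/8 < len(buf) && buf[pos/8]&(byte(0x80)>>(pos%8)) != 0 {
			v |= 1
		}
	}
	return v
}

func main() {
	cells := []uint16{0, 1, 2047, 42}
	buf := pack11(cells)
	fmt.Println(len(buf)) // 6 (44 bits rounded up to whole bytes)
	for i := range cells {
		fmt.Print(unpack11(buf, i), " ")
	}
	fmt.Println()
}
```

Cells straddle byte boundaries here (cell 1 occupies bits 11 through 21), which is the usual source of off-by-one bugs when hand-rolling this kind of decoder.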

Storing the grid in Redis requires 16 million bits (1,000,000 cells at 16 bits each):

16,000,000 bits × (1 byte / 8 bits) × (1 KB / 1,000 bytes) × (1 MB / 1,000 KB) = 2 MB
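The conversion above uses decimal units. As a cross-check, here is the same arithmetic in Go, along with the binary-unit figure and what the abandoned 11-bit layout would have cost. `storageBytes` is a hypothetical helper name.

```go
package main

import "fmt"

// storageBytes returns how many bytes a grid needs when every cell uses
// bitsPerCell bits, rounded up to a whole byte.
func storageBytes(cells, bitsPerCell int) int {
	return (cells*bitsPerCell + 7) / 8
}

func main() {
	const cells = 1000 * 1000
	b16 := storageBytes(cells, 16)
	b11 := storageBytes(cells, 11)
	// 16 bits: 2000000 bytes = 2.00 MB = 1.91 MiB
	fmt.Printf("16 bits: %d bytes = %.2f MB = %.2f MiB\n", b16, float64(b16)/1e6, float64(b16)/(1<<20))
	// 11 bits: 1375000 bytes = 1.375 MB
	fmt.Printf("11 bits: %d bytes = %.3f MB\n", b11, float64(b11)/1e6)
}
```

So the 11-bit layout would have saved 625,000 bytes, roughly a third of the total.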

That's only 2 MB, about the size of a high-resolution JPEG image or a short audio clip.

With the technical details figured out, I was ready to start coding. I chose to implement this in Go, partly because I'll be starting a job at Google soon where I'll be using Go extensively (more on that in a future blog post 😉).

After watching some Go tutorials on YouTube and with some assistance from Claude 3.5 Sonnet, I managed to get the grid backend working:

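// Assumes rdb is a go-redis *redis.Client and that gridSize (1000),
// emojiListSize, and redisKey are declared elsewhere in the package.
// A cell value of 0 means "empty"; 1..emojiListSize identify emojis.

func updateGrid(ctx context.Context, row, col, value int) error {
	if row < 0 || row >= gridSize || col < 0 || col >= gridSize {
		return fmt.Errorf("invalid row or column")
	}
	if value < 1 || value > emojiListSize {
		return fmt.Errorf("value out of range for emoji list")
	}
	currValue, err := getValueAt(ctx, row, col)
	if err != nil {
		return err
	}
	if currValue != 0 {
		return fmt.Errorf("value already set")
	}
	// The GET above and the SET below are separate round trips, so two
	// clients racing for the same empty cell can both pass the check.
	// Doing both steps in a Lua script or a WATCH/MULTI transaction
	// would make the lock-in atomic.
	index := (row*gridSize + col) * 16 // bit offset: 16 bits per cell
	_, err = rdb.BitField(ctx, redisKey, "SET", "u16", index, value).Result()
	return err
}

func getValueAt(ctx context.Context, row, col int) (int, error) {
	if row < 0 || row >= gridSize || col < 0 || col >= gridSize {
		return 0, fmt.Errorf("invalid row or column")
	}
	index := (row*gridSize + col) * 16
	// BITFIELD returns one integer per subcommand; here there is one GET.
	result, err := rdb.BitField(ctx, redisKey, "GET", "u16", index).Result()
	if err != nil {
		return 0, err
	}
	if len(result) == 0 {
		return 0, fmt.Errorf("no value returned")
	}
	return int(result[0]), nil
}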
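A newly connected client also needs the whole grid, not just single cells. One option (an assumption; the post doesn't say how the initial state is loaded) is to fetch the entire key with a plain Redis GET, for example via go-redis's `Get(ctx, key).Bytes()`, and decode it locally. Because a `u16` field at bit offset `i*16` covers bytes `2i` and `2i+1` with the most significant bit first, each cell is simply a big-endian 16-bit integer. Redis only allocates up to the last byte written, so a short buffer means the remaining cells are empty. `decodeGrid` is a hypothetical helper.

```go
package main

import "fmt"

// decodeGrid turns the raw bytes of the grid key into one value per cell,
// reading each cell as a big-endian uint16. Bytes missing from the end
// of raw (never written) decode as 0, i.e. an empty cell.
func decodeGrid(raw []byte, cells int) []uint16 {
	out := make([]uint16, cells)
	for i := range out {
		var hi, lo byte
		if 2*i < len(raw) {
			hi = raw[2*i]
		}
		if 2*i+1 < len(raw) {
			lo = raw[2*i+1]
		}
		out[i] = uint16(hi)<<8 | uint16(lo)
	}
	return out
}

func main() {
	// Cell 0 holds 1, cell 1 holds 258 (0x0102), cell 2 was never written.
	raw := []byte{0x00, 0x01, 0x01, 0x02}
	fmt.Println(decodeGrid(raw, 3)) // [1 258 0]
}
```

Sending these 2 MB once per connection and then streaming individual updates keeps the per-update traffic tiny.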

The front end was built with React, Tailwind, and shadcn/ui. The most important optimization is that the page never renders all one million emoji inputs at once. I used react-window to virtualize the grid, so only the inputs that fit on screen are mounted at any given time. If you open the site on a larger screen or zoom out significantly, more inputs load, which can eventually slow your browser down.
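The idea behind virtualizing a fixed-size grid fits in a few lines of arithmetic. The Go sketch below computes which rows or columns intersect the viewport along one axis, padded by a few extra "overscan" cells. It illustrates the technique only; it is not react-window's source or API, and the 40 px cell size and overscan of 2 are made-up values.

```go
package main

import "fmt"

// visibleRange returns the first and last cell index (inclusive) that
// intersect the viewport along one axis of a fixed-size grid, padded by
// overscan cells on each side and clamped to [0, count-1].
func visibleRange(scrollOffset, viewportSize, cellSize, count, overscan int) (first, last int) {
	first = scrollOffset/cellSize - overscan
	last = (scrollOffset+viewportSize-1)/cellSize + overscan
	if first < 0 {
		first = 0
	}
	if last > count-1 {
		last = count - 1
	}
	return first, last
}

func main() {
	const gridSize, cellPx, overscan = 1000, 40, 2
	// A 1200x800 viewport scrolled 4000 px down, at the far left.
	r0, r1 := visibleRange(4000, 800, cellPx, gridSize, overscan)
	c0, c1 := visibleRange(0, 1200, cellPx, gridSize, overscan)
	rendered := (r1 - r0 + 1) * (c1 - c0 + 1)
	fmt.Println(r0, r1, c0, c1, rendered) // 98 121 0 31 768
}
```

With these numbers only 768 inputs are mounted instead of 1,000,000. Zooming out shrinks the effective cell size, which grows both ranges at once: halving `cellPx` to 20 raises the count to 2,728, which is why zooming far out eventually bogs the browser down.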

The front end talks to the Go server over WebSockets for real-time updates. Since I'd used WebSockets in other projects, I only needed to figure out the Go-specific details. I chose gorilla/websocket for this.
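The post doesn't describe the message format, so here is a hypothetical one: a client sends an `update` with a row, column, and emoji value, and the server rebroadcasts the same shape once the Redis write succeeds. The sketch uses only `encoding/json`; with gorilla/websocket, the same struct could be passed to `Conn.ReadJSON` and `Conn.WriteJSON`. The field names, the `"update"` type string, and `parseUpdate` are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateMsg is a hypothetical wire format: a client asks to place emoji
// value Value at (Row, Col), and the server rebroadcasts the same shape
// to every connected client after the write succeeds.
type updateMsg struct {
	Type  string `json:"type"`
	Row   int    `json:"row"`
	Col   int    `json:"col"`
	Value int    `json:"value"`
}

const gridSize = 1000

// parseUpdate decodes one incoming message and rejects obviously bad input
// before it ever reaches Redis.
func parseUpdate(data []byte, emojiCount int) (updateMsg, error) {
	var m updateMsg
	if err := json.Unmarshal(data, &m); err != nil {
		return m, err
	}
	if m.Type != "update" {
		return m, fmt.Errorf("unknown message type %q", m.Type)
	}
	if m.Row < 0 || m.Row >= gridSize || m.Col < 0 || m.Col >= gridSize {
		return m, fmt.Errorf("cell (%d, %d) is off the grid", m.Row, m.Col)
	}
	if m.Value < 1 || m.Value > emojiCount {
		return m, fmt.Errorf("emoji value %d out of range", m.Value)
	}
	return m, nil
}

func main() {
	m, err := parseUpdate([]byte(`{"type":"update","row":12,"col":34,"value":5}`), 2000)
	fmt.Println(m, err) // {update 12 34 5} <nil>
	out, _ := json.Marshal(m)
	fmt.Println(string(out)) // {"type":"update","row":12,"col":34,"value":5}
	_, err = parseUpdate([]byte(`{"type":"update","row":1000,"col":0,"value":5}`), 2000)
	fmt.Println(err) // cell (1000, 0) is off the grid
}
```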

The Good

Launching this project with an entirely empty grid was awesome! I could see people joining one by one, adding their contributions to the site. Some were small, while others were much larger. Here are a couple I really liked:

Messi banner (the GOAT), shoutout to Alejandro for creating this:

Screenshot of Messi banner

Penguin and the "end" sign (at the bottom right of the page). Shoutout to Andrew for the awesome penguin:

Screenshot of penguin and end sign

The Bad

Naturally, when you open something to the internet with total anonymity, you should expect the worst. There was no shortage of that here. I initially planned not to censor anything, but I knew I would have to eventually. I've tried to be as lenient as possible, but I don't want my project to become a showcase for hate. As a result, I've removed a number of emojis and replaced them with hearts.

It's okay, though! Honestly, I expected worse. Hopefully, I won't have to keep playing moderator.

The End

I haven't had this much fun working on a project in a long time. It was a refreshing experience, and I look forward to creating more projects like this one.

Check out One Million Emojis! We're still very far from hitting the original goal of all one million emojis! 😂

P.S. Huge shoutout to eieio for making the original One Million Checkboxes and for inspiring this project.