Introduction

I built a physical NFT minting device that runs a Generative Adversarial Network (GAN) on a Raspberry Pi 4. I trained a 128×128 Deep Convolutional GAN (DCGAN) on my MacBook M3; the Pi runs headless and is controlled via a LILYGO TTGO T-Display ESP32.

GAN Architecture and Training

The generator has 6 blocks that upsample a latent vector through progressively larger feature maps until it reaches the final 128×128 output. The path is: latent → 4×4 → 8×8 → 16×16 → 32×32 → 64×64 → 128×128. The first block has f×16 = 1024 feature channels, so the base width f is 64. The discriminator is a matching 6-block network.
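To make the shape progression concrete, here is a minimal sketch that walks the six generator blocks and prints each block's channel count and spatial size. Only the spatial path and the f×16 = 1024 starting width come from the description above. The rest are assumptions borrowed from the standard DCGAN recipe: 4×4 transposed convolutions, channel widths halving each block, a 100-dimensional latent vector, and a 3-channel RGB output. The function names are hypothetical.

```python
from typing import List, Tuple


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_plan(latent_dim: int = 100, f: int = 64,
                   out_channels: int = 3) -> List[Tuple[int, int, int, int]]:
    """Return (in_channels, out_channels, spatial_size, weight_count) per block.

    latent_dim, out_channels and the halving channel widths are assumptions.
    """
    widths = [f * 16, f * 8, f * 4, f * 2, f, out_channels]
    plan = []
    in_ch, size = latent_dim, 1  # the latent vector is treated as a 1x1 map
    for i, out_ch in enumerate(widths):
        # First block projects 1x1 -> 4x4; every later block doubles the size.
        stride, padding = (1, 0) if i == 0 else (2, 1)
        size = conv_transpose_out(size, kernel=4, stride=stride, padding=padding)
        weights = in_ch * out_ch * 4 * 4  # conv weights only, no bias or norm
        plan.append((in_ch, out_ch, size, weights))
        in_ch = out_ch
    return plan


plan = generator_plan()
for in_ch, out_ch, size, weights in plan:
    print(f"{in_ch:>4} -> {out_ch:>4} channels, {size}x{size}, {weights:,} weights")

total = sum(w for *_, w in plan)
print(f"total conv weights: {total:,} (~{total * 4 / 1e6:.1f} MB as float32)")
```

Under these assumptions the convolution weights alone come to roughly 51 MB in float32, which is in the same range as the exported model size, though the real layer configuration may differ.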

I trained the model for 800 epochs using PyTorch's MPS backend on Apple Silicon, which took approximately 4 hours. The training dataset comprised 2480 images from 11 subjects: a primary anchor class with 2000 images, plus several minority classes so the generator produces hybrid outputs. I then exported the model from PyTorch to ONNX, which gave a 53MB file. Inference on the Raspberry Pi 4 takes about 3 seconds per generated face.
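For readers curious what the inference step on the Pi might look like, here is a hedged sketch using ONNX Runtime on the CPU. It is not the project's actual code. The file name generator.onnx, the latent shape (1, 100, 1, 1), and the assumption that the generator ends in a tanh (so outputs lie in [-1, 1]) are all illustrative guesses.

```python
import numpy as np


def to_uint8_image(output: np.ndarray) -> np.ndarray:
    """Map a (1, 3, H, W) array in [-1, 1] to an (H, W, 3) uint8 image.

    The [-1, 1] range assumes a tanh output layer, as in a standard DCGAN.
    """
    img = np.clip((output[0] + 1.0) / 2.0, 0.0, 1.0)  # [-1, 1] -> [0, 1]
    img = np.transpose(img, (1, 2, 0))                 # CHW -> HWC
    return (img * 255.0).round().astype(np.uint8)


def generate_face(model_path: str = "generator.onnx",
                  latent_dim: int = 100) -> np.ndarray:
    """Run one forward pass of the exported generator (hypothetical path/shape)."""
    import onnxruntime as ort  # imported lazily so the helper above works alone

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # read the name from the model
    z = np.random.randn(1, latent_dim, 1, 1).astype(np.float32)
    output = session.run(None, {input_name: z})[0]
    return to_uint8_image(output)
```

Reading the input name from the session avoids hard-coding whatever name was chosen at export time.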

Device Functionality and Experience

Once a face is generated, it is sent to the ESP32, where a title for the NFT is built from a pre-defined dictionary and a template sentence: "This is a <adjective> NFT and I want to <verb> it." I designed the device as an art piece as well as a working tool, and I showed it on the streets of New York City, where passersby could interact with it and generate their own NFTs.
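The title step can be sketched in a few lines. This is an illustration in Python, not the firmware that runs on the ESP32. The template sentence is the one quoted above, while the word lists and the make_title helper are hypothetical placeholders for the real dictionary.

```python
import random

# Hypothetical word lists; the project's actual dictionary is not shown.
ADJECTIVES = ["shiny", "mysterious", "rare", "noisy"]
VERBS = ["frame", "trade", "keep", "mint"]

TEMPLATE = "This is a {adjective} NFT and I want to {verb} it."


def make_title(rng: random.Random) -> str:
    """Fill the template with one random adjective and one random verb."""
    return TEMPLATE.format(adjective=rng.choice(ADJECTIVES),
                           verb=rng.choice(VERBS))


title = make_title(random.Random(42))  # seeded only to make the demo repeatable
print(title)
```

The template hard-codes the article "a", so a dictionary used this way would need adjectives that start with a consonant sound.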

If you are interested in the technical side, I'm happy to discuss the training pipeline, the ONNX conversion, or anything else about the project. Check out the full video of the experience here.