SketchRNN: Learning from My Hand
Year: 2024
Tools: SketchRNN, TensorFlow, p5.js, Python
An experiment in visual abstraction with neural networks trained on personal handwriting and sketches
This project uses a recurrent neural network, SketchRNN, trained on a personal dataset of handwriting and sketches collected with a custom-built p5.js tool. The generated outputs of asemic writing and ambiguous sketches linger at the threshold of recognition and resist definitive interpretation. By working with small, idiosyncratic data, this project explores visual abstraction through machine collaboration.
On Visual Abstraction and Ambiguity
Children's Art
Children draw what they know rather than what they see. A face can be a circle with two dots and a line, yet remain immediately recognizable. Developmental psychologists call this intellectual realism: children prioritize salient features over photographic accuracy.

These marks provide just enough information for our brains to recognize the subject and fill in the rest.
Asemic Writing
Asemic writing preserves the visual rhythm and structure of text without carrying semantic content.
What do we recognize when we look? Are we responding to meaning, or to the visual patterns that suggest meaning?

e.g. Xu Bing's 'Book from the Sky' features thousands of invented Chinese characters that look authentic but are not readable.
Machines and Visual Abstraction
Our brains are constantly searching for meaning, making sense of the world around us. We see faces in trees, hear words in noise, and find patterns in randomness (pareidolia). When presented with ambiguous information, we actively work to resolve it into something familiar.
Computer vision systems struggle with abstracted imagery. Most models are trained on high-fidelity photographs and rely on texture and details that sketches omit. Recent research (SEVA) explores whether machines can match human performance in recognizing and generating sketches.
SketchRNN takes a different approach: rather than trying to recognize a sketch, it learns the sequential patterns of making one. The model can generate sketch-like outputs without conceptual understanding, mirroring asemic writing and children’s drawings in communicating through visual pattern rather than explicit meaning.
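For context, the model never sees pixels: a drawing enters SketchRNN as an ordered list of pen offsets. A minimal illustration of that stroke format (the toy example below is mine, not from the dataset):

```python
# A drawing in "stroke-3" format: each row is (dx, dy, pen_lifted).
# dx, dy are offsets from the previous pen position; pen_lifted = 1
# means the pen comes up after this point, ending the current stroke.
import numpy as np

# Toy example: a square "head" plus two dots for eyes.
face = np.array([
    [ 5,  0, 0],   # draw right
    [ 0,  5, 0],   # draw down
    [-5,  0, 0],   # draw left
    [ 0, -5, 1],   # close the square, then lift the pen
    [ 2,  2, 1],   # first eye: a single touch, pen lifts again
    [ 2,  0, 1],   # second eye
], dtype=np.int16)
```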
Process
Updating SketchRNN
The original codebase (Google Magenta) was built for TensorFlow 1.x and had become incompatible with current tools and dependencies. I adapted the code to run on TensorFlow 2.x in v1 compatibility mode, resolving deprecated functions, updating the data pipelines, and making sure the training loop worked in modern Python environments. This updated implementation is available on GitHub.
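The gist of the fix, sketched here with toy ops rather than the actual model graph: TensorFlow 2.x still ships a v1 compatibility layer, so the graph-and-session code can keep running once the pieces it no longer ships (like tf.contrib) are replaced.

```python
# Running TF1-era graph code on a TF2 install (illustrative, not the fork's actual diff).
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # restore graph mode, Sessions, and placeholders

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 5], name="stroke_step")
    y = tf.layers.dense(x, 8)  # deprecated layer API, still reachable via compat.v1

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[0.0, 0.0, 1.0, 0.0, 0.0]]}).shape)  # (1, 8)

# tf.contrib has no compat shim, so anything built on it (e.g. contrib RNN cells)
# has to be swapped for maintained equivalents by hand.
```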
Building the Dataset
I built a custom p5.js sketch to capture my handwriting and drawings, recording stroke sequences as SVG data. The tool limited stroke complexity for training efficiency.
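The export format of my capture tool isn't reproduced here, but the conversion step looks roughly like this regardless of the source: each stroke (a list of absolute (x, y) points) becomes (dx, dy, pen_lifted) rows, and drawings are packed into the train/valid/test .npz layout that the SketchRNN data loader reads. The function names and split below are illustrative.

```python
# Hypothetical conversion from captured strokes to a SketchRNN-style .npz dataset.
import numpy as np

def drawing_to_stroke3(strokes):
    """strokes: list of strokes, each a list of absolute (x, y) points."""
    rows, prev = [], (0, 0)
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            lifted = 1 if i == len(stroke) - 1 else 0       # pen up at stroke end
            rows.append((x - prev[0], y - prev[1], lifted))  # store offsets, not positions
            prev = (x, y)
    return np.array(rows, dtype=np.int16)

def save_dataset(drawings, path="handwriting.npz"):
    data = [drawing_to_stroke3(d) for d in drawings]
    n = len(data)
    split = lambda a, b: np.array(data[a:b], dtype=object)
    # The loader expects 'train', 'valid', and 'test' arrays in one .npz file.
    np.savez_compressed(path,
                        train=split(0, int(0.8 * n)),
                        valid=split(int(0.8 * n), int(0.9 * n)),
                        test=split(int(0.9 * n), n))
```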
I generated:
~300 handwritten English words
~[X] quick sketches of everyday objects

Handwritten English words

Quick sketches

Training Two Models
I trained two separate models:
Handwriting model: Learned patterns from cursive English words
Sketch model: Learned from quick object drawings
SketchRNN uses a sequence-to-sequence variational autoencoder: it learns a probability distribution over stroke sequences and generates new samples from the learned latent space.
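Generation then comes down to sampling. At each step, the decoder predicts a mixture of Gaussians over the next (dx, dy) offset plus pen-state probabilities, and a temperature parameter controls how conservative or wild the samples are. A toy version of that per-step sampling, detached from the actual model:

```python
# Toy illustration of SketchRNN's per-step sampling (the mixture parameters
# here are made up; in the real model they come from the decoder RNN).
import numpy as np

rng = np.random.default_rng(7)

def sample_offset(pi, mu, sigma, temperature=0.65):
    """Sample one (dx, dy) offset from a 2D Gaussian mixture."""
    logits = np.log(pi) / temperature            # lower temperature -> sharper choice
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    k = rng.choice(len(pi), p=weights)           # pick a mixture component
    return rng.normal(mu[k], sigma[k] * np.sqrt(temperature))

# Fake decoder output for a single step: three mixture components.
pi = np.array([0.6, 0.3, 0.1])
mu = np.array([[2.0, 0.5], [-1.0, 1.0], [0.0, -2.0]])
sigma = np.array([[0.4, 0.4], [0.6, 0.6], [0.3, 0.3]])
print(sample_offset(pi, mu, sigma))              # one sampled (dx, dy) offset
```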
Outputs:


Results
To be continued!







I combined outputs from both models and plotted them on paper with an AxiDraw pen plotter.
This layout references the workbooks we used in early elementary school. Each page had a box at the top for drawing and ruled lines below for writing. In that format, kids are expected to produce drawings and writings that inform each other. Here, both elements resist clear interpretation yet evoke something familiar.
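Getting the strokes onto paper is mostly a format question: the AxiDraw plots SVG (via its Inkscape extension or command-line tool), so the generated stroke sequences just need to be converted back into paths. A minimal, hand-rolled version of that conversion (filenames and sizing are arbitrary):

```python
# Illustrative stroke-3 -> SVG conversion for pen plotting.

def stroke3_to_polylines(seq):
    """Turn (dx, dy, pen_lifted) rows into lists of absolute (x, y) points."""
    x = y = 0.0
    lines, current, prev_lifted = [], [(x, y)], 0
    for dx, dy, lifted in seq:
        x, y = x + dx, y + dy
        if prev_lifted:                 # previous step lifted the pen: travel move
            if len(current) > 1:
                lines.append(current)
            current = [(x, y)]          # start the next stroke here
        else:
            current.append((x, y))      # pen down: extend the current stroke
        prev_lifted = lifted
    if len(current) > 1:
        lines.append(current)
    return lines

def write_svg(seq, path="plot.svg", margin=20.0):
    lines = stroke3_to_polylines(seq)
    xs = [p[0] for line in lines for p in line]
    ys = [p[1] for line in lines for p in line]
    shift_x, shift_y = margin - min(xs), margin - min(ys)
    width, height = max(xs) - min(xs) + 2 * margin, max(ys) - min(ys) + 2 * margin
    body = "".join(
        '<polyline fill="none" stroke="black" points="{}"/>'.format(
            " ".join(f"{px + shift_x:.1f},{py + shift_y:.1f}" for px, py in line))
        for line in lines)
    with open(path, "w") as f:
        f.write(f'<svg xmlns="http://www.w3.org/2000/svg" '
                f'width="{width:.0f}" height="{height:.0f}">{body}</svg>')
```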