Handwriting LaTeX


Disclaimer: The methods below are a work in progress; your mileage may vary.

Introduction

This article is intended for those who need to typeset maths and other technical subjects. If you just need to digitize handwriting, this isn’t for you, but you’re welcome to read anyway! A few things to get out in the open first:

LaTeX was meant to be typed, not handwritten. Delimiters for environments need to be added during the editing phase.
Handwriting OCR (optical character recognition) is a monumentally difficult problem in and of itself.
LaTeX, being inherently WYSIWYM, is difficult to translate from “compiled” output back into source, all the more so when the “compiled” pages aren’t actual LaTeX.

Workflow

Here are the needed components:

Write → OCR → Editing → Compile

And here is how I fill each:

Pen and graph paper → MathPix → vim + vimtex → latexmk

Write

Some considerations:

Graph paper with darker lines can, and will, interfere with OCR.
Pen performs better than pencil, even compared to darker leads such as 2B.

One could use a digital tablet, but the whole point for me is to go analog.

OCR

The only program that performs decent LaTeX OCR, as of the time of writing, is MathPix, a paid, proprietary subscription service. Programs such as pix2tex can handle single expressions, but they can’t process entire PDFs in one shot.

Scanning with a camera is faster than using a dedicated scanner, but yields worse results. I scan directly into the MathPix Snip app and export the result from the website as LaTeX.

MathPix also provides a command-line interface, mpx-cli, for use with an account or the API, but as of the time of writing it hasn’t been updated in years and doesn’t appear to be maintained.

Editing

I use vim + vimtex as my primary LaTeX editor, with live PDF preview in zathura. My personal setup is available here. I strongly recommend Kile, by KDE, for anyone who finds vim too daunting.

Both latexindent and ltex-ls are helpful for post-processing. latexindent is a formatting tool for your source code and doesn’t affect the compiled PDF. ltex-ls is a language server (LSP) that provides grammar and usage checks in the editor, powered by LanguageTool.
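A minimal sketch of running latexindent from the shell, wrapped in a hypothetical tidy_tex function. It assumes latexindent is on your PATH and uses its -s (silent) and -w (overwrite the file, keeping a backup) switches; check latexindent --help on your version before relying on them.

```bash
# tidy_tex: hypothetical helper that reformats a .tex file in place with latexindent.
# -w overwrites the file (latexindent keeps a backup copy), -s keeps the terminal quiet.
tidy_tex() {
    local file="$1"
    if [ ! -f "$file" ]; then
        echo "tidy_tex: no such file: $file" >&2
        return 1
    fi
    if ! command -v latexindent >/dev/null 2>&1; then
        echo "tidy_tex: latexindent not found on PATH" >&2
        return 1
    fi
    latexindent -s -w "$file"
}
```

For example, tidy_tex ~/Documents/notes.tex after a round of edits. Since only the source layout changes, the compiled PDF stays the same.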

Compile

I use latexmk, a build tool for LaTeX that automatically reruns the compiler (and BibTeX or Biber) as many times as needed to produce the finished PDF. For simple documents a single pdflatex run will suffice, but latexmk handles documents with a table of contents or bibliography, which would otherwise need several manual runs.
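A few common latexmk invocations, wrapped in a hypothetical build_notes helper for reference: -pdf builds with pdflatex, -pvc keeps watching the source and rebuilds on every save, and -c deletes auxiliary files while keeping the PDF. The -interaction=nonstopmode option is passed through to pdflatex so that an error doesn’t stop at an interactive prompt.

```bash
# build_notes: hypothetical wrapper around latexmk.
# Usage: build_notes FILE.tex [once|watch|clean]
build_notes() {
    local file="$1" mode="${2:-once}"
    case "$mode" in
        once)  latexmk -pdf -interaction=nonstopmode "$file" ;;       # rerun until references settle
        watch) latexmk -pdf -pvc -interaction=nonstopmode "$file" ;;  # rebuild on every save
        clean) latexmk -c "$file" ;;                                  # remove aux files, keep the PDF
        *)     echo "usage: build_notes FILE.tex [once|watch|clean]" >&2; return 2 ;;
    esac
}
```

For example, running build_notes notes.tex watch in a spare terminal gives a PDF that updates as you edit.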

Automation

MathPix exports LaTeX as a .zip archive, even when the archive holds just a single .tex file.

By assuming the most recent .zip in the downloads folder is the one we want, we can automate much of the workflow. The following script takes everything between \begin{document} and \end{document} in the exported source (minus any \includegraphics lines) and appends it to a .tex file of our choosing in TARGET_DIR, keeping that file’s own \end{document} at the end.


#!/bin/bash

TARGET_DIR="$HOME/Documents"
DOWNLOADS_DIR="$HOME/Downloads"


echo "Please select a .tex file to append to:"
select file in "$TARGET_DIR"/*.tex; do
    if [ -n "$file" ]; then
        echo "You selected: $(basename "$file")"
        TARGET_FILE="$file"
        break
    else
        echo "Invalid selection. Please try again."
    fi
done

MOST_RECENT_ZIP=$(find "$DOWNLOADS_DIR" -type f -name '*.zip' -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" ")

if [ -z "$MOST_RECENT_ZIP" ]; then
    echo "No .zip files found in the downloads directory."
    exit 1
fi

TEX_FILES=$(unzip -l "$MOST_RECENT_ZIP" | grep -E '\.tex$' | awk '{print $4}')

if [ -z "$TEX_FILES" ]; then
    echo "No .tex files found in the most recent .zip file."
    exit 1
fi

TEX_FILE=$(echo "$TEX_FILES" | head -n 1)

# Extract the content between \begin{document} and \end{document} from the .tex file, excluding these lines and any \includegraphics lines
CONTENT=$(unzip -p "$MOST_RECENT_ZIP" "$TEX_FILE" | sed -n '/\\begin{document}/, /\\end{document}/ { /\\begin{document}/!{/\\end{document}/!{/\\includegraphics/!p}}}')

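# %Y (modification time) matches how the zip was chosen above; birth time (%W) is 0 when the filesystem doesn't record it
read -r -p "Append $(basename "$MOST_RECENT_ZIP"), last modified $(date -d "@$(stat -c %Y "$MOST_RECENT_ZIP")"), to $(basename "$TARGET_FILE")? (Y/n) " choice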
case "$choice" in
  ""|y|Y )
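    # Blank out the target's first \end{document} line, append the new content, then restore it
    sed -i '0,/^\\end{document}$/s///' "$TARGET_FILE"
    echo "$CONTENT" >> "$TARGET_FILE"
    printf '%s\n' '\end{document}' >> "$TARGET_FILE"
    echo "Content appended to $TARGET_FILE."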
    ;;
  n|N )
    echo "Canceled"
    ;;
  * )
    echo "Invalid choice. Operation canceled."
    ;;
esac

The above script is available here. For ease of use, save it as ~/.local/bin/textractor.sh (or wherever you prefer) and add an alias to your shell:

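# "textract" is only an example name; pick any alias you like.
# Use bash, not sh: the script relies on bash features such as select and read -p.
alias textract='bash ~/.local/bin/textractor.sh'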

When MathPix can’t recognize something, it inserts the undecipherable region as an image and includes that image in the .zip. Since I cross-check the LaTeX rendered by MathPix against my original paper copy anyway, the script drops every \includegraphics line.
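To illustrate what the filter keeps, here is the same logic as the script’s sed one-liner, rewritten with d commands for readability (GNU sed assumed) and wrapped in a hypothetical extract_body function. The sample input is invented to resemble a MathPix export.

```bash
# extract_body: hypothetical helper mirroring the script's sed filter.
# Reads a .tex file on stdin and prints the lines between \begin{document}
# and \end{document}, dropping both delimiters and every \includegraphics line.
extract_body() {
    sed -n '/\\begin{document}/,/\\end{document}/{
        /\\begin{document}/d
        /\\end{document}/d
        /\\includegraphics/d
        p
    }'
}

# Invented sample input, loosely shaped like a MathPix export
sample='\documentclass{article}
\usepackage{amsmath}
\begin{document}
\section*{Notes}
\includegraphics[max width=\textwidth]{crop_01.jpg}
$$
E = mc^2
$$
\end{document}'

printf '%s\n' "$sample" | extract_body
```

Only the lines strictly between the two delimiters survive, minus the \includegraphics line, so this prints the \section*{Notes} heading followed by the three display-math lines.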

Note that this won’t carry over any \usepackage{} lines MathPix included, since those sit above \begin{document}. To compile, the target file needs at least a minimal preamble of its own.
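A minimal sketch of such a target file, written by a hypothetical new_target_file helper. The package list is an assumption: amsmath and amssymb cover most maths environments and symbols, but compare it against the preamble of an actual MathPix export. The \end{document} has to sit alone on its own line, since the script’s sed command only matches ^\end{document}$.

```bash
# new_target_file: hypothetical helper that writes a minimal preamble to PATH,
# refusing to overwrite an existing file.
new_target_file() {
    local path="$1"
    if [ -e "$path" ]; then
        echo "$path already exists; not overwriting." >&2
        return 1
    fi
    mkdir -p "$(dirname "$path")"
    # Quoted 'EOF' stops the shell from expanding anything in the LaTeX below
    cat > "$path" <<'EOF'
\documentclass{article}
\usepackage{amsmath}  % equation, align, etc.
\usepackage{amssymb}  % extra symbols such as \mathbb
\begin{document}
\end{document}
EOF
}
```

Running new_target_file ~/Documents/notes.tex once makes the file show up in the script’s selection menu, since TARGET_DIR defaults to ~/Documents.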

Optimization

Since I’m running a parsing script anyway, I could also expand shorthands, for example:

\eq → \begin{equation}

However, this would interfere with the environments MathPix generates. Expansions could still be useful for other formatting or metadata, but I have yet to find a use for them.
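For illustration only, such an expansion could be a sed pass over the extracted content before it is appended. The sketch below assumes the shorthands stand alone on their own lines; \eqend, the closing counterpart, is a hypothetical name invented here. Restricting matches to whole lines limits, but does not remove, the risk of clashing with what MathPix emits.

```bash
# expand_shorthands: hypothetical pre-processing pass (not part of the script above).
# A line containing only \eq becomes \begin{equation}; a line containing only
# \eqend becomes \end{equation}. Everything else passes through unchanged.
expand_shorthands() {
    sed -e 's/^[[:space:]]*\\eq[[:space:]]*$/\\begin{equation}/' \
        -e 's/^[[:space:]]*\\eqend[[:space:]]*$/\\end{equation}/'
}

# In the append script, this could run just before the content is written:
# CONTENT=$(printf '%s\n' "$CONTENT" | expand_shorthands)
```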

Cost-Benefit Analysis

Handwriting LaTeX versus handwriting plain notes or typing LaTeX directly.

Benefits	Detriments
Better retention	OCR introduces errors
Faster for real-time note-taking	Increased time editing
Finished product is LaTeX PDF

A Reddit thread on this topic can be found here.
