Backpropagation for surface fitting

This demo trains a shallow neural network on noisy samples from a 2D surface. The full computational graph is too large to draw usefully, so the focus here is on what reverse-mode autodiff does in practice: it computes the gradient of a scalar loss with respect to every parameter in a single backward pass, and those gradients are then used to update the weights.
See the graph-level auto-diff explanation and demo for the explicit node-by-node picture.
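
To make the "gradients for a scalar loss, then a weight update" loop concrete, here is a minimal sketch in plain Python. It is not the demo's actual code: the target surface sin(x)·cos(y), the tanh activation, the mean-squared-error loss, the hidden width, the learning rate, and all function names are assumptions chosen for illustration. The backward pass is written out by hand, applying the chain rule in reverse order, which is the same computation a reverse-mode autodiff library would perform automatically.

```python
import math
import random


def surface(x, y):
    # Hypothetical target surface, chosen only for illustration.
    return math.sin(x) * math.cos(y)


def make_data(n, noise, rng):
    # Noisy samples (x, y, z) with inputs drawn uniformly from [-2, 2]^2.
    data = []
    for _ in range(n):
        x, y = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        data.append((x, y, surface(x, y) + rng.gauss(0.0, noise)))
    return data


def init_params(hidden, rng):
    # One hidden layer: 2 inputs -> `hidden` tanh units -> 1 linear output.
    return {
        "W1": [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }


def loss_and_grads(p, data):
    # Returns the mean squared error and its gradient w.r.t. every parameter.
    hidden, n = len(p["b1"]), len(data)
    g = {
        "W1": [[0.0, 0.0] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [0.0] * hidden,
        "b2": 0.0,
    }
    loss = 0.0
    for x, y, target in data:
        # Forward pass: keep the hidden activations for reuse going backward.
        h = [math.tanh(w[0] * x + w[1] * y + b) for w, b in zip(p["W1"], p["b1"])]
        pred = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
        err = pred - target
        loss += err * err / n
        # Backward pass: start from d(loss)/d(pred) and push it toward the inputs.
        d_pred = 2.0 * err / n
        g["b2"] += d_pred
        for j in range(hidden):
            g["w2"][j] += d_pred * h[j]
            # tanh'(z) = 1 - tanh(z)^2, expressed through the stored activation.
            d_z = d_pred * p["w2"][j] * (1.0 - h[j] * h[j])
            g["W1"][j][0] += d_z * x
            g["W1"][j][1] += d_z * y
            g["b1"][j] += d_z
    return loss, g


def gradient_step(p, g, lr):
    # Plain gradient descent: move each parameter against its gradient.
    for j in range(len(p["b1"])):
        p["W1"][j][0] -= lr * g["W1"][j][0]
        p["W1"][j][1] -= lr * g["W1"][j][1]
        p["b1"][j] -= lr * g["b1"][j]
        p["w2"][j] -= lr * g["w2"][j]
    p["b2"] -= lr * g["b2"]


def train(steps=500, lr=0.02, hidden=8, n_samples=64, noise=0.05, seed=0):
    rng = random.Random(seed)
    data = make_data(n_samples, noise, rng)
    p = init_params(hidden, rng)
    history = []
    for _ in range(steps):
        loss, g = loss_and_grads(p, data)
        history.append(loss)
        gradient_step(p, g, lr)
    return p, data, history


if __name__ == "__main__":
    _, _, history = train()
    print("first loss:", history[0], "last loss:", history[-1])
```

Note that one backward sweep yields the gradient for every weight and bias at roughly the cost of one extra forward pass, which is why reverse mode suits a scalar loss with many parameters.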