Pretrained models¶
Loaders that import pretrained weights and fuse the model into one ANE program.
models ¶
Pretrained-model loaders: load (BERT-family sentence encoders) and
load_resnet18 (torchvision ImageNet classifier). Each builds an aneforge graph
from the real weights and compiles it to a fused ANE program. Heavy deps
(transformers / torchvision) are imported lazily so the core stays light.
Also ships the trainable-graph builders used with the on-ANE autograd:
group_norm_train (any-batch GroupNorm with trainable affine), conv_block
(conv -> GroupNorm -> ReLU -> optional max-pool), and cifar_cnn (a full
CIFAR-10 CNN returning the input, logits, and trainable parameter list).
Vision ¶
Source code in aneforge/models.py
Encoder ¶
Source code in aneforge/models.py
load ¶
Load a BERT-family sentence encoder from HF weights as an ANE embedder.
embed = af.load("sentence-transformers/all-MiniLM-L6-v2")
vecs = embed(["hello world", "the cat sat"]) # [2, D], L2-normalised
Tokenisation and embedding lookup run on the host (gather is not an ANE op). The transformer layers run on the ANE as fused programs, cached per sequence length. Mean-pooling and L2 normalisation then run on the host.
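The host-side pooling step can be illustrated with a small NumPy reference. This is a minimal sketch, not the aneforge implementation: the function name `mean_pool_l2`, the use of an attention mask to skip padding tokens, and the `eps` guard are all assumptions made for illustration.

```python
import numpy as np

def mean_pool_l2(hidden, mask, eps=1e-12):
    """Masked mean over tokens, then L2-normalise each row.

    hidden: [B, T, D] per-token states from the encoder.
    mask:   [B, T], 1 for real tokens and 0 for padding (assumed input).
    """
    m = mask[..., None].astype(hidden.dtype)        # [B, T, 1]
    summed = (hidden * m).sum(axis=1)               # [B, D], padding contributes 0
    counts = np.maximum(m.sum(axis=1), 1.0)         # [B, 1], avoid divide-by-zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.maximum(norms, eps)

# Hypothetical encoder output: batch of 2, 5 tokens, D = 8.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 5, 8)).astype(np.float32)
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
vecs = mean_pool_l2(hidden, mask)                   # [2, 8], unit-length rows
```

Because padded positions are multiplied by zero before the sum, the result does not depend on whatever values the encoder produced for padding.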
load_resnet18 ¶
load_resnet18(int8: bool = False, compress: str | None = None, compress_atol: float = 0.05, build_dir: str | None = None) -> 'Vision'
Load torchvision ResNet-18 (ImageNet) as a fused ANE classifier.
clf = af.load_resnet18()
logits = clf(image) # [1,3,224,224] -> [1,1000]
clf = af.load_resnet18(compress="int4") # 4-bit LUT weights
BatchNorm is folded into the preceding conv at load, so the ANE graph is pure
conv/relu/pool/add/fc. Conv is the ANE's strongest workload. compress picks
the weight encoding (see af.compile); build_dir keeps the packed program
on disk (its weights.bin is the packed-model size).
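The BatchNorm folding described above is a standard algebraic identity, and it can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: `conv2d`, `batchnorm`, and `fold_bn` are hypothetical helper names written for this example, not aneforge or torchvision functions, and the convolution is an unpadded stride-1 reference. For a bias-free conv, pass zeros for `b`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, w, b):
    """Reference conv, stride 1, no padding. x [N,Cin,H,W], w [Cout,Cin,kh,kw], b [Cout]."""
    kh, kw = w.shape[2:]
    win = sliding_window_view(x, (kh, kw), axis=(2, 3))    # [N,Cin,H',W',kh,kw]
    return np.einsum("nchwij,ocij->nohw", win, w) + b[None, :, None, None]

def batchnorm(x, gamma, beta, mean, var, eps):
    """Inference-mode BatchNorm using stored running statistics."""
    scale = gamma / np.sqrt(var + eps)
    return (x - mean[None, :, None, None]) * scale[None, :, None, None] \
        + beta[None, :, None, None]

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') such that conv(x, w', b') == batchnorm(conv(x, w, b))."""
    scale = gamma / np.sqrt(var + eps)                     # one factor per output channel
    return w * scale[:, None, None, None], (b - mean) * scale + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
gamma, beta, mean = (rng.standard_normal(4) for _ in range(3))
var = rng.uniform(0.5, 2.0, size=4)

reference = batchnorm(conv2d(x, w, b), gamma, beta, mean, var, eps=1e-5)
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var, eps=1e-5)
folded = conv2d(x, w_f, b_f)
max_err = np.abs(reference - folded).max()                 # floating-point noise only
```

Since the scale is applied per output channel, the same folding holds for any kernel size, stride, or padding; only the weights and bias change, so the normalisation op disappears from the graph.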
group_norm_train ¶
GroupNorm built from primitives so it works at ANY batch N (the stock
Tensor.group_norm op is batch-1 only) and so the affine gamma/beta are real
trainable parameters. x is [N, C, H, W]; gamma/beta are [1, C, 1, 1]
parameter Tensors. Normalizes per-(group, sample) over the C/groups*H*W elements,
then applies the affine. Every op here (reshape, mean, square, rsqrt, adds, mul,
add) has a VJP, so input/gamma/beta gradients all run on the ANE. Mirrors the
group_norm VJP math (aneforge/autograd.py:425).
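The primitive sequence listed above can be mirrored in NumPy to show why it works at any batch size. This is a reference sketch only: `group_norm_ref` is a hypothetical name, it operates on NumPy arrays rather than aneforge Tensors, and the `eps` value is an assumption.

```python
import numpy as np

def group_norm_ref(x, gamma, beta, groups, eps=1e-5):
    """GroupNorm from reshape / mean / square / rsqrt / mul / add.

    x: [N, C, H, W]; gamma, beta: [1, C, 1, 1]; C must be divisible by groups.
    """
    n, c, h, w = x.shape
    g = x.reshape(n, groups, (c // groups) * h * w)         # one row per (sample, group)
    mean = g.mean(axis=2, keepdims=True)
    centered = g - mean
    var = np.square(centered).mean(axis=2, keepdims=True)   # biased variance
    normed = centered * (1.0 / np.sqrt(var + eps))          # rsqrt, then mul
    return normed.reshape(n, c, h, w) * gamma + beta        # trainable affine

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5, 5))                       # batch of 4, not 1
gamma = np.ones((1, 8, 1, 1))
beta = np.zeros((1, 8, 1, 1))
y = group_norm_ref(x, gamma, beta, groups=2)
```

Each sample is normalized using only its own statistics, so running the function on a single sample gives the same values as the matching slice of the batched result.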
conv_block ¶
conv2d(pad=1) -> GroupNorm(train) -> ReLU -> optional max_pool(pool).
conv_w is a conv_param; gamma/beta are [1,Cout,1,1] params; pool=0
means no pooling. Returns the block output Tensor.
cifar_cnn ¶
Build the CIFAR-10 CNN graph. Returns (x_input, logits, params) where params is the trainable list in a fixed order.
Architecture (per the design spec):
block1 conv 3->w0 GN ReLU maxpool2 (32x32 -> 16x16)
block2 conv w0->w1 GN ReLU maxpool2 (16x16 -> 8x8)
block3 conv w1->w2 GN ReLU (8x8)
global-avg-pool over H,W -> fc(w2->classes)
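The tensor shapes implied by this architecture can be traced with a few lines of plain Python. This is an illustrative sketch, not part of aneforge: `cifar_cnn_shapes` is a hypothetical helper, the default widths `(32, 64, 128)` are made-up values, and it assumes 3x3 kernels (so that pad=1 preserves H and W) and a pooling stride equal to the pool size.

```python
def cifar_cnn_shapes(n, widths=(32, 64, 128), classes=10):
    """Return [(stage, shape), ...] for a batch of n 32x32 RGB images."""
    w0, w1, w2 = widths
    h = w = 32
    shapes = [("input", (n, 3, h, w))]
    # (name, output channels, pool size); pool=0 means no pooling.
    for name, cout, pool in (("block1", w0, 2), ("block2", w1, 2), ("block3", w2, 0)):
        # Assumed 3x3 conv with pad=1 keeps H and W; GN and ReLU keep the shape.
        if pool:
            h, w = h // pool, w // pool
        shapes.append((name, (n, cout, h, w)))
    shapes.append(("global_avg_pool", (n, w2)))     # mean over H and W
    shapes.append(("fc", (n, classes)))             # logits
    return shapes

for stage, shape in cifar_cnn_shapes(n=16):
    print(f"{stage:16s} {shape}")
```

With the hypothetical widths above, the spatial size goes 32x32, 16x16, 8x8, 8x8 across the three blocks, matching the listing, and the final logits are `[N, classes]`.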