ANE op catalog: every native MIL op x device (M1-M5)
Generated from aneforge/_op_catalog.py in the ANEForge repository (python docs/gen_op_catalog.py); do not hand-edit. Query the same data at runtime via af.op_info, af.is_native(op, chip), af.ops_on(chip), af.min_native_family(op), af.walled_everywhere().
187 native MIL ops. Device ladder: m1=A13, m2=A14, m3=A15, m4_m5=A16/A17. Cells: Y native, ~ bridge/decompose, N walled. aneforge's higher-level ops (rms_norm/group_norm/mha/sdpa/fft/linalg/...) are composites that lower to these.
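The cell legend and device ladder above can be sketched as a small lookup. This is a toy stand-in, not the aneforge API: the dictionary holds a hand-copied subset of rows from the tables below, and the helper names only mirror the `af.*` functions mentioned above, whose real signatures and return types are not shown in this catalog and may differ.

```python
# Toy stand-in for the catalog (assumption: NOT aneforge's real data structure).
# Each string holds the four cells in ladder order: M1, M2, M3, M4/M5.
CHIPS = ("m1", "m2", "m3", "m4_m5")
FAMILY = {"m1": "A13", "m2": "A14", "m3": "A15", "m4_m5": "A16/A17"}

CATALOG = {
    "relu": "YYYY",    # native everywhere
    "topk": "NYYY",    # walled on M1, native A14+
    "sin": "~~YY",     # bridge/decompose on M1/M2, native A15+
    "resize": "~YYY",  # software fallback on M1
    "mod": "NNNN",     # walled everywhere
}


def cell(op, chip):
    """Return the raw cell ('Y', '~' or 'N') for an op on a chip."""
    return CATALOG[op][CHIPS.index(chip)]


def is_native(op, chip):
    """True only for a 'Y' cell; '~' and 'N' both count as not native."""
    return cell(op, chip) == "Y"


def ops_on(chip):
    """Sorted list of ops that are native on the given chip."""
    return sorted(op for op in CATALOG if is_native(op, chip))


def min_native_family(op):
    """Lowest family on the ladder where the op is native, or None."""
    for chip in CHIPS:
        if is_native(op, chip):
            return FAMILY[chip]
    return None


def walled_everywhere():
    """Ops whose four cells are all 'N'."""
    return sorted(op for op, cells in CATALOG.items() if set(cells) == {"N"})
```

For instance, with this subset `min_native_family("sin")` resolves to the A15 rung, matching the `~ ~ Y Y` row for `sin`.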
Activations (incl. LUT)
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| ceil | Y | Y | Y | Y | ElementWise | F2 LUT (probed native M1) |
| clamped_relu | Y | Y | Y | Y | ClampedRelu | LUT |
| clip | Y | Y | Y | Y | ElementWise | user-facing clamp; LUT |
| elu | Y | Y | Y | Y | Elu | LUT (effectively F2 -> A13+) |
| erf | Y | Y | Y | Y | SimpleActivation | F2 LUT |
| exp | Y | Y | Y | Y | ElementWise | LUT |
| exp2 | Y | Y | Y | Y | ElementWise | F2 LUT |
| floor | Y | Y | Y | Y | ElementWise | F2 LUT (probed native M1) |
| gelu | Y | Y | Y | Y | Gelu | LUT (M1 probe: ~0.08 rel err vs exact - LUT approximation, still native) |
| leaky_relu | Y | Y | Y | Y | LeakyRelu | LUT |
| log | Y | Y | Y | Y | ElementWise | LUT (ln2 immediate) |
| prelu | Y | Y | Y | Y | PRelu | per-channel alpha (LUT); native at rank >=3 (M1-confirmed) |
| relu | Y | Y | Y | Y | SimpleActivation | F0 SimpleActivation |
| relu6 | Y | Y | Y | Y | | LUT |
| round | Y | Y | Y | Y | ElementWise | F2 LUT round-nearest (probed native M1) |
| scaled_tanh | Y | Y | Y | Y | ScaledTanh | LUT |
| sigmoid | Y | Y | Y | Y | SimpleActivation | F0/LUT (incl. hard variant) |
| sigmoid_hard | Y | Y | Y | Y | SigmoidHard | LUT |
| sign | Y | Y | Y | Y | ElementWise | F2 LUT (probed native M1) |
| silu | Y | Y | Y | Y | SimpleActivation | a.k.a. swish; LUT |
| softmax | Y | Y | Y | Y | Softmax | F2 LUT (log2e immediate) |
| softplus | Y | Y | Y | Y | Softplus | LUT (+ parametric) |
| softplus_parametric | Y | Y | Y | Y | Softplus | LUT |
| softsign | Y | Y | Y | Y | Softsign | LUT |
| tanh | Y | Y | Y | Y | ElementWise | LUT |
| threshold | Y | Y | Y | Y | ElementWise | LUT |
| thresholded_relu | Y | Y | Y | Y | ThresholdedRelu | LUT |
Comparison / logical
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| equal | Y | Y | Y | Y | ElementWise | F0 compare -> bool (probed native M1) |
| greater | Y | Y | Y | Y | ElementWise | F0 compare |
| greater_equal | Y | Y | Y | Y | ElementWise | F0 compare (probed native M1) |
| less | Y | Y | Y | Y | ElementWise | F0 compare (probed native M1) |
| less_equal | Y | Y | Y | Y | ElementWise | F0 compare (probed native M1) |
| logical_and | N | N | N | N | Unsupported | Unsupported everywhere - decompose via min/mul on host |
| logical_not | Y | Y | Y | Y | ElementWise | F0 (probed native M1) |
| logical_or | N | N | N | N | Unsupported | Unsupported everywhere - decompose via max on host |
| logical_xor | N | N | N | N | Unsupported | Unsupported everywhere - decompose via != on host |
| not_equal | Y | Y | Y | Y | ElementWise | F0 compare (probed native M1) |
| select | Y | Y | Y | Y | Select | user-facing where; template_text backend |
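The notes for logical_and, logical_or and logical_xor name their decompositions (min/mul, max, !=). A minimal sketch of those identities follows, assuming boolean masks are stored as 0.0/1.0 floats; the function names are illustrative and are not aneforge entry points.

```python
def logical_and(a, b):
    """AND of 0/1 masks: elementwise min (x * y gives the same result)."""
    return [min(x, y) for x, y in zip(a, b)]


def logical_or(a, b):
    """OR of 0/1 masks: elementwise max."""
    return [max(x, y) for x, y in zip(a, b)]


def logical_xor(a, b):
    """XOR of 0/1 masks: 1.0 where the inputs differ, else 0.0."""
    return [1.0 if x != y else 0.0 for x, y in zip(a, b)]
```

The identities hold only when both inputs are strictly 0/1 valued; masks produced by the native compare ops satisfy that, but arbitrary floats do not.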
Control flow
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| call | ~ | ~ | ~ | ~ | Call | function call; mapped_no_current_hwx_case (inlined) |
| cond | ~ | ~ | ~ | ~ | | mapped_no_current_hwx_case + Unsupported converter - no standalone ANE codegen; flatten on host |
| while_loop | ~ | ~ | ~ | ~ | | mapped_no_current_hwx_case + Unsupported/WhileLoop - unroll on host |
Conv / MatMul / Pooling
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| avg_pool | Y | Y | Y | Y | Pool | F0; window <=29 (M1) / 31 (A14+); 3D window A13+ |
| conv | Y | Y | Y | Y | Conv | F0; M1 kernels <=29x29 (13x13 fp16), 3D depth native A13+; M5 <=32x32 |
| conv_transpose | Y | Y | Y | Y | Conv | F0 deconv; strided axes use small-kernel caps |
| einsum | Y | Y | Y | Y | Einsum | lowers to matmul/transpose chain |
| l2_pool | Y | Y | Y | Y | Pool | special LUT pool (1024-entry fp16) |
| linear | Y | Y | Y | Y | Linear | folds to conv when RHS <=2 MB SRAM working set |
| linear_activation | Y | Y | Y | Y | LinearActivation | fused linear+activation |
| matmul | Y | Y | Y | Y | Matmul | NE lane / conv-fold; same tensor caps as conv |
| max_pool | Y | Y | Y | Y | Pool | F0 |
| ne_bypass | ~ | ~ | ~ | ~ | NEBypass | private NEBypass unit; mapped_no_current_hwx_case |
| ne_conv | Y | Y | Y | Y | NEConv | private NEConv unit (fill=0x44/mir=0x5d) |
| ne_matmul | Y | Y | Y | Y | NEMatMul | private NEMatMul unit |
| ne_pool | Y | Y | Y | Y | NEPool | private NEPool unit (probe-pending codegen, treated reachable) |
| pe_elementwise | Y | Y | Y | Y | PEElementWise | private PEElementWise unit (fill=0x49/mir=0x59) |
| pe_goc | ~ | ~ | ~ | ~ | PEGOC | private PEGOC unit; mapped_no_current_hwx_case (compiler-internal) |
| pe_pool | Y | Y | Y | Y | PEPool | private PEPool unit |
| scaled_dot_product_attention | Y | Y | Y | Y | SDPA | F2; rides matmul+softmax (NOT texture-gated) - native on M1. user-facing sdpa |
Detection / sampling
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| argsort | N | Y | Y | Y | Sort | Sort family, A14+; codegen-rejected on M1 (= sort floor) |
| list_gather | N | N | N | N | Unsupported | TensorList op - Unsupported everywhere |
| list_length | N | N | N | N | Unsupported | Unsupported everywhere |
| list_read | N | N | N | N | Unsupported | Unsupported everywhere |
| list_scatter | N | N | N | N | Unsupported | Unsupported everywhere |
| list_write | N | N | N | N | Unsupported | Unsupported everywhere |
| make_list | N | N | N | N | Unsupported | Unsupported everywhere |
| non_maximum_suppression | Y | Y | Y | Y | NonMaximumSuppression | template_text NMS backend |
| random_bernoulli | N | N | N | N | Unsupported | Unsupported everywhere - host RNG |
| random_categorical | N | N | N | N | Unsupported | Unsupported everywhere - host RNG |
| random_normal | N | N | N | N | Unsupported | Unsupported everywhere - host RNG |
| random_uniform | ~ | ~ | Y | Y | RandomUniform | RNG, A15+ (HAL 0x4a9=0 on M1/M2); aneforge uses host RNG below A15 (dropout/random decomposable) |
| topk | N | Y | Y | Y | TopK | rank/sort bridge, A14+ (_OP_FLOOR); bridge validator callable on M1 but codegen-rejected (measured) |
Elementwise arithmetic
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| abs | Y | Y | Y | Y | ElementWise | PEElementWise (F0) |
| add | Y | Y | Y | Y | ElementWise | const + tensor forms; text-immediate fused const |
| cumsum | Y | Y | Y | Y | CumSum | runs ON the ANE as a single op (verified M1 2026-06-09: cos 1.0) - NOT host-decomposed. The standard MIL cumsum op is unimplemented, so it is reached via the curated e5rt path (see _capabilities). |
| floor_div | Y | Y | Y | Y | ElementWise | LUT-assisted (actlut:2) |
| inverse | Y | Y | Y | Y | ElementWise | reciprocal LUT |
| maximum | Y | Y | Y | Y | ElementWise | const + tensor (LUT) |
| minimum | Y | Y | Y | Y | ElementWise | const + tensor (LUT) |
| mod | N | N | N | N | Unsupported | Unsupported everywhere - decompose on host |
| mul | Y | Y | Y | Y | ElementWise | const + tensor forms |
| pow | Y | Y | Y | Y | ElementWise | pow_const; user-facing x ** y (probed native M1) |
| real_div | Y | Y | Y | Y | ElementWise | general divide; A11/A12 = const-fp16 reciprocal only. user-facing truediv/div |
| rsqrt | Y | Y | Y | Y | ElementWise | F2 LUT |
| sqrt | Y | Y | Y | Y | ElementWise | F2 LUT activation (native A13+, decomposed on A11/A12) |
| square | Y | Y | Y | Y | ElementWise | F0 PEElementWise |
| sub | Y | Y | Y | Y | ElementWise | lowered to add-of-negated-const |
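The table says mod must be decomposed but does not say how. One common identity, shown here as an assumption rather than as aneforge's actual lowering, rebuilds floored modulo from divide, floor, mul and sub. The same sketch also illustrates the "add-of-negated-const" form noted for sub. All function names are hypothetical.

```python
import math


def mod_decomposed(a, b):
    """Floored modulo via a - b * floor(a / b).

    Assumption: this mirrors Python's % sign convention (result takes the
    sign of b); the catalog does not state which convention is intended.
    """
    return a - b * math.floor(a / b)


def sub_as_add(x, const):
    """Subtract a constant by adding its negation (add-of-negated-const)."""
    return x + (-const)
```

In fp16 the quotient `a / b` can round across an integer boundary, so results near multiples of `b` deserve a tolerance check before relying on this form.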
Image / resize / texture
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| affine | N | Y | Y | Y | Affine | texture-engine only (A14+); "affine transform is not supported on this architecture" on M1 |
| crop_resize | N | Y | Y | Y | CropResize | texture-engine only (A14+, HAL 0x81d) - _OP_FLOOR; unavailable on M1, no host substitution wired |
| degamma | ~ | ~ | ~ | ~ | DeGamma | ISP/image op; mapped_no_current_hwx_case |
| gamma | ~ | ~ | ~ | ~ | Gamma | ISP/image op; mapped_no_current_hwx_case |
| pixel_buffer_to_tensor | ~ | ~ | ~ | ~ | PixelBufferToTensor | 4CC image input; mapped_no_current_hwx_case. Does not lower on the unentitled direct path (entitlement gate, not chip gate); use af.image_input. |
| resample | N | Y | Y | Y | Resample | texture-engine only (A14+); warp depth=1, channel in {1,2}. Walled on M1 |
| resize | ~ | Y | Y | Y | Resize | F2 but texture-gated: M1 = software deconv/transpose fallback (different rounding; some modes hard-abort); native A14+ |
| resize_bilinear | ~ | Y | Y | Y | ResizeBilinear | NE lane; sw-fallback on M1 |
| resize_nearest_neighbor | ~ | Y | Y | Y | ResizeNearestNeighbor | NE lane; sw-fallback on M1 (1x1-source fast path exists) |
| tensor_to_pixel_buffer | ~ | ~ | ~ | ~ | TensorToPixelBuffer | mapped_no_current_hwx_case (compiler-internal) |
| upsample_bilinear | ~ | Y | Y | Y | UpsampleBilinear | NE lane; sw-fallback on M1 |
| upsample_nearest_neighbor | ~ | Y | Y | Y | UpsampleNearestNeighbor | NE lane; sw-fallback on M1 |
Normalization
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| batch_norm | Y | Y | Y | Y | BatchNorm | inference fold-to-affine runs everywhere (incl. A11/A12); native stats form is A13+ |
| instance_norm | Y | Y | Y | Y | InstanceNorm | F2 |
| l2_norm | Y | Y | Y | Y | | F2 |
| layer_norm | Y | Y | Y | Y | LayerNorm | F2 (native A13+) |
| local_response_norm | Y | Y | Y | Y | LRNorm | LRN bridge (measured Y on M1) |
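The batch_norm note mentions an inference "fold-to-affine" form. The standard algebra behind that phrase is sketched below for a single channel; the function names and the default epsilon are illustrative assumptions, not values taken from aneforge.

```python
import math


def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Fold frozen batch-norm statistics into one (scale, shift) pair.

    y = gamma * (x - mean) / sqrt(var + eps) + beta
      = scale * x + shift
    """
    scale = gamma / math.sqrt(var + eps)
    shift = beta - mean * scale
    return scale, shift


def apply_affine(x, scale, shift):
    """The folded op: a single multiply-add per element."""
    return scale * x + shift
```

Because the folded form needs only mul and add, it is consistent with the note that it runs everywhere, while the form that computes statistics at run time needs reductions.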
Quantization / dtype
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| cast | Y | Y | Y | Y | Cast | F0 format primitive. fp16<->fp32/bool native on M1; cast(->int32) is walled on M1 (empirically confirmed) - keep dtype fp on h13 |
| const | ~ | ~ | ~ | ~ | ConstOps | mapped_no_current_hwx_case - folded at compile, not a standalone codegen op |
| constexpr_affine_dequantize | ~ | ~ | ~ | ~ | ConstOps | weight-compression const; folded. int4-LUT streams natively from M1; int8/affine fold to fp16 below A15 (HAL +0x520-0x539). |
| constexpr_blockwise_shift_scale | ~ | ~ | Y | Y | ConstOps | blockwise stream gate A15+; folds to fp16 on M1/M2 |
| constexpr_cast | N | N | N | N | Unsupported | Unsupported everywhere |
| constexpr_lut_to_dense | Y | Y | Y | Y | ConstOps | palette/LUT stream gate (+0x529) is A13-on -> int4-LUT streams natively from M1 (the one compressed format that wins on M1) |
| constexpr_lut_to_sparse | ~ | ~ | ~ | ~ | ConstOps | folded const; sparse stream A15+ |
| constexpr_sparse_blockwise_shift_scale | ~ | ~ | Y | Y | ConstOps | sparse+blockwise stream A15+ |
| constexpr_sparse_to_dense | ~ | ~ | Y | Y | ConstOps | sparse stream A15+ |
| dequantize | Y | Y | Y | Y | Dequantize | F0 |
| quantize | Y | Y | Y | Y | Quantize | F0 (not texture-gated) |
Recurrent
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| gru | N | N | N | N | Unsupported | Unsupported everywhere - unroll to conv/matmul+activation on host |
| lstm | N | N | N | N | Unsupported | Unsupported everywhere - unroll on host |
| rnn | N | N | N | N | Unsupported | Unsupported everywhere - unroll on host |
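"Unroll on host" for the recurrent ops can be pictured as replacing the recurrent op with one matmul-plus-activation step per time step. The sketch below does this for a plain tanh RNN in pure Python; the weight layout, argument names and the choice of tanh are assumptions for illustration, not aneforge's lowering.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product (stands in for a native matmul)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_unrolled(xs, w_ih, w_hh, bias, h0):
    """Unroll h_t = tanh(W_ih x_t + W_hh h_{t-1} + b) over a fixed sequence.

    The Python loop runs once per time step, so a graph traced from it would
    contain len(xs) copies of matmul + add + tanh and no loop op.
    """
    h = list(h0)
    outputs = []
    for x in xs:
        pre = [a + b + c
               for a, b, c in zip(matvec(w_ih, x), matvec(w_hh, h), bias)]
        h = [math.tanh(p) for p in pre]
        outputs.append(h)
    return outputs
```

Unrolling fixes the sequence length at build time, which fits the static-shape constraint noted elsewhere in this catalog.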
Reductions
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| reduce_argmax | Y | Y | Y | Y | ReduceArg | per-axis ArgMax - F0, all chips (bridge ArgMax measured Y on M1) |
| reduce_argmin | ~ | ~ | Y | Y | ReduceArg | per-axis argmin; M1/M2 walled on the MIL route (HAL 0x4f2, A15+), bridge mirrors argmax. user-facing argmin |
| reduce_l1_norm | Y | Y | Y | Y | Reduce | F2 Reduce |
| reduce_l2_norm | Y | Y | Y | Y | Reduce | F2 Reduce |
| reduce_log_sum | Y | Y | Y | Y | Reduce | LUT-assisted Reduce (ln2 immediate) |
| reduce_log_sum_exp | Y | Y | Y | Y | Reduce | LUT Reduce; aneforge wires its vjp (probed native M1) |
| reduce_max | Y | Y | Y | Y | Reduce | F2 |
| reduce_mean | Y | Y | Y | Y | Reduce | F2 |
| reduce_min | Y | Y | Y | Y | Reduce | F2 |
| reduce_prod | N | N | N | N | Unsupported | Unsupported everywhere - decompose (log-sum-exp / scan) on host |
| reduce_sum | Y | Y | Y | Y | Reduce | F2 (native A13+; decomposed on A11/A12). reduced-axis >=192 -> transpose route (>=384 on A15+) |
| reduce_sum_square | Y | Y | Y | Y | Reduce | F2; the 0x494 reduce->square fusion is M2+ only - M1 emits an extra fp16 round (<=1-round numeric, not a wall) |
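The reduce_prod note points at two decompositions, a log-domain sum and a scan. Both are sketched below under stated assumptions: the log-domain version tracks sign separately and short-circuits on zero, details the catalog does not specify, and the function names are hypothetical.

```python
import math


def reduce_prod_logsum(values):
    """Product as sign * exp(sum(log|v|)); zero inputs return 0.0 early."""
    sign = 1.0
    log_sum = 0.0
    for v in values:
        if v == 0.0:
            return 0.0
        if v < 0.0:
            sign = -sign
        log_sum += math.log(abs(v))
    return sign * math.exp(log_sum)


def reduce_prod_scan(values):
    """Product as a running multiply (a sequential scan)."""
    acc = 1.0
    for v in values:
        acc *= v
    return acc
```

The log-domain form maps onto ops the table lists as native (abs, log, reduce_sum, exp) but pays a rounding cost, so the two results agree only to a tolerance.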
Special / math
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| acos | N | N | N | N | Unsupported | Unsupported everywhere |
| acosh | N | N | N | N | Unsupported | Unsupported everywhere |
| asin | N | N | N | N | Unsupported | Unsupported everywhere - host decomposition |
| asinh | N | N | N | N | Unsupported | Unsupported everywhere |
| atan | Y | Y | Y | Y | ElementWise | F2 LUT - native on M1 (probe: WORKS; the one trig in vocab on h13) |
| atanh | N | N | N | N | Unsupported | Unsupported everywhere |
| cos | ~ | ~ | Y | Y | ElementWise | F4 trig, native A15+ only (REJECTED on M1/A14); M1/M2 Horner |
| cosh | N | N | N | N | Unsupported | Unsupported everywhere (REJECTED M1 probe) |
| cost_volume | ~ | ~ | ~ | ~ | CostVolume | bridge CostVolume (measured Y on M1); mapped_no_current_hwx_case |
| cross_product | ~ | ~ | ~ | ~ | CrossProduct | bridge CrossProduct (measured Y on M1) but mapped_no_current_hwx_case in MIL map - reachable via bridge |
| matrix_decomposition | ~ | ~ | ~ | ~ | MatrixDecomposition | mapped_no_current_hwx_case - no observed codegen |
| sin | ~ | ~ | Y | Y | ElementWise | F4 trig, native A15+ only (REJECTED on M1/A14 - silicon-measured); M1/M2 use special.py Horner |
| sinh | N | N | N | N | Unsupported | Unsupported everywhere (REJECTED M1 probe) - (exp(x)-exp(-x))/2 on host |
| tan | N | N | N | N | Unsupported | Unsupported everywhere (REJECTED M1 probe) - sin/cos Horner identity on host |
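The sin, cos, sinh and tan notes refer to Horner polynomials and exp-based identities. A minimal sketch of that style of decomposition follows. The coefficients here are plain Taylor terms chosen for illustration and are only accurate for inputs already reduced to roughly [-pi/2, pi/2]; the coefficients and range reduction actually used in special.py are not shown in this catalog.

```python
import math

# Taylor coefficients in powers of x^2 (assumption: illustrative only).
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(6)]
COS_COEFFS = [(-1) ** k / math.factorial(2 * k) for k in range(7)]


def horner(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... using only multiply-adds."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def sin_poly(x):
    """sin(x) as x * P(x^2)."""
    return x * horner(SIN_COEFFS, x * x)


def cos_poly(x):
    """cos(x) as Q(x^2)."""
    return horner(COS_COEFFS, x * x)


def tan_host(x):
    """tan(x) from the sin/cos identity."""
    return sin_poly(x) / cos_poly(x)


def sinh_host(x):
    """sinh(x) = (exp(x) - exp(-x)) / 2, as given in the table note."""
    return (math.exp(x) - math.exp(-x)) / 2.0
```

Horner form matters here because each step is one multiply and one add, both of which the elementwise table lists as native on every chip.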
Stateful (state / buffers)
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| circular_buffer_to_tensor | ~ | ~ | ~ | ~ | CircularBufferToTensor | mapped_no_current_hwx_case; ring-buffer reader |
| read_state | Y | Y | Y | Y | ReadState | F2 stateful; reachable on M1 but needs the e5rt inout-tensor-desc plumbing for KV-cache. |
| tensor_buffer_to_tensor | ~ | ~ | ~ | ~ | TensorBufferToTensor | mapped_no_current_hwx_case; F2 ring/streaming buffer mover (A13+, reachable inside stateful graph) |
| tensor_to_circular_buffer | ~ | ~ | ~ | ~ | TensorToCircularBuffer | mapped_no_current_hwx_case; ring-buffer writer |
| tensor_to_tensor_buffer | ~ | ~ | ~ | ~ | TensorToTensorBuffer | mapped_no_current_hwx_case |
| write_state | Y | Y | Y | Y | WriteState | F2 stateful |
Structural / shape
| op | M1 | M2 | M3 | M4/M5 | kernel | note |
| --- | --- | --- | --- | --- | --- | --- |
| band_part | N | N | N | N | Unsupported | Unsupported everywhere (mask via host) |
| batch_to_space | Y | Y | Y | Y | BatchToSpace | inverse of space_to_batch |
| concat | Y | Y | Y | Y | Concat | F0 DMA |
| crop | Y | Y | Y | Y | Crop | F0 slice/crop (distinct from texture crop_resize) |
| depth_to_space | Y | Y | Y | Y | DepthToSpace | F2 NE lane; user-facing pixel_shuffle |
| expand_dims | Y | Y | Y | Y | ExpandDims | F0 |
| fill | Y | Y | Y | Y | Fill | const tensor producer |
| fill_like | Y | Y | Y | Y | FillLike | const tensor producer |
| flatten2d | Y | Y | Y | Y | | F0 |
| gather | Y | Y | Y | Y | Gather | software gather on M1 in narrow envelope (batch=1,depth=1); hw gather_hw path is A14+ (_OP_FLOOR) |
| gather_along_axis | Y | Y | Y | Y | GatherAlongAxis | template_text; same M1 envelope caveat |
| gather_nd | ~ | Y | Y | Y | GatherND | M1 = sw-envelope only (IsValidForH13: batch=1,depth=1,idx-ch=3); outside it rejected. Native (texture) A14+ |
| identity | Y | Y | Y | Y | Cast | aliases Cast/no-op |
| non_zero | N | N | N | N | Unsupported | Unsupported everywhere (data-dependent shape) |
| one_hot | N | N | N | N | Unsupported | Unsupported everywhere - decompose (eye-gather) on host |
| pad | Y | Y | Y | Y | Pad | const pad F0 (NE lane). symmetric/reflect pad is texture-gated -> ~/sw on M1, native A14+ |
| pixel_shuffle | Y | Y | Y | Y | PixelShuffle | F2 NE lane |
| pixel_unshuffle | Y | Y | Y | Y | PixelUnshuffle | template_text |
| range_1d | ~ | ~ | ~ | ~ | | template_text; M1 raw-MIL probe: walled (positional-encoding range rejects on h13 codegen) - host-precompute the const |
| reshape | Y | Y | Y | Y | Reshape | F0 metadata (A11/A12 = fp16-only/Flatten-or-abort) |
| reshape_like | Y | Y | Y | Y | ReshapeLike | F0 |
| reverse | Y | Y | Y | Y | Reverse | NE lane (probed native M1; aneforge wires vjp) |
| reverse_sequence | N | N | N | N | Unsupported | Unsupported everywhere - decompose on host |
| scatter | N | N | N | N | Unsupported | Unsupported everywhere - decompose on host |
| scatter_along_axis | N | N | N | N | Unsupported | Unsupported everywhere |
| scatter_nd | N | N | N | N | Unsupported | Unsupported everywhere |
| shape | N | N | N | N | Unsupported | Unsupported everywhere (static-shape graphs only) |
| slice_by_index | ~ | ~ | ~ | ~ | SliceByIndex | mapped_no_current_hwx_case; static-offset slice folds into descriptor (reachable inside graph) |
| slice_by_size | Y | Y | Y | Y | SliceBySize | F0; pre-A16 width-offset quirk (Q.4 x16 crop-DMA): CONCATenating multiple nonzero last-axis (width) slices returns WRONG ELEMENTS on A14 (the gather-axis-1 bug; a SINGLE width slice is exact on A14 - linalg column/element extraction is green on M2); on A13 a width slice also saturates |
| slice_update | Y | Y | Y | Y | SliceUpdate | template_text backend |
| sliding_windows | N | N | N | N | NotImplemented | NotImplemented on any backend - decompose on host |
| space_to_batch | Y | Y | Y | Y | SpaceToBatch | factor in {2,3,4,8}; batch cap 4096 (older)/65536 |
| space_to_depth | Y | Y | Y | Y | SpaceToDepth | F2 NE lane; user-facing pixel_unshuffle |
| split | Y | Y | Y | Y | Split | F0 |
| squeeze | Y | Y | Y | Y | Squeeze | F0 |
| stack | Y | Y | Y | Y | Stack | F0 |
| tile | Y | Y | Y | Y | Tile | F2 (A13+); factors of {2,3,4,8}. Absent on A11/A12 |
| transpose | Y | Y | Y | Y | Transpose | F0 but capped by max-transpose-extent (16384 M1-A15 -> 65536 M5; 0 on A11/A12) |
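The one_hot note names an "eye-gather" decomposition: build an identity matrix once, then gather its rows by index. A minimal pure-Python sketch of that idea follows; the function names are illustrative, and the real host path in aneforge may differ.

```python
def eye(depth):
    """depth x depth identity matrix as nested lists of floats."""
    return [[1.0 if r == c else 0.0 for c in range(depth)]
            for r in range(depth)]


def one_hot(indices, depth):
    """One-hot encode by gathering rows of the identity matrix.

    The identity is a compile-time constant for a fixed depth, so only the
    gather depends on the input indices.
    """
    table = eye(depth)
    return [list(table[i]) for i in indices]
```

For example, `one_hot([2, 0], 3)` gathers rows 2 and 0 of the 3x3 identity. Whether the gather itself stays on the ANE depends on the gather envelope listed in the table above.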