[Bug] fla's swiglu_linear got error in Transformers Engine with ...
Sep 12, 2025 · stage3_max_reuse_distance: 1.0e+09 stage3_gather_16bit_weights_on_model_save: true gradient_accumulation_steps: 4 gradient_clipping: "auto" steps_per_print: "auto" …
lint · Workflow runs · fla-org/flash-linear-attention · GitHub
Mar 24, 2025 · 126 workflow run results · [Misc.] Reuse masks in l2norm kernels · lint #128: Commit 7e385e1 pushed by yzhangcs, March 22, 2025 20:30, 22s, main · March 22, …