  1. [Bug] fla's swiglu_linear got error in Transformers Engine with ...

    Sep 12, 2025 · stage3_max_reuse_distance: 1.0e+09 stage3_gather_16bit_weights_on_model_save: true gradient_accumulation_steps: 4 gradient_clipping: "auto" steps_per_print: "auto" …
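    The snippet above is a DeepSpeed-style training config flattened onto one line. A minimal sketch of how those visible keys are usually laid out, assuming YAML; the `zero_optimization` nesting and `stage: 3` line are assumptions (only the flat keys appear in the snippet):

      # Sketch reconstructed from the snippet; the zero_optimization
      # nesting and the stage number are assumed, not shown in the result.
      zero_optimization:
        stage: 3                                          # implied by the stage3_* keys
        stage3_max_reuse_distance: 1.0e+09                # parameter bytes kept resident for reuse
        stage3_gather_16bit_weights_on_model_save: true   # consolidate 16-bit weights on save
      gradient_accumulation_steps: 4
      gradient_clipping: "auto"    # "auto" values are resolved by the HF Trainer integration
      steps_per_print: "auto"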

  2. lint · Workflow runs · fla-org/flash-linear-attention · GitHub

    Mar 24, 2025 · 126 workflow run results. [Misc.] Reuse masks in l2norm kernels · lint #128: commit 7e385e1 pushed by yzhangcs to main, March 22, 2025 20:30 (22s) …

  3. Workflow runs · fla-org/flash-linear-attention · GitHub

    March 22, 2025 20:56 (15s) · [Misc.] Reuse masks in l2norm kernels · lint #128: commit 7e385e1 pushed by yzhangcs to main, March 22, 2025 20:30 (22s) …