Reproducibility
Press the OPEN button for more details, such as links to the Weights & Biases runs of each experiment. Each experiment has three runs.
Legend
- ❓: No run exists for this experiment.
- ✅: The experiment has been reproduced.
- 🤨🤨🤨: The reproducibility runs show greater performance than the reported numbers.
- ⛔️: The reproduction failed.
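
As a rough illustration of how the legend could be applied, the sketch below maps a reported number and a set of reproducibility runs to one of the four symbols. The function name `legend_symbol`, the `tolerance` threshold, averaging over the runs, and the `higher_is_better` flag are all assumptions made for this example; the source does not state the actual criteria used to decide each status.

```python
from statistics import mean
from typing import Sequence

# Legend symbols, as listed above.
NO_RUN = "❓"
REPRODUCED = "✅"
BETTER = "🤨🤨🤨"
FAILED = "⛔️"


def legend_symbol(
    reported: float,
    runs: Sequence[float],
    tolerance: float = 0.01,        # hypothetical threshold, not from the source
    higher_is_better: bool = True,  # hypothetical: metric direction
) -> str:
    """Return the legend symbol for one experiment (illustrative only)."""
    if not runs:
        return NO_RUN

    # Assumption: compare the mean of the runs against the reported number.
    observed = mean(runs)
    gain = observed - reported if higher_is_better else reported - observed

    if gain > tolerance:
        return BETTER   # runs beat the reported number
    if gain < -tolerance:
        return FAILED   # runs fall short of the reported number
    return REPRODUCED   # runs match within the tolerance


if __name__ == "__main__":
    print(legend_symbol(0.90, [0.90, 0.905, 0.895]))
```

With three runs per experiment, `runs` would hold three values; an empty sequence corresponds to the "no run" case.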