NHacker Next
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (arxiv.org)
solomatov 17 hours ago [-]
Does anyone know if there have been any attempts to test Mamba at a really large scale? To me this model looks like the most promising successor to the transformer architecture. Does anyone know why that might not be the case, or what the other alternatives are?
tangjurine 11 hours ago [-]
Tencent's 'Hunyuan-T1'–The First Mamba-Powered Ultra-Large Model: https://news.ycombinator.com/item?id=43447254
ed 18 hours ago [-]
Interesting direction for research, but not a model you’d want to use today. The paper looks at a 3B model built on Llama-3.2-3B, modified for Mamba, and compares it to a distilled version of R1 with 1.5B params.