Indeed, that's what I kind of hinted at in https://news.ycombinator.com/item?id=46442195 and coincidentally https://news.ycombinator.com/item?id=46437688 briefly after, namely that OK, one can "generate" a "solution", that's much easier than before... but until we can verify somehow that it actually does what it say it does (and we know of hallucinations and have no reason to believe this changed) then testing itself, especially of well know "problems" is more and more important.
That being said, it doesn't answer the "why" in the first place, an even more important question. At least though it does help somehow to compare with existing alternatives.
Heck, when Satya Nadella wanted to demonstrate Copilot coding, he had it emit an Altair emulator. I guess there's little room for creativity in 8-bit emulator design so LLMs can handle them well. https://thenewstack.io/from-basic-to-vibes-microsofts-50-yea...
WASM and the performance seems catastrophically bad (45ms to render a frame on an M4 laptop)? It would be much more impressive if Claude could optimize it into something that someone would actually want to play? Compare this to a random hit from Google, https://jsnes.org/ which has sound, much smaller payload, and runs really fast (<1ms to render a frame).
The cost of slop is >40X drop in performance? Pick any metric that you care about for your domain perhaps that's what you're going to lose and is the effort to recover that practical with current vibe-coding strategies?
It’s a shame that the source code isn’t commented and documented more. At the very least, I would see it being helpful to add some documentation for every CPU op code being emulated.
Forbidding LLM to write comments and docstrings (preferrably enforced by build and commit hook) is one of the best "hacks" for using that thing. LLM cannot help itself but emit poisonous comments.
Give it copy paste / translate tasks and it’s a no brainer (quite literally)
But same can be said of humans.
The question here is, did it implement it because it read the available online documentation about the NES architecture OR did it just see one too many of such implementations.
That being said, it doesn't answer the "why" in the first place, an even more important question. At least though it does help somehow to compare with existing alternatives.
This endeavor had negative net value.
The cost of slop is >40X drop in performance? Pick any metric that you care about for your domain perhaps that's what you're going to lose and is the effort to recover that practical with current vibe-coding strategies?
Give it copy paste / translate tasks and it’s a no brainer (quite literally)
But same can be said of humans.
The question here is, did it implement it because it read the available online documentation about the NES architecture OR did it just see one too many of such implementations.