As you can see, both chains spend most of their time with at least one cluster each around the "correct" values, but occasionally they go wrong.
This code, while it works for the example model, is not yet ready to be checked in to the main branch of Mamba. There were several cleanup steps for which I did not ultimately have time.
I have updated the "slice" and "slicesimplex" samplers to work with the new data structures. However, the other samplers, which I did not use in my work, are currently broken in the gsocMNVP branch; they still try to use the old data structures. Updating them along the same lines as the slice and slicesimplex samplers would be a more-or-less routine task: an hour or so of work per sampler.
The diagnostics and plots, aside from the traceplot shown above, are also not updated. Fixing this is a less trivial task: because of the "labelling problem", most diagnostics need to be rethought in some way before they can apply to Dirichlet models.
Once that cleanup is done (a few days' work) and the merge is complete, implementing the full Crosscat model as in the original plan should not be too difficult. Optimistically, I feel it would take 1-2 weeks, which means that 4-6 weeks is probably more realistic. In any case, the new data structures I've implemented would make this job primarily a matter of implementing the statistical algorithms; the data and model infrastructure is all in place.
With a combination of NUTS and discrete capabilities, I believe that Mamba will begin to actually be superior to Stan for some tasks. It has a long way to go to catch up to Stan's maturity, but because it solves the "two language problem", it gives me and others a strong incentive to continue this work.
I want to thank my GSoC mentor Benjamin Deonovic for his help and understanding in what has been a difficult but fun project.