The difficulty of validating large-scale quantum devices, such as boson samplers, poses a major challenge for any research program that aims to show a quantum advantage over classical hardware. To address this challenge, we propose a data-driven approach in which models are trained with unsupervised machine-learning methods to identify common pathologies. We illustrate this idea by training a classifier that uses K-means clustering to distinguish boson samplers that use indistinguishable photons from those that do not. We tune the model on numerical simulations of small-scale boson samplers and then validate the pattern-recognition technique on larger numerical simulations, as well as on photonic chips in both traditional and scattershot boson-sampling experiments. The method works because the output distributions contain internal correlations that depend on the particle type. On the test data, this approach performs substantially better than previous methods, and it generalizes beyond the examples it was trained on.
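The following is a minimal sketch of the idea in Python with NumPy. It is an illustration under stated assumptions, not the authors' exact pipeline. It simulates a small boson sampler (a hypothetical setup of 3 photons in 6 modes with a Haar-random unitary). Output probabilities are computed exactly, from permanents of the unitary's submatrix for indistinguishable photons and from permanents of |U|^2 for distinguishable ones. K-means (plain Lloyd's iteration) is then fit on a trusted reference sample, and a Pearson chi-squared homogeneity statistic compares how a test sample populates the clusters. The helper names, the mode-occupation features, the number of clusters and the chi-squared comparison are all our assumptions.

```python
import itertools
import math

import numpy as np


def permanent(a):
    """Permanent by brute force over all permutations (only for small n)."""
    n = a.shape[0]
    return sum(np.prod([a[i, p[i]] for i in range(n)]) for p in itertools.permutations(range(n)))


def haar_unitary(m, rng):
    """Haar-random m x m unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases so the distribution is Haar


def output_distributions(u, inp):
    """Exact output probabilities for single photons injected in modes `inp`.

    Returns (outcomes, p_indistinguishable, p_distinguishable); each outcome is a
    sorted tuple of occupied output modes (repeats allowed for bunching).
    """
    n, m = len(inp), u.shape[0]
    outs = list(itertools.combinations_with_replacement(range(m), n))
    p_bos, p_dis = [], []
    for out in outs:
        sub = u[np.ix_(out, inp)]  # rows: output modes, columns: input modes
        norm = np.prod([math.factorial(out.count(j)) for j in set(out)])
        p_bos.append(abs(permanent(sub)) ** 2 / norm)
        p_dis.append(float(permanent(np.abs(sub) ** 2)) / norm)
    return outs, np.array(p_bos), np.array(p_dis)


def sample(outs, probs, m, size, rng):
    """Draw `size` events and return them as mode-occupation vectors."""
    occ = np.array([np.bincount(out, minlength=m) for out in outs])
    idx = rng.choice(len(outs), size=size, p=probs / probs.sum())
    return occ[idx]


def assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def kmeans(x, k, rng, iters=100):
    """Plain Lloyd's K-means; initial centers are k distinct events from x."""
    uniq = np.unique(x, axis=0)
    centers = uniq[rng.choice(len(uniq), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign(x, centers)
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def cluster_counts(x, centers):
    """Number of events from x that fall in each cluster."""
    return np.bincount(assign(x, centers), minlength=len(centers))


def chi2_homogeneity(counts_a, counts_b):
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table."""
    table = np.vstack([counts_a, counts_b]).astype(float)
    table = table[:, table.sum(axis=0) > 0]  # drop clusters empty in both samples
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return float(((table - expected) ** 2 / expected).sum()), table.shape[1] - 1


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    m, inp = 6, (0, 1, 2)  # hypothetical small-scale setup: 3 photons in 6 modes
    outs, p_bos, p_dis = output_distributions(haar_unitary(m, rng), inp)
    ref = sample(outs, p_bos, m, 4000, rng)  # trusted reference: indistinguishable photons
    centers = kmeans(ref, k=6, rng=rng)
    for name, probs in [("indistinguishable", p_bos), ("distinguishable", p_dis)]:
        test = sample(outs, probs, m, 4000, rng)
        stat, dof = chi2_homogeneity(cluster_counts(ref, centers), cluster_counts(test, centers))
        print(f"{name}: chi2 = {stat:.1f} with {dof} dof")
```

A statistic that is large relative to its degrees of freedom (for example, a small `scipy.stats.chi2.sf(stat, dof)`) flags the test sample as incompatible with the reference. Brute-force permanents cost on the order of n·n! operations, so this sketch suits only the kind of small-scale simulations used for tuning.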