This paper presents a novel data detector ASIC for massive multiuser multiple-input multiple-output (MU-MIMO) wireless systems. The ASIC implements a modified version of the large-MIMO approximate message passing algorithm (LAMA), which achieves near-optimal error-rate performance (i) under realistic channel conditions and (ii) for systems with as many users as base-station (BS) antennas. The hardware architecture supports 32 users transmitting 256-QAM simultaneously in the same time-frequency resource, and provides soft-input soft-output capabilities for iterative detection and decoding. The fabricated 28 nm CMOS ASIC occupies 0.37 mm$^2$, achieves a throughput of 354 Mb/s, consumes 151 mW, and improves the SNR performance by more than 11 dB compared to existing data detectors in systems with 32 BS antennas and 32 users under realistic channel conditions. In addition, the ASIC achieves 4$\times$ higher throughput per area than a recently proposed message-passing detector.