Well, after building and fine-tuning it (to get it to work with the charger's charging cycle), I have the following circuit that seems to work pretty well as a BMS on the XM-3500. A few notes:
1) Before adding this, I would do a full charge and see most cells around 3.4-3.5V, but there were 2 that stayed at 3.31 and then a couple at 3.95 and 4.01V. If you rode for just a short time, they all went to about the same voltage.
2) Before adding the BMS, my top speed was about 48-49 MPH and I would go 25-30 MPH up the steep hill in front of my house (17-20% slope).
3) I fine-tuned the point at which the BMS turns on to coincide with the peak output voltage of the charger, just before it goes into its pulsed trickle state. That ended up being 72.6V total, or 3.63V nominal per cell.
4) Once I had this circuit built, I put it into the bike, ran a charge cycle, and got much better full-charge balance. The two low cells from before were still about 0.1V below the rest, although they were slowly coming in. I did a quick dumb DC current charge into these two to speed them up, since I wanted to see if the BMS worked at maintaining balance.
5) I have used the bike a couple of times now and get a good balance and full charge without having to do anything. I measured all the cells after charging and the worst-case ones were 3.602V and 3.643V, so a spread of about 0.04V (roughly +/-0.02V). The final string voltage was 72.1V.
6) I reran some performance tests and I am now getting 30-32 MPH up my street and have achieved 53 MPH top end (flat ground, no wind, no "drafting", etc.).
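For anyone who wants to check the numbers in notes 3 and 5, here is a small sketch of the arithmetic. It uses only the voltages quoted above; the function names are made up for illustration.

```python
# Values quoted in the notes above.
CELLS = 20
CHARGER_PEAK_V = 72.6                    # charger peak before pulsed trickle (note 3)
FINAL_STRING_V = 72.1                    # string voltage after charging (note 5)
CELL_LOW_V, CELL_HIGH_V = 3.602, 3.643   # worst-case cells after charging (note 5)


def per_cell(string_v, cells=CELLS):
    """Average voltage per cell for a series string."""
    return string_v / cells


def half_spread(low_v, high_v):
    """Half the gap between the lowest and highest cell."""
    return (high_v - low_v) / 2


set_point = per_cell(CHARGER_PEAK_V)     # BMS turn-on point per cell
average = per_cell(FINAL_STRING_V)       # average cell voltage after charging
spread = half_spread(CELL_LOW_V, CELL_HIGH_V)

print(f"set point per cell:        {set_point:.3f} V")
print(f"average cell after charge: {average:.3f} V")
print(f"worst-case half spread:    {spread:.4f} V")
```

The per-cell set point comes out at 3.63V, and the gap between the worst-case cells is about 0.04V in total.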
Thus, it seems to work. If you have a couple of cells much lower than the rest, the BMS would eventually bring them in, but you might consider "feeding" them individually first with a simple DC supply and a series resistor that limits the current to 1-2A, and then just watching the cell until it hits 3.7V. Then put the BMS on and be done.
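A rough way to size that series resistor is Ohm's law across the difference between the supply and the cell. The sketch below is only an illustration: the 5V supply and the 3.3V starting cell voltage are assumed values, not something specified in this post, and the function names are hypothetical. The current is highest when the cell is lowest, so the resistor is sized at the starting voltage.

```python
def series_resistor(supply_v, cell_v, current_a):
    """Return (ohms, watts) for a resistor that limits charge current.

    supply_v  : DC supply voltage (assumed value, pick your own)
    cell_v    : cell voltage at the start of charging
    current_a : desired current at that cell voltage
    """
    if supply_v <= cell_v:
        raise ValueError("supply voltage must be above the cell voltage")
    if current_a <= 0:
        raise ValueError("current must be positive")
    ohms = (supply_v - cell_v) / current_a
    watts = current_a ** 2 * ohms        # heat the resistor must handle
    return ohms, watts


def charge_current(supply_v, cell_v, ohms):
    """Current that flows for a given cell voltage and resistor."""
    return (supply_v - cell_v) / ohms


SUPPLY_V = 5.0       # assumption: a 5V bench supply
START_CELL_V = 3.3   # assumption: a low cell before feeding
STOP_CELL_V = 3.7    # stop point mentioned above

ohms, watts = series_resistor(SUPPLY_V, START_CELL_V, 2.0)
end_current = charge_current(SUPPLY_V, STOP_CELL_V, ohms)

print(f"resistor: {ohms:.2f} ohm, dissipating up to {watts:.1f} W")
print(f"current as the cell reaches {STOP_CELL_V} V: {end_current:.2f} A")
```

With those assumed values the current starts at 2A and tapers to roughly 1.5A by 3.7V, so it stays inside the 1-2A window. The resistor should be rated with some margin above the computed wattage.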
I am not claiming this is perfect, and it is only designed for the XM-3500 20-cell string and supplied charger, so use this info at your own risk. But I am getting almost the published speeds and range now and feel more confident that the cells are happier.
Below are the schematic per cell, a Digikey parts list, and a picture of it installed. The last picture shows it balanced at the end of the full-current charge (it shunts only part of the full charger current, so the cell string still climbs to the shut-off point of the charger, which is about 2 minutes once all LEDs are lit). It shunts the trickle current quite well and keeps the cells balanced.
A32508-ND CONN D-SUB RCPT 25POS AU FLASH
A32505-ND CONN D-SUB PLUG 25POS GOLD FLASH
V1034-ND PC BOARD PAD-PER-HOLE 4.5X8.08
MJE210GOS-ND TRANS PWR PNP 5A 25V TO225AA
TL431ACLPRAGOSCT-ND IC REFERENCE PROG 2.5V TO92
1.00KXBK-ND RES 1.00K OHM 1/4W 1% METAL FILM
2.21KXBK-ND RES 2.21K OHM 1/4W 1% METAL FILM
7.5QBK-ND RES 7.5 OHM 1/4W 5% CARBON FILM
150QBK-ND RES 150 OHM 1/4W 5% CARBON FILM
CPRB-2.0-ND RES 2.0 OHM 5% 10W WIREWOUND AXL
HS121-ND HEATSINK TO-220 5W BLK
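
The schematic is not reproduced as text here, so the following is only a guess at how the parts list relates to the 3.63V set point. It assumes the 1.00K and 2.21K resistors form the TL431 divider (1.00K on top, 2.21K from the reference pin to the cell negative) and that the 2.0 ohm wirewound is the shunt load switched by the MJE210. The TL431 formula and its nominal 2.495V reference are standard; the wiring is the assumption. The small reference-pin current term is ignored.

```python
TL431_VREF = 2.495   # nominal TL431 reference voltage, volts


def tl431_set_point(r_top, r_bottom, vref=TL431_VREF):
    """Voltage at which a TL431 starts conducting: Vref * (1 + Rtop/Rbottom).

    Ignores the reference-pin current term, which adds only a few
    millivolts with a 1K top resistor.
    """
    return vref * (1 + r_top / r_bottom)


def shunt_upper_bound(cell_v, shunt_ohms):
    """Upper bound on (amps, watts) in the shunt resistor.

    Treats the pass transistor as a perfect switch, so the real
    current and power will be somewhat lower.
    """
    amps = cell_v / shunt_ohms
    watts = cell_v ** 2 / shunt_ohms
    return amps, watts


# Assumed wiring: 1.00K on top, 2.21K on the bottom of the divider.
set_point = tl431_set_point(1000.0, 2210.0)

# Assumed wiring: the 2.0 ohm resistor is the shunt load across one cell.
amps, watts = shunt_upper_bound(3.63, 2.0)

print(f"estimated turn-on voltage: {set_point:.3f} V")
print(f"shunt upper bound: {amps:.2f} A, {watts:.1f} W")
```

Under those assumptions the divider lands within a few millivolts of the 3.63V per-cell figure, and the shunt resistor would see at most about 1.8A and under 7W, inside its 10W rating.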