Netscaler Upgrade (11.1.64.14 to 65.12) Auth Hang after Duo Auth (with a "success logging you in")

Are there any known issues or suggestions? Nothing changed other than the version upgrade (a security update).

Additional info: this seems to be RADIUS-related. I may be getting a “User Enroll” entry in the Duo proxy log (I still need to match the timestamps).

Here is the NS Auth Log:

/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[2100]: process_radius radius accepts :
Tue Sep 22 05:29:23 2020
/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[2103]: process_radius extracted group string :(null)
Tue Sep 22 05:29:23 2020
/usr/home/build/adc/usr.src/netscaler/aaad/naaad.c[2569]: send_accept sending accept to kernel for :
Tue Sep 22 05:29:36 2020
/usr/home/build/adc/usr.src/netscaler/aaad/naaad.c[575]: main timer 1 firing…
Tue Sep 22 05:29:43 2020
/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[1518]: radius_challenge_dialogue_response ignore radius error below, canceling dialogue mode
Tue Sep 22 05:29:43 2020
/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[1521]: radius_challenge_dialogue_response
Tue Sep 22 05:29:43 2020
/usr/home/build/adc/usr.src/netscaler/aaad/naaad.c[2882]: send_reject_with_code Avoiding response to kernel as the socket has already been closed
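To match these entries against the Duo proxy log, it helps to line up the timestamps first. Below is a minimal Python sketch that parses an aaad debug excerpt like the one above into entries and measures the gap between two events. It assumes, based on how this excerpt looks, that each ctime-style timestamp line applies to the message line after it, so the first message (whose timestamp was cut off in the paste) gets `None`. `parse_aaad_log` and `seconds_between` are hypothetical helper names, not part of any Citrix or Duo tooling.

```python
import re
from datetime import datetime

# ctime-style timestamp lines, e.g. "Tue Sep 22 05:29:23 2020".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
# Message lines, e.g. ".../radius_drv.c[2100]: process_radius radius accepts :"
MSG_RE = re.compile(r"^(?P<path>\S+?)\[(?P<line>\d+)\]: (?P<func>\w+)\s*(?P<text>.*)$")

# The excerpt from the post, verbatim.
SAMPLE = """\
/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[2100]: process_radius radius accepts :
Tue Sep 22 05:29:23 2020
/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[2103]: process_radius extracted group string :(null)
Tue Sep 22 05:29:23 2020
/usr/home/build/adc/usr.src/netscaler/aaad/naaad.c[2569]: send_accept sending accept to kernel for :
Tue Sep 22 05:29:36 2020
/usr/home/build/adc/usr.src/netscaler/aaad/naaad.c[575]: main timer 1 firing…
Tue Sep 22 05:29:43 2020
/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[1518]: radius_challenge_dialogue_response ignore radius error below, canceling dialogue mode
Tue Sep 22 05:29:43 2020
/usr/home/build/adc/usr.src/netscaler/aaad/radius_drv.c[1521]: radius_challenge_dialogue_response
Tue Sep 22 05:29:43 2020
/usr/home/build/adc/usr.src/netscaler/aaad/naaad.c[2882]: send_reject_with_code Avoiding response to kernel as the socket has already been closed
"""


def parse_aaad_log(text):
    """Return (timestamp, file, line, function, message) tuples.

    Assumption: a timestamp line applies to the message line(s) after it;
    messages seen before any timestamp get None.
    """
    entries = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue  # timestamp line consumed; it applies to what follows
        except ValueError:
            pass  # not a timestamp, so try it as a message line
        match = MSG_RE.match(line)
        if match:
            entries.append((
                current_ts,
                match.group("path").rsplit("/", 1)[-1],  # keep just the file name
                int(match.group("line")),
                match.group("func"),
                match.group("text"),
            ))
    return entries


def seconds_between(entries, first_func, second_func):
    """Seconds from the first timestamped first_func entry to the next
    timestamped second_func entry after it, or None if either is missing."""
    start = None
    for ts, _file, _line, func, _text in entries:
        if ts is None:
            continue
        if start is None and func == first_func:
            start = ts
        elif start is not None and func == second_func:
            return (ts - start).total_seconds()
    return None


if __name__ == "__main__":
    entries = parse_aaad_log(SAMPLE)
    for ts, fname, lineno, func, msg in entries:
        stamp = ts.strftime("%H:%M:%S") if ts else "??:??:??"
        print(f"{stamp}  {fname}[{lineno}] {func} {msg}".rstrip())
    gap = seconds_between(entries, "send_accept", "send_reject_with_code")
    print(f"send_accept -> send_reject_with_code: {gap:.0f} s")
```

On this excerpt it reports a 20-second gap between `send_accept` and the `send_reject_with_code` entry whose socket had already been closed. The same `seconds_between` idea could be applied to the Duo proxy log once its timestamps are parsed into `datetime` objects (that log's format is not shown here, so it is left out).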

We ran into the same issue in early August after upgrading from 11.1.63.15 to 11.1.65.12, and it caused several significant service interruptions. Long story short: either roll back the upgrade (you can use the GUI or CLI to install the older version the same way you would during an upgrade) or go all-in and upgrade to the 12.1.x code.

One item Citrix specifically called out as ‘fixed’ in the 65.12 release notes is what we (and Citrix support) believe actually breaks the integration:
In rare cases, the Citrix Gateway appliance might fail when users are challenged for a one-time code. [ NSHELP-20967 ]

Our short-term solution was to roll back the update and return to 11.1.63.15, which resolved the issue and bought us time to troubleshoot further.

Once we verified that downgrading fixed things, we upgraded one of our NetScalers to the latest 12.1.x code and verified that it did not have the same login bug as 65.12. We then upgraded the rest of our NSGWs to 12.1.x, and all is well.

Confirmed: this is a known (but undocumented) RADIUS issue with NS 11.1.65.12. My resolution was to upgrade to the latest version, 12.1.x.

I was sure it would break my custom themes, StoreFront password change, Duo, etc., but it worked smoothly.