Corp users are getting stuck in 8021X_REQD

Noovi
Level 1

Hello Team,

We have a Cisco 5520 WLC and have upgraded it to the 8.10.183.0 image.

After this upgrade, a few regions started reporting connection issues, and our investigation found that users are stuck in the 8021X_REQD state.

When we checked the logs in ISE, we found errors like 'Supplicant stopped responding to ISE'.

We have already engaged Cisco TAC on both the wireless and ISE sides, but they have not found anything.

Has anyone seen similar issues on their end?


marce1000
VIP

 - Below is the output of your attached debug file as processed by https://cway.cisco.com/wireless-debug-analyzer/ (I used the Show All flag). I would look into things like disabling fast-roaming settings on the WLAN, if applicable, and updating the client Wi-Fi (NIC) drivers if they are not the latest:

Mar 16 19:35:51.343 *Dot1x_NW_MsgTask_3 WLC/AP is sending EAP-Identity-Request to the client
Mar 16 19:35:51.382 *Dot1x_NW_MsgTask_3 Client sent EAP-Identity-Response to WLC/AP
Mar 16 19:35:51.382 *aaaQueueReader Radius request with ID 150 sent to 172.28.139.138.
Mar 16 19:35:51.384 *radiusTransportThread Radius request with ID 150 sent to 172.28.139.138.
Mar 16 19:35:51.427 *aaaQueueReader Radius request with ID 151 sent to 172.28.139.138.
Mar 16 19:35:51.434 *radiusTransportThread Radius request with ID 151 sent to 172.28.139.138.
Mar 16 19:35:51.502 *aaaQueueReader Radius request with ID 152 sent to 172.28.139.138.
Mar 16 19:35:51.503 *radiusTransportThread Radius request with ID 152 sent to 172.28.139.138.
Mar 16 19:35:51.563 *aaaQueueReader Radius request with ID 153 sent to 172.28.139.138.
Mar 16 19:35:51.564 *radiusTransportThread Radius request with ID 153 sent to 172.28.139.138.
Mar 16 19:35:51.623 *aaaQueueReader Radius request with ID 154 sent to 172.28.139.138.
Mar 16 19:35:51.624 *radiusTransportThread Radius request with ID 154 sent to 172.28.139.138.
Mar 16 19:35:51.686 *aaaQueueReader Radius request with ID 155 sent to 172.28.139.138.
Mar 16 19:35:51.688 *radiusTransportThread Radius request with ID 155 sent to 172.28.139.138.
Mar 16 19:35:51.731 *aaaQueueReader Radius request with ID 156 sent to 172.28.139.138.
Mar 16 19:35:51.733 *radiusTransportThread Radius request with ID 156 sent to 172.28.139.138.
Mar 16 19:35:51.864 *aaaQueueReader Radius request with ID 157 sent to 172.28.139.138.
Mar 16 19:35:51.865 *radiusTransportThread Radius request with ID 157 sent to 172.28.139.138.
Mar 16 19:35:51.910 *aaaQueueReader Radius request with ID 158 sent to 172.28.139.138.
Mar 16 19:35:51.912 *radiusTransportThread Radius request with ID 158 sent to 172.28.139.138.
Mar 16 19:35:52.041 *aaaQueueReader Radius request with ID 159 sent to 172.28.139.138.
Mar 16 19:35:52.042 *radiusTransportThread Radius request with ID 159 sent to 172.28.139.138.
Mar 16 19:35:52.102 *aaaQueueReader Radius request with ID 160 sent to 172.28.139.138.
Mar 16 19:35:52.106 *radiusTransportThread Radius request with ID 160 sent to 172.28.139.138.
Mar 16 19:35:52.153 *aaaQueueReader Radius request with ID 161 sent to 172.28.139.138.
Mar 16 19:35:52.163 *Dot1x_NW_MsgTask_3 RADIUS Server permitted access
Mar 16 19:35:52.163 *Dot1x_NW_MsgTask_3 Client will be required to Reauthenticate in 43000 seconds
Mar 16 19:35:52.163 *Dot1x_NW_MsgTask_3 4-Way PTK Handshake, Sending M1
Mar 16 19:35:52.217 *Dot1x_NW_MsgTask_3 4-Way PTK Handshake, Received M2
Mar 16 19:35:52.217 *Dot1x_NW_MsgTask_3 4-Way PTK Handshake, Sending M3
Mar 16 19:35:52.269 *Dot1x_NW_MsgTask_3 4-Way PTK Handshake, Received M4
Mar 16 19:35:52.269 *Dot1x_NW_MsgTask_3 Client has completed PSK Dot1x or WEP authentication phase
Mar 16 19:35:52.269 *Dot1x_NW_MsgTask_3 Client has entered DHCP Required state
Mar 16 19:35:54.552 *emWeb Client delete code: Multiple triggers
That can be due to possible reasons: Received a CCX RM request from a client with CCX version lower than 2/ Radius server sent a disconnect request (RFC3576, etc)/ On some scenarios of client blacklist (administrator request)/ For HTTP profiling scenarios, after a vlan change, so policies can be reapplied, or when received policies have a different session timeout, from the client session timeout/ WLAN is deleted or disabled/ In PMIPv6, MAG notified to delete the client/ Administrator request a client delete by CLI/GUI
Mar 16 19:35:54.552 *emWeb Client expiration timer code set for 1 seconds. The reason: Dissasociation or deauthentication received from client, this is valid on 802.11w scenario. Also, generic termination clause, reason would be provided by pervious log message
Mar 16 19:35:55.398 *apfReceiveTask Client session has timed out
Mar 16 19:35:55.398 *apfReceiveTask Client disassociation event has occured. Possible reasons may be due to AP Radio Reset usually due to channel change or wlan was manually disabled or Client unable to get valid DHCP IP for WLAN using DHCP required
Mar 16 19:35:55.398 *apfReceiveTask Client has been deauthenticated
Mar 16 19:35:55.398 *apfReceiveTask Client session has timed out
Connection attempt #1
Mar 16 19:35:58.490 *apfMsConnTask_0 Client roamed to AP/BSSID BSSID 24:36:da:13:db:f6 AP CN-07928ap-04
Mar 16 19:35:58.490 *apfMsConnTask_0 The WLC/AP has found from client association request Information Element that claims PMKID Caching support
Mar 16 19:35:58.490 *apfMsConnTask_0 The Reassociation Request from the client comes with 1 PMKID
Mar 16 19:35:58.490 *apfMsConnTask_0 WLC cannot find a valid PMKID to match the one provided by the client. However, if the client performs OKC and not SKC, the WLC computes a new PMKID based on the information gathered (the cached PMK, the client MAC address, and the new AP MAC address)
Mar 16 19:35:58.490 *apfMsConnTask_0 Client is entering the 802.1x or PSK Authentication state
Mar 16 19:35:58.490 *apfMsConnTask_0 Client has successfully cleared AP association phase
Mar 16 19:35:58.490 *apfMsConnTask_0 WLC/AP is sending an Association Response to the client with status code 0 = Successful association
Mar 16 19:35:58.526 *Dot1x_NW_MsgTask_3 Client will be required to Reauthenticate in 43000 seconds
Mar 16 19:35:58.526 *Dot1x_NW_MsgTask_3 WLC/AP is sending EAP-Identity-Request to the client
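If you want to pull the same milestones out of a long capture yourself, here is a rough Python sketch. It assumes the analyzer-style line layout above (`Mon DD HH:MM:SS.mmm *Task message`) and keys off message strings copied from this output; `summarize_eap` is a hypothetical helper, not a Cisco tool. Each RADIUS ID is logged twice (by `aaaQueueReader` and `radiusTransportThread`), so the IDs are de-duplicated.

```python
import re
from datetime import datetime

# Matches analyzer-style lines such as:
# "Mar 16 19:35:51.382 *aaaQueueReader Radius request with ID 150 sent to ..."
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) \*(\S+) (.*)$")
RADIUS_RE = re.compile(r"Radius request with ID (\d+) sent")


def summarize_eap(lines):
    """Summarize the first EAP attempt in analyzer-style output (hypothetical helper)."""
    identity_response = None
    access_accept = None
    radius_ids = set()  # each ID appears twice in the log, so count unique IDs
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips wrapped continuation lines and analyzer annotations
        stamp, _task, msg = m.groups()
        # The log has no year; prepend a dummy one, which is fine for same-day deltas
        ts = datetime.strptime("1900 " + stamp, "%Y %b %d %H:%M:%S.%f")
        if "EAP-Identity-Response" in msg and identity_response is None:
            identity_response = ts
        r = RADIUS_RE.search(msg)
        if r:
            radius_ids.add(int(r.group(1)))
        if "RADIUS Server permitted access" in msg and access_accept is None:
            access_accept = ts
    elapsed = None
    if identity_response and access_accept:
        elapsed = (access_accept - identity_response).total_seconds()
    return {"radius_requests": len(radius_ids), "eap_seconds": elapsed}
```

Run over the first attempt above, it would report 12 RADIUS requests (IDs 150 to 161) and about 0.78 s between the EAP-Identity-Response and the access-accept.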

-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    The mirror then always responds with ' The only thing that exceeds your brilliance is your beauty! '

Scott Fella
Hall of Fame

Just to add, you should always run a diff between the old configuration and the post-upgrade configuration. This will show you what might have been added, disabled, or set back to default. Hopefully you have a backup config that you can run the diff against.
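A minimal way to do that diff, assuming you have saved the pre-upgrade and post-upgrade configurations as plain-text files (the file names below are placeholders), is Python's standard `difflib`:

```python
import difflib
from pathlib import Path


def diff_configs(before: str, after: str,
                 before_name: str = "pre-upgrade",
                 after_name: str = "post-upgrade") -> str:
    """Return a unified diff of two configuration texts ('' if they are identical)."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=before_name,
        tofile=after_name,
    ))


def diff_config_files(before_path: str, after_path: str) -> str:
    """Read two saved plain-text config exports and diff them."""
    return diff_configs(Path(before_path).read_text(),
                        Path(after_path).read_text(),
                        before_path, after_path)


# Example (placeholder file names):
# print(diff_config_files("wlc_before_upgrade.txt", "wlc_after_upgrade.txt"))
```

Lines prefixed with `-` exist only in the pre-upgrade config, and lines prefixed with `+` exist only in the post-upgrade one.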

-Scott
*** Please rate helpful posts ***

Rich R
VIP

And there's a bug that @Leo Laohoo pointed out to me, https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwe07802, which is fixed in the next maintenance release, due out in the next week or two; ask TAC about that. (What AP model are you seeing this on?)

And of course, as others said, make sure your Wi-Fi drivers are updated to the LATEST version. I say that because quite often people say "my drivers are up to date" (because Windows Update hasn't offered anything new), but the driver they're using is 2 years older than the one on Intel's website. So if it's Intel, look at https://www.intel.com/content/www/us/en/download/19351/windows-10-and-windows-11-wi-fi-drivers-for-intel-wireless-adapters.html, for example. (The earlier versions of those drivers are riddled with bugs.)

I think this is the bug that is affecting us. The issue is intermittent, occurs randomly, and points to EAP authentication.

Let me work with TAC on the next fix release.

TAC should be able to give you a copy of the latest beta if you're willing to test it.

Mikulasik
Level 1

I have this bug too. It seems to affect all devices, but not consistently. The issue occurred over the weekend.

My question would be: did you run into this issue because you upgraded, or did you only now notice that users were having issues? There will always be bugs, and the biggest takeaway is that if you upgrade and users finally tell you, after a few weeks or months, that the wireless sucks, then revert. Users tend to find their own fixes, or let's say workarounds, until it becomes a pain in their rear ends. I have done so many upgrades with testing, and you will always run into one upgrade that bites you in the butt. The best approach is not to wait for a fix and then upgrade, only to find out it's still broken or another issue appears; revert and do further testing. At the end of the day, you can't blame the vendor for a bug, because management will always look at the person or team that made the change.

-Scott

We had been on 8.10.170 since it came out and ran into this issue on Monday (hmm, DST happened Sunday). We upgraded to 183 based on TAC advice, with no change. My debugs look exactly the same as the OP's, and the bug was logged this week. I have a 3504 controller with users hitting the same NPS server policy with no issues, but it runs 8.5. Why it would run fine for about a year and then screw up like this, I don't know, but at this point it must be a WLC bug.

Things just don't break. You need to look at patches on the Windows devices, which can also tend to break things. An upgrade of NIC firmware can also introduce issues. So you have to go back a month or so, see what was pushed, and try to isolate the issue. New devices can also make it look like something just broke, when really a bunch of users just got their laptops refreshed. It's best to gather data on the devices through a device management system that can help you correlate NIC model types and firmware along with patches, to see what might have caused the issue. In any case, take time to reboot the controller or fail it over to another controller to see if the issue goes away. Even though the controller seems okay, it just might not be. I have seen that too many times, just like folks who never shut down their laptops and eventually they're slow, have issues connecting, etc.
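The correlation described above can be sketched roughly as follows, assuming your device management system can export one record per client with its NIC model, driver version, and whether it hit the failure. `ClientRecord` and `failure_rate_by_nic` are hypothetical names, not any product's export format:

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class ClientRecord(NamedTuple):
    # Hypothetical fields; map them to whatever your device-management export provides
    nic_model: str
    driver_version: str
    failed: bool


def failure_rate_by_nic(
    records: Iterable[ClientRecord],
) -> List[Tuple[Tuple[str, str], float, int]]:
    """Group clients by (NIC model, driver) and return (key, failure_rate, count), worst first."""
    totals = defaultdict(lambda: [0, 0])  # key -> [failed_count, total_count]
    for r in records:
        bucket = totals[(r.nic_model, r.driver_version)]
        bucket[0] += int(r.failed)
        bucket[1] += 1
    rates = [(key, failed / total, total) for key, (failed, total) in totals.items()]
    # Highest failure rate first; a larger sample size breaks ties
    return sorted(rates, key=lambda item: (-item[1], -item[2]))
```

A (model, driver) pair with a high failure rate across a reasonable number of clients is a stronger lead than a handful of individual complaints.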

-Scott

The root cause was Azure fragmenting packets from the NPS server and delivering the fragments out of order. We needed to get Azure to enable UDP fragment reordering, as this behavior is by design.

https://github.com/MicrosoftDocs/azure-docs/issues/69477
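Why fragmentation matters here: an EAP-TLS exchange carries certificate data, so a single RADIUS packet can exceed the path MTU and get split into IP fragments, and the receiver can only rebuild the RADIUS packet if every fragment arrives. A minimal sketch of IPv4 fragmentation arithmetic and offset-based reassembly, assuming a 1500-byte MTU and a 20-byte IP header with no options; the 3000-byte packet size in the test is illustrative, not taken from this thread:

```python
IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8


def fragment(udp_payload_len: int, mtu: int = 1500):
    """Return (offset_bytes, length, more_fragments) for each IPv4 fragment.

    Offsets are in bytes for readability; the real IP header stores them in 8-byte units.
    """
    total = UDP_HEADER + udp_payload_len     # bytes carried inside IP
    max_data = (mtu - IP_HEADER) // 8 * 8    # non-final fragments must be multiples of 8
    frags = []
    offset = 0
    while offset < total:
        length = min(max_data, total - offset)
        more = offset + length < total       # MF flag is set on all but the last fragment
        frags.append((offset, length, more))
        offset += length
    return frags


def reassembled_length(frags):
    """Rebuild from fragments received in any order; None if any piece is missing."""
    expected = 0
    saw_last = False
    for offset, length, more in sorted(frags):  # order by offset, not arrival order
        if offset != expected:
            return None                           # a hole: a fragment never arrived
        expected += length
        saw_last = not more
    return expected if saw_last else None        # the final fragment (MF=0) is required
```

With these assumptions, a 3000-byte RADIUS packet becomes three fragments of 1480, 1480, and 48 bytes. Losing any one of them loses the whole RADIUS packet, which from the controller's and supplicant's point of view looks like the peer stopped responding.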

I ran into this a few months back as well. Keep in mind that the Azure engineer will enable this on an Azure virtual network for the subscription. If you have multiple resource groups and need this feature, you will need to request that they enable this flag. If you create a new virtual network gateway, you will need to open a ticket to have them enable this flag.

I saw this issue with ISE in Azure, only with EAP-TLS, and the fragmentation showed up when using an OTA capture.

https://community.cisco.com/t5/network-access-control/eap-tls-to-azure-ise-is-failing-but-not-with-an-ise-node-in-the/td-p/4739038

-Scott

JPavonM
VIP

As @Scott Fella said, look for clients whose OS and/or drivers have been upgraded. If something has been working consistently for the last few months, and failures have appeared on all clients with a common set of specifications (Intel, in this case), look for the problem on that side.

I'd recommend subscribing to the Intel communities, where you can post the errors and work with Intel engineers to track down the issue and possibly fix it. In parallel, other wNIC vendors such as Realtek and MediaTek do have known connectivity and performance issues under Windows, so always look for the most up-to-date driver in the Microsoft Update Catalog. There are some scripts that do this for you, for drivers only; search for them on Google.

I'd entertain the idea that it was just Intel if the same behavior weren't happening on Apple and Android devices.

We are currently investigating an Intel-related wireless NIC driver issue where the NIC drops association when the SSID is configured for WPA2-Enterprise. Dropouts also occur with PSK, but not as frequently as with WPA2-Enterprise.

The problem was first observed when a large fleet of Chromebooks (CB) started having irregular dropouts. We raised the issue with Google, and Google brought in Intel. Intel confirmed the issue with the NIC drivers.

According to Google, the issue is caused by GTK regeneration, which the driver is unable to handle.

We suspect all drivers up to 22.150.3 are affected.
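To check a fleet against that suspicion, compare versions numerically rather than as strings (as strings, "22.90.0" sorts after "22.150.3"). A small sketch, assuming dotted, all-numeric driver versions and treating 22.150.3 itself as affected, since "up to" could be read either way; `is_suspect` is a hypothetical helper:

```python
SUSPECT_MAX = "22.150.3"  # from the post; treated as inclusive (assumption)


def parse_version(v: str) -> tuple:
    """Turn a dotted driver version like '22.150.3' into a comparable tuple of ints."""
    return tuple(int(part) for part in v.strip().split("."))


def is_suspect(driver_version: str, limit: str = SUSPECT_MAX) -> bool:
    """True if the driver is at or below the last version suspected to be affected."""
    v = parse_version(driver_version)
    lim = parse_version(limit)
    # Pad the shorter tuple with zeros so '22.150' compares like '22.150.0'
    width = max(len(v), len(lim))
    v += (0,) * (width - len(v))
    lim += (0,) * (width - len(lim))
    return v <= lim
```

For example, `is_suspect("22.90.0")` is `True` while `is_suspect("22.160.0")` is `False`; a plain string comparison would get the first case wrong.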
