Re: HyperFlex cli hangs on certain commands

biscuit-datacentre · 07-04-2020

Hi,

I am having some trouble obtaining information from one of the SCVMs in my HyperFlex cluster (the SCVM that currently holds the management IP).

There's definitely something up: I see a 'Server Call Failed' error in HX Connect, and although the dashboard shows everything as healthy, there is no node info. The cluster is up and running and all seems fine in vCenter, so I don't want to do anything disruptive.

I have been through all the manual checks of the HyperCheck as I am pretty sure the automated one won't work due to the following:

`service_status.sh` returns 'Springpath File System ... Running' and nothing else, then hangs until I Ctrl+C out.

`stcli cluster info` also hangs without returning anything, as does `stcli datastore list`.
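One way to run these checks without tying up the shell is to wrap each command in coreutils `timeout`, which exits with status 124 when the limit is reached. A minimal sketch, assuming GNU `timeout` is available on the SCVM; `probe` is a made-up helper name and the 30-second limit is arbitrary:

```shell
#!/usr/bin/env bash
# probe: hypothetical helper, not part of the HyperFlex tooling.
# Runs a command under a time limit and reports OK, HUNG, or FAILED.
# The command's own output is discarded; only the verdict is printed.
probe() {
  local limit="$1"; shift
  local rc=0
  # GNU timeout exits with 124 if the command was still running at the limit.
  timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "OK: $*"
  elif [ "$rc" -eq 124 ]; then
    echo "HUNG (>${limit}s): $*"
  else
    echo "FAILED (rc=$rc): $*"
  fi
}

# Usage on the SCVM, with the commands from this post:
#   probe 30 service_status.sh
#   probe 30 stcli cluster info
#   probe 30 stcli datastore list
```

Running the same three probes on each node would give a quick side-by-side of which commands hang where.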

All the other checks come back OK. The node with the issue is the ZooKeeper leader.

I have run `pidof` for the other services and they all come back with a PID.
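For reference, the `pidof` pass can be looped over a list of names. This is only a sketch: `check_pids` is a made-up helper, and the process names have to come from the HyperCheck manual steps, since none are assumed here. Note that a PID only shows the process exists, not that it is responding.

```shell
#!/usr/bin/env bash
# check_pids: hypothetical helper that reports a PID (or its absence)
# for every process name given as an argument.
check_pids() {
  local name pids
  for name in "$@"; do
    # pidof exits non-zero when no process with that name is found.
    if pids=$(pidof "$name" 2>/dev/null); then
      echo "$name: running (PID $pids)"
    else
      echo "$name: NOT running"
    fi
  done
}

# Usage: pass the process names from the HyperCheck manual steps, e.g.
#   check_pids <process-name-1> <process-name-2>
```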

None of these symptoms occur on the other two nodes.

I'm just wondering if any of you have come across anything similar and if I'm just missing a simple process or service. I realise there could be a wealth of causes but thought I'd throw it out to the community first.

I should point out that I didn't install this so it may have been running like this for some time without anyone checking. After all, if VMs are up and providing service, folk just carry on sometimes.

Happy to elaborate or provide sample output if anyone has any thoughts. Thanks!

Steven Tardy · 07-04-2020

Some management services aren't working properly and likely need to be restarted.

Unfortunately there isn't a `restart management-services` command. ):

Which exact API call is failing?

Check browser dev-tools / [Network] tab;

Which exact web requests time out or return 4xx or 5xx errors?

Could dig further into the management services by looking at the code for service_status.sh:

less $(which service_status.sh)
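Beyond reading the script, tracing it may show which step it stalls on. A rough sketch, assuming service_status.sh is a bash script (not confirmed here) and that GNU `timeout` is installed; `last_traced_step` is a made-up name. Note that this executes the script again, bounded by the time limit:

```shell
#!/usr/bin/env bash
# last_traced_step: hypothetical helper. Runs a script with `bash -x`
# under a time limit and prints the last command the trace reached.
last_traced_step() {
  local limit="$1" script="$2"
  # bash -x writes each command to stderr prefixed with '+' before running it.
  # If the script hangs, the final '+' line is the step that was in progress
  # when the time limit expired.
  timeout "$limit" bash -x "$script" 2>&1 | grep '^[+]' | tail -n 1
}

# Usage on the SCVM:
#   last_traced_step 60 "$(which service_status.sh)"
```

The trace only covers the shell script itself; anything it calls (another script or binary) shows up as a single step.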

Should open a TAC case and let TAC take a look at the services under the hood.

biscuit-datacentre · 07-07-2020

Thanks @Steven Tardy,

I've raised a TAC case and so far have been advised to run `stcli cluster reregister` to fix the HX Connect 'Server Call Failed' error.

For ref, dev-tools shows the following generic errors:

Failed to load resource: the server responded with a status of 400 (Bad Request)

/#/clusters/1:1 Uncaught (in promise) {"status": "bad"}

/hx/api/clusters/1/nodes:1 Failed to load resource: the server responded with a status of 400 (Bad Request)

/hx/api/clusters/1/messages:1 Failed to load resource: the server responded with a status of 400 (Bad Request)

/#/clusters/1:1 Uncaught (in promise) {"status": "bad"}

/hx/api/clusters/1/disks:1 Failed to load resource: the server responded with a status of 400 (Bad Request)

Once I've done the reregister I'll see what's next for the service issue.
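In case it helps anyone following along, the failing endpoints listed above could also be replayed outside the browser with `curl`. This is a rough sketch only: `classify_status` and `check_endpoint` are made-up helper names, `HX_MGMT` is a placeholder for the cluster management address, and authentication is not handled, so the status returned may differ from what the browser session sees.

```shell
#!/usr/bin/env bash
# classify_status: hypothetical helper mapping an HTTP status code to a label.
classify_status() {
  case "$1" in
    000)     echo "no response" ;;   # curl prints 000 when no HTTP reply arrived
    2??|3??) echo "ok" ;;
    4??)     echo "client error" ;;
    5??)     echo "server error" ;;
    *)       echo "unknown" ;;
  esac
}

# check_endpoint: hypothetical helper. Requests one URL with curl and prints
# the path, the status code, and the label from classify_status.
check_endpoint() {
  local base="$1" path="$2" code
  # -k skips certificate verification; --max-time bounds a hung request.
  code=$(curl -sk -o /dev/null --max-time 15 -w '%{http_code}' "${base}${path}") || true
  echo "$path $code $(classify_status "$code")"
}

# Usage (HX_MGMT is a placeholder, not a real variable on the SCVM):
#   for p in nodes messages disks; do
#     check_endpoint "https://$HX_MGMT" "/hx/api/clusters/1/$p"
#   done
```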