question

johnjaymack asked

Cerbo GX still rebooting multiple times a day even after the firmware update

I have a new Cerbo GX as of February 2024, and it is now rebooting as often as every 15 minutes. The longest uptime that I have seen is just under two hours.

Some background. I am running two multiplus II 24-2x120 in parallel. The system was very stable until this February. In February, I lost communication between the Touch 50 and the Cerbo GX. It was determined that the original Cerbo was malfunctioning and it was replaced under warranty.

As part of the replacement, the firmware of the Cerbo was updated to v3.31.

In January before the replacement and the updates, we spent two weeks boondocking in the desert. The daily generator running went perfectly.

After the update, I had to change a few connections to get the generator back online and working with the new warm up and cool down times available with the new software.

Now, because of the Cerbo rebooting, the generator will not run for two hours without shutting down at least twice due to the Cerbo rebooting. The Cerbo GX relay 1 picks to start the generator and drops to stop it. When the Cerbo reboots, the relay drops stopping the generator, I believe, and the relay picks again once the Cerbo is back online.

In searching other posts, it appears that this is a known issue. I have not found any report that it has been corrected. I did see that updating to the current stable firmware was suggested as it might help. I updated today to v3.34. The Cerbo continues to reboot.

I have logged into the Cerbo over SSH and run the top command. Before the firmware update, the CPU was typically running at 34-45%. Now, with v3.34, top is showing the CPU at 64-68%.

screenshot-2024-07-11-104455.png

"Reboot device when no contact" is OFF.
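
For anyone who wants a number to log rather than a top screen to watch, here is a minimal sketch, assuming a Linux shell with /proc/stat available (as on most embedded Linux systems). It estimates overall CPU busy time from two samples. The function name and the 1-second interval are arbitrary choices for illustration, and the figure will not necessarily match what top reports.

```shell
#!/bin/sh
# Estimate overall CPU busy percentage from two samples of the first
# line of /proc/stat. cpu_busy is a hypothetical helper name.

# cpu_busy "SAMPLE1" "SAMPLE2": each sample is a "cpu ..." line.
# Fields after the label: user nice system idle iowait irq softirq steal
cpu_busy() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= 9 && i <= NF; i++) total += $i
          idle = $5 + $6                      # idle + iowait
          if (NR == 2 && total > ptotal)
              printf "%d\n", 100 * ((total - ptotal) - (idle - pidle)) / (total - ptotal)
          ptotal = total; pidle = idle }'
}

# Live measurement, skipped where /proc/stat is not readable.
if [ -r /proc/stat ]; then
    s1=$(head -n 1 /proc/stat)
    sleep 1
    s2=$(head -n 1 /proc/stat)
    echo "cpu busy: $(cpu_busy "$s1" "$s2")%"
fi
```

A longer interval between the two samples smooths out short spikes.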

Any help or suggestions would be appreciated.


cerbo gx

19 Answers
Alex Pescaru answered ·

Hi @JohnJayMack

I don't know if this has to do with your CPU time, though it may well, but something in your system, probably a Node-RED script, is doing a lot of updates to the localsettings non-volatile memory variables.

Node-RED and localsettings seem to be the biggest contributors to CPU usage.

It may not be a wise thing....

Alex

johnjaymack answered ·

Thanks for the response, @Alex Pescaru. Most of the Node Red changes were working through December and January.

Most of the Node Red scripts are involved in load shedding. That having been said, I have disabled the Node Red routines to see if this helps. What I see now in the PID 1015 (localsettings.p) line is %CPU trending down from 5-6% to 4-5%, but sometimes going up to 8%.

Another suggestion was to check the voltage supply. I am currently monitoring the DC supply with my Min/Max meter. It has been holding steady at 28.2 volts.

Currently the Cerbo has been up for 59 minutes.


johnjaymack answered ·

Update: the Cerbo made it about 1 hour 10 minutes before rebooting. At least it is consistent.

johnjaymack answered ·

Update. I just checked the supply voltage. It has not dropped lower than 28.2 volts during the last couple of reboots.

ektus answered ·

I've had frequent reboots myself.
To troubleshoot, connect to the Cerbo, issue the "top" command and keep the terminal window open. Wait for the reboot (no need to watch, just check from time to time) and if upon reboot the second value in the load average line is above 6, it's likely the watchdog was the culprit. I've increased the watchdog threshold from 6 to 10, eliminating reboots so far.

Discussion in https://community.victronenergy.com/questions/276890/cerbo-gx-random-reboots.html
Config file to change: /etc/watchdog.conf

Currently, mine looks like this:

log-dir = /var/volatile/log/watchdog
min-memory = 2500
max-load-5 = 10
max-load-15 = 8
repair-binary = /usr/sbin/store_watchdog_error.sh
test-prescaler = 60
retry-timeout = 0
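
The manual check described above (compare the load averages against the max-load limits) could be scripted roughly as follows. This is a minimal sketch, assuming a Linux shell with /proc/loadavg. The function names are made up for illustration, the limits 10 and 8 are simply the values from the config above, and this is not a description of how the watchdog daemon itself works internally.

```shell
#!/bin/sh
# Compare load averages against limits like those in /etc/watchdog.conf.
# exceeds and check_load are hypothetical helper names.

# exceeds VALUE LIMIT: succeed (exit 0) if VALUE is greater than LIMIT.
# awk does the decimal comparison, which plain sh arithmetic cannot.
exceeds() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# check_load "LOADAVG_LINE" MAX5 MAX15
# LOADAVG_LINE uses the /proc/loadavg layout: 1-min 5-min 15-min ...
# Prints "over" if the 5- or 15-minute value exceeds its limit, else "ok".
check_load() {
    load5=$(printf '%s\n' "$1" | awk '{ print $2 }')
    load15=$(printf '%s\n' "$1" | awk '{ print $3 }')
    if exceeds "$load5" "$2" || exceeds "$load15" "$3"; then
        echo "over"
    else
        echo "ok"
    fi
}

# Live check, skipped where /proc/loadavg is not readable.
if [ -r /proc/loadavg ]; then
    echo "current load: $(check_load "$(cat /proc/loadavg)" 10 8)"
fi
```

Running this once a minute and keeping the output would show whether the load was near the limits shortly before a reboot.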

johnjaymack answered ·

I seem to be moving backwards in my problem, @ektus. This morning, I read your post and tried to do what you suggested. Now, the Cerbo will not permit an SSH login. It will accept the user name 'root' and the password, but then the cursor returns to the left of the screen, the usual information is not displayed, and the session does not respond to any other commands. If I enter the wrong password, it immediately rejects it and asks again, which leads me to believe that it recognizes a bad password. I have tried PuTTY, KiTTY, and a Windows command line with the same results. I am not a Linux guru, but I am also not a complete novice.

I have contacted my vendor and they will be sending out a replacement Cerbo, but I find it hard to believe that this Cerbo is absolutely the problem.

I am thinking that I will try another troubleshooting technique and I would appreciate your thoughts on what I intend to do.

I am thinking that I can disconnect the Cerbo from my system and bring it to the bench where the ambient temperature is around 75F/24C and not 115F/46C. This should eliminate an over temperature problem.

Second, having it at hand on the bench should aid in figuring out what is wrong.

One of the issues that I am seeing is that moving from software v3.31 to v3.34 seems to have upped the CPU load by about 38%. Not being a programmer, I am not sure, but it seems to me that something was added to the software that is really increasing the CPU load.

Your thoughts, or the thoughts of others would be most appreciated.


ektus answered ·

My Cerbo is still running 3.31-3, which was current when I changed the watchdog setting. The only things to alter are the values behind max-load-5 and max-load-15.

I don't have long-term experience (if any) with 3.34.

Do you have a screen installed on the Cerbo? If so, can you access the menu there? Or else, can you access it from VRM? There should be a factory reset option somewhere to start over.

If you can gain access again, the file /data/log/messages might give some clues on what's happening.

46°C is quite high. But even so, it should still be within spec. Perhaps increasing air flow with a small fan could help. If it does work in the cooler environment on the bench, temperature or electrical interference might be the culprit. Perhaps try a 12V supply to reduce heat in the Cerbo's power circuit?

johnjaymack answered ·

@ektus, I would have to agree about the 46C being high, but considering the outside temp is 49C, I feel lucky that the compartment is lower.

My vendor is replacing the Cerbo GX. That having been said, they seem to think that I am some sort of "power user."

So here is the question that I put out for anyone to answer. Is the Cerbo GX capable of running Venus OS Large and Node Red with several flows that check the power and shed load if the power draw exceeds the capabilities of my Multiplus II inverters, or should I be looking at a different Victron product?

Also, since I mentioned that I was trying to use a 'top' command, they said that I must be a 'super user.' I never thought that using the SSH login and reporting the results of the 'top' command made you a power user.

When I had access to the command line, I saw that the CPU usage had increased from around 38-45% to 65-73%. I believe that increase in CPU usage has to be due to issues in the Victron stable software.

Finally, I am going to see if I can find the method to get my command line back. I tried cycling through various software versions, but while I can log into root with a password, after the password is accepted, the cursor returns to the left and then stays there until the next reboot.

I should mention that the Cerbo is mounted down in my RV basement to be near the other equipment.

I am currently running the Cerbo GX, two Multiplus II 24-2x120 inverters in parallel, a SmartShunt 500, a BMV monitor, and a Smart Solar MPPT 150/100. I also run two Orion DC-DC chargers, 12/24 and 24/12, and two BSC chargers, 12/30 and 24/15, but they are not connected to the Cerbo and should not be a CPU usage concern. I would consider relocating the Cerbo, but then I would have some very long runs, 35-50 feet as the wire snakes, through some crowded built-in wire runways.


ektus answered ·

@JohnJayMack I've got an ESS application (stationary, obviously), with 3x Multiplus 2/5000 inverters (3~ system), 5x MPPT 35/150 solar charge controllers, 2x Shelly 3 EM power meters, 1x Lynx shunt and 1x Pylontech battery system. So a total of 12 devices connected, more than twice as many as in your case. I've also got Venus OS Large running and Node-RED with some flows. My Cerbo is struggling (rebooting frequently) with default settings, but has been running stable with the increased watchdog threshold.

If I were to start from scratch, I would probably either go for the Ekrano GX instead of the Cerbo, or for a Raspberry Pi 4 or above. But in an RV situation with high temperatures, the Pi would likely overheat quickly.

As for "power user": So what? Victron provides a modular DIY system, and the large OS variant is supplied by Victron themselves.

As for getting into the command line again, does it work if you disconnect all devices but your computer and reboot the Cerbo? I've had occurrences where the Cerbo wouldn't boot properly. Some functions were running, but the screen would only display a bright square in the middle and SSH was not possible. That's all been discussed in other threads.

If it really is a system overload situation with the watchdog triggering, the replacement Cerbo won't be running any better. In that case, increasing the watchdog threshold might help, and/or reducing the computing load (run the Node-red flows less frequently, don't run them all with the same interval).

mvader (Victron Energy) answered ·

Hey @JohnJayMack, if you enable remote support, I can have a brief look at why the system is rebooting.



johnjaymack answered ·

Hello @mvader (Victron Energy), just turned on Remote Support. Any help or suggestions much appreciated.

johnjaymack answered ·

I got a sign of life at the console.

a-sign-of-life.png

I issued a top command and this is all I got.

I am also confused as to the time reported. I am in UTC-7, Pacific Coast Time Zone USA. The 'Last Login: Sun Jul 14 22:00:25' came up, but the actual local time was 18:10 (UTC-7). Is this reporting the last login as 1400 hours local because that was the last time before the current login? That would be about right.

The above for what it is worth.


johnjaymack answered ·

Now the top command is working. I am always surprised when a computer heals itself.

From yesterday to today, I made one physical change. I demounted the Cerbo off of the wall and placed it out from the wall on a small fan yesterday afternoon. This should give much better ventilation. Today's high temperature was 113F, lower than the high this past week of about 120. I will leave the top command running.

screenshot-2024-07-14-200813.png


mvader (Victron Energy) answered ·

Hi, I logged in and could not see a reason for the reboots. Which means it's not CPU overload and thus not the same issue that @ektus has been having.

What it is instead is something I don’t know.

The unfinished top command is a strange one as well. Perhaps that is due to a connection issue.

Localsettings being up there with 4% is also odd; I expect that that is related to something the node-red flow is doing.

johnjaymack answered ·

Ok, @mvader (Victron Energy), what I am doing with Node Red is monitoring the Input power versus output power to detect potential inverter overloads. I am also looking at inverter high temperature warning to shed load to prevent shutdowns and high current on line 2 when on a split phase power source.

In the case of an overload, I send a command to an http address to turn a Shelly relay off. After a delay, the Shelly is turned back on if the overload condition is gone. The same logic is used to respond to other issues.
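
The shed-and-restore logic described above can be written down outside Node-RED as a small state function. This is only a sketch under stated assumptions: the function names, the power limit, and the restore delay are invented for illustration, and the URL form `/relay/0?turn=on|off` is the one used by first-generation Shelly relays, so it may not match the device actually in use.

```shell
#!/bin/sh
# Sketch of the shed/restore decision, not the actual Node-RED flow.
# next_state and set_relay are hypothetical helper names.

# next_state STATE OUTPUT_W LIMIT_W SECONDS_SINCE_SHED RESTORE_DELAY
# STATE is "on" (load connected) or "off" (load shed). Prints the new state.
# All numeric arguments must be whole numbers.
next_state() {
    state=$1; watts=$2; limit=$3; since=$4; delay=$5
    if [ "$state" = "on" ] && [ "$watts" -gt "$limit" ]; then
        echo "off"          # overload: shed the load
    elif [ "$state" = "off" ] && [ "$watts" -le "$limit" ] \
         && [ "$since" -ge "$delay" ]; then
        echo "on"           # overload gone and the delay has elapsed
    else
        echo "$state"       # otherwise hold the current state
    fi
}

# set_relay HOST STATE: switch the relay over HTTP.
# Assumes a Gen1-style Shelly endpoint; other models use a different API.
set_relay() {
    curl -s --max-time 5 "http://$1/relay/0?turn=$2" > /dev/null
}
```

Keeping the decision separate from the HTTP call makes it easy to check the logic without touching a real relay, and a slower polling interval for the caller keeps the CPU cost low.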

I may be way off base, but the issue seems to be linked to the newer software. My first Cerbo just worked. I only replaced it because the HDMI port stopped working. As an aside, in December I had used several timer relays and two Multiplus II assistants to implement generator warm-up and cool-down periods. I would have preferred to have done it through the Venus software, but it was not available at the time on the software version that I was using. My design worked through January. Towards the end of January, as I remember, the connection between the Cerbo and the Touch 50 stopped working.

The new Cerbo was installed with the software updated and I removed my controls and began using the Venus software controls.

The next time I ran the generator on a once-a-month two-hour run, it shut down twice in the middle of the run. I blamed the generator and thought that I might have overloaded it. I did not realize that the Cerbo was rebooting. I believe, and I admit that this is a guess, that since I use relay one on the Cerbo to start and run the generator, when the Cerbo reboots, relay one drops, and the generator stops. After the Cerbo comes back online, the call for generator run is seen and relay one picks.

I have one more question about using Assistants on the Multiplus II unit. It is my impression that the Assistants on the Multiplus II units are controlled by the Multiplus units themselves and do not represent a CPU load on the Cerbo. In other words, they are independent.

The reason that I ask is twofold. 1) My vendor suggested that the assistants might be interfering with the Cerbo. 2) When I implemented my warm-up and cool-down circuits, I used General Flag to Ignore AC input. Now, I do not know how the Venus software has implemented this functionality, but ignoring the AC input is logical for unloading the generator.

I still have those assistants in the Multiplus programming, but I am not using them. By not using them, I think that they are not interfering with the Venus software.

I appreciate you looking in and am open to any other suggestions.


1 comment

Hi John,

What I can say is:

  • Assistants do indeed run independently. They run in the MultiPlus itself, not in the Cerbo GX.
  • For warm-up and cool-down there is dedicated functionality in the GX. So there is no need to program that yourself with Node-RED and/or assistants.


Aside from that, I really can't help you. We can't look too deep into systems running Node-RED and doing things with Shelly and Assistants and so forth and so forth.

I hope you understand.

There must be a reason your GX reboots. And reading all the discussion, plus having seen myself that it's not a reboot initiated within the software, I'd say perhaps it's a power supply / connection issue, or temperature, or something like that. However unlikely that may seem.


All the best, Matthijs

johnjaymack answered ·

Matthijs,

Thank you for your help. I completely understand your not delving into Node Red and those issues.

Your seeing that it is not software initiated helps.

Although I have monitored the power supply with a Min/Max meter and not seen a drop in voltage, I will trace the power supply feed verifying all connections.

As to the mounting in the basement of the coach, I do not have any good options without relatively long, 30 feet or so, runs of multiple wires in already crowded wireways. That having been said, I will investigate how hard it would actually be to move the Cerbo into the more climate controlled area of the coach.

As far as temperature goes, I would not be surprised in the slightest if temperature is a contributing factor. I will mount the Cerbo on stand-offs so that air can flow completely around the Cerbo.

The replacement Cerbo should arrive sometime this week. Once installed, I will post the initial results. Besides the top command information, is there any other information that would be useful?

Once again, thanks,


John


ektus answered ·

@JohnJayMack there's a folder containing logs: /data/log/

Therein are files "messages" and "messages.0" through "messages.5". Those are the system logs and might contain information on the reason for a reboot, if said reboot was triggered by software. If it was triggered by hardware problems (power supply, internal faults of any kind, thermally induced errors, whatever), the logs may or may not give a clue.

"messages" should be the current session and "messages.0" the previous one.
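
For someone not used to reading system logs, a keyword scan is a reasonable first pass over those two files. A minimal sketch follows. The keyword list is a guess at what might be relevant, not a list of messages Venus OS is known to emit, and `scan_log` is a made-up helper name.

```shell
#!/bin/sh
# Scan a syslog-style file for lines that may relate to a reboot.
# The keywords below are assumptions chosen for illustration.

# scan_log FILE: print matching lines, case-insensitively, with line numbers.
scan_log() {
    grep -n -i -E 'watchdog|reboot|panic|out of memory' "$1"
}

# Check the current log and the previous session's log if they exist.
for f in /data/log/messages /data/log/messages.0; do
    if [ -r "$f" ]; then
        echo "== $f"
        scan_log "$f" || echo "(no matching lines)"
    fi
done
```

An empty result does not rule anything out. As noted above, a hardware-triggered reboot may leave no trace in the logs at all.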

johnjaymack answered ·

@mvader (Victron Energy)

@ektus

Here is an update. I am not sure that the problem is resolved, but it does look hopeful.

First, I looked at the 'messages' files, but I am not a programmer and nothing jumped out at me as pointing to a problem. I made sure that my power connection to the Cerbo was solid: 28.4 volts when the batteries are at float level, which they have been for the last 25 days since I am connected to a 50-amp shore power circuit. I also checked every other connection just to be sure.

I dismounted the Cerbo from the wall and placed it resting on a small fan so that I had very positive ventilation. None of these actions helped. The vendor did not send a replacement Cerbo and I have not reminded them yet. I did not feel that this was a Hardware issue based on my observations. If I am correct, then replacing the Cerbo will be unnecessary.

Having avoided updating the software and not achieved anything resembling success, I decided to update the software.

I followed the instructions in the Cerbo Gx manual for factory reset and reinstall of Venus OS software. (https://www.victronenergy.com/media/pg/Cerbo_GX/en/reset-to-factory-defaults-and-venus-os-reinstall.html). I put the Cerbo back into operation and have been monitoring it.

Before I did the re-install, I had been monitoring the Cerbo using a command: "while true; do date; sleep 1m; done;". What this command did is write a line once a minute displaying the current time and date. When the Cerbo would reboot, the command would stop working and I could see how long the Cerbo had run before rebooting. Before the software reinstall, the Cerbo would reboot anywhere from 15 minutes to just under 2 hours.
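
A variant of that loop can record the timestamps so the run length is computed rather than read off the screen. This is a minimal sketch. The function names are illustrative, and it assumes a `date` that supports `+%s` (epoch seconds), which is common but not guaranteed everywhere.

```shell
#!/bin/sh
# Record one epoch timestamp per minute, then compute how long the run lasted.
# log_heartbeat and run_minutes are hypothetical helper names.

# log_heartbeat FILE: append a timestamp every 60 seconds until interrupted.
log_heartbeat() {
    while true; do
        date +%s >> "$1"
        sleep 60
    done
}

# run_minutes FILE: whole minutes between the first and last timestamp.
run_minutes() {
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END { if (NR > 0) print int((last - first) / 60); else print 0 }' "$1"
}
```

The log file needs to survive the reboot, so it should live on the computer running the SSH session (for example by redirecting the session's output to a local file) or on persistent storage rather than in a temporary directory.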

After the software reinstall, I have not seen a reboot that I did not initiate.

I am in the process of moving the Cerbo inside my coach to a cooler location. This is a big project but I believe that a cooler location will be helpful.

One last observation is that the cpu percentage usage has dropped to between 11% and 28%. Prior to the update, the cpu percentage was running around 61%-68%. As far as I can tell, I have all my Node Red flows re-installed and running. I guess that at some point during the replacement of the original Cerbo, the software must have gotten corrupted or something. This is, of course, a guess.

I am running v3.40 Venus OS Large. While writing this post, I just saw a notification that I have to update the firmware on the inverters, which is a big deal since they typically shut off for a minute or so when the software updates. This drops the 110 V AC in the coach, which dumps my NAS, causing other issues. I hope no one takes this wrong, but updating software, while necessary to get new features, is a major pain in the ***!

All for now.


johnjaymack answered ·

A final update. I decided to move the Cerbo GX to a position inside my coach where the ambient temperature should be lower than 85 F / 30 C. Before I moved the Cerbo, it had performed without fail since I reloaded the Venus software.

I am running Venus OS Large and I have seen brief periods, reported by the top command, of 60% to 65% CPU usage with typical usage reported at 31% to 36%.

Relocating the Cerbo was a pain, but now that it is done, I am sure that the lower ambient temperature cannot but help the overall operation.
