Evgeniy Labunskiy asked

Shelly add-on: Cerbo GX overload and resulting loss of BMS connection

I have a Cerbo running the latest beta large firmware. From time to time (about once a week) I have a problem:

1. In VRM I see the connection problem banner

2. At the same time I see a spike in the dbus round-trip time to about 50 seconds (!)

3. At the same time the Cerbo shows a BMS connection loss error, so the whole system goes offline. I think this happens due to system overload; the BMS itself works correctly

The round-trip time then drops again, but only to around 500 ms, and it stays there until I restart the Cerbo.

I connected via ssh, and here is the log of running services:

I think this might be related to the Shelly 3EM dbus driver, but I'm not sure.

What are your thoughts? Where can I look at the logs?
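For reference, the round-trip time can also be checked by hand from an ssh session. Here is a rough Python sketch that times a single dbus call; it assumes the dbus Python bindings are present (Venus OS services use them themselves) and uses /Dc/Battery/Soc on com.victronenergy.system purely as an example item:

import time
import dbus

# Time one request/reply over the system bus. Any com.victronenergy
# service that exposes the BusItem interface will do.
bus = dbus.SystemBus()
item = dbus.Interface(
    bus.get_object("com.victronenergy.system", "/Dc/Battery/Soc"),
    "com.victronenergy.BusItem",
)
t0 = time.monotonic()
item.GetValue()
print("dbus round trip: %.1f ms" % ((time.monotonic() - t0) * 1000))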

Tags: cerbo gx, Multi RS

6 Answers
nickdb answered ·

First revert to a production release and see if it is resolved.

Remove the unsupported Shelly module and see if that resolves the issue; most issues tend to be related to modifications loaded on the GX.

If these tests show the problem to be specific to the beta, then report it in the dedicated beta topic, not in a topic of its own, as the beta team won't be regularly checking outside of that thread.

It is worth reading the beta testers guide pinned on Q&A.


Evgeniy Labunskiy answered ·

I will temporarily disable the Shelly driver to see if that helps.


jetlag answered ·

I'm also interested in this topic, as I have observed more and more of these "GX overload" questions over the last few weeks. I don't know the details of every case, but I see the same behaviour, and I am also running the Shelly integration from Fabian. In my case, though, I was running the official Venus OS version 3.32 (not large), and I use a RasPi 4, which should be more powerful than the original GX device.

To me it is slowly starting to look like some kind of systematic failure...

Can you please tell me how you display this "Gateway dbus..." graph? Maybe I can reproduce it and check whether it was the same issue in my setup.

6 comments

nickdb ♦♦ commented ·
Custom widget
Evgeniy Labunskiy commented ·

In VRM: Advanced -> Widgets -> Custom Widget -> Add new

You will find it under the Gateway device fields.

jetlag commented ·

Many Thanks @Evgeniy Labunskiy

I had never tried these custom widgets.

Now I see that I had a similar problem when my system stopped and reported the "GX overload" message. It is only one spike in a whole month; the normal value is at most 1-2 ms. I also see that such spikes have occurred a few times over the last half year, but only this one caused the system crash.

[screenshot: 1720160242690.png]

Evgeniy Labunskiy commented ·

I uninstalled Shelly and removed it from the Cerbo, and now everything looks OK on my side. I also don't see any CPU overload or excessive memory usage. I'm thinking that under some conditions the Shelly integration generates tons of requests or something like that. Will monitor further.


nickdb ♦♦ commented ·
There has been a lot of change in Venus OS, so third-party plugins really need to keep up with the constant development and continually test their code to make sure everything integrates well and works as expected.

It might be useful to report this on their git page.

Evgeniy Labunskiy answered ·

After removing the Shelly 3EM dbus driver the system is stable. I think it was the root cause.


jetlag answered ·

Thanks for the info.

For me this is a problem, as I can't reach the location of the energy meter with a cable. So I will stay with this setup; maybe it will only occur a few times a year... or an update will fix it some day.

4 comments

Alex Pescaru commented ·

Hi @Jetlag, @Evgeniy Labunskiy

You can try, depending on your needs, to lower the rate at which the Shelly driver polls the meter and generates dbus values.

Do this by increasing the 500 milliseconds value in the timeout_add call:

gobject.timeout_add(500, self._update)
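For illustration, here is a minimal sketch of the polling pattern such drivers typically use; the class, the update body and the Shelly address are placeholders, not the actual dbus-shelly-3em-smartmeter code:

import requests
from gi.repository import GLib as gobject  # modern import for the old gobject API

POLL_INTERVAL_MS = 2000  # raised from the default 500 ms to reduce load

class ShellyMeter:
    def _update(self):
        try:
            # one HTTP request to the Shelly status endpoint per cycle
            status = requests.get("http://192.168.1.50/status", timeout=5).json()
            # ... publish phase powers/energies to dbus here ...
        except requests.RequestException:
            pass  # skip this cycle but keep the timer alive
        return True  # returning True re-arms the GLib timeout

meter = ShellyMeter()
gobject.timeout_add(POLL_INTERVAL_MS, meter._update)
gobject.MainLoop().run()  # the timeout only fires while a main loop runs

Keep in mind that a longer interval also means slower grid-meter updates, which can matter if the value is used for ESS regulation.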

Although the amount of memory this driver sometimes uses is strange... Something is odd.

Alex

Evgeniy Labunskiy commented ·
I think there is a memory leak or something like that in this case; that's why it "eats" more and more memory.
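One way to verify that would be to log the driver's resident memory over time, for example with a small helper like this (purely illustrative; substitute the PID that top reports for the Shelly script):

import time

def rss_kib(pid):
    # VmRSS from /proc/<pid>/status, reported by the kernel in kB
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

PID = 2052  # example value; use the real PID of dbus-shelly-3em-smartmeter.py
while True:
    print(time.strftime("%H:%M:%S"), rss_kib(PID), "KiB")
    time.sleep(60)

A steadily climbing VmRSS over hours or days would point at a leak rather than normal buffering.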
jetlag commented ·
Can the RAM size make a difference in this case? I assume it is the RAM that is being exhausted, right? I have a RasPi 4 with 2 GB RAM.
jetlag commented ·
Thanks @Alex Pescaru!

I first have to dig into this topic, as I'm more the HW guy... ;-)

And so far I have had this event/issue only once. If it occurs more often, then I will have to react to it.

ektus answered ·

@Jetlag I don't believe RAM size is an issue here. Even after 73 days of uptime and with two Shelly 3EM active, both with the default 750 ms poll interval, RAM usage stands at approx. 672 MByte used and 357 MByte free. I've seen "Cerbo overload" messages in VRM, but those would disappear as fast as they appeared.

I've got two 3EM running and some Node-RED scripts to regulate my grid infeed, and I have been discussing my stability issues at length in https://community.victronenergy.com/questions/276890/cerbo-gx-random-reboots.html

My solution was increasing the watchdog threshold.

If the D-Bus round trip doesn't recover, I'd look into network issues. Is the WiFi stable and the signal strength sufficient? Is the Cerbo connected wirelessly or by cable? Mine uses a cable, so the only WiFi links are between the access point and the Shelly (and between the access point and the house network, since it's running as a repeater).
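As a quick check, one can time the same HTTP request the driver makes. The sketch below assumes the requests library is available and uses a placeholder address for the 3EM:

import time
import requests

SHELLY_STATUS = "http://192.168.1.50/status"  # substitute your meter's IP

# Twenty samples, one per second. Large or erratic times point at the
# WiFi link rather than at the driver or the GX itself.
for _ in range(20):
    t0 = time.monotonic()
    try:
        requests.get(SHELLY_STATUS, timeout=5)
        print("%.0f ms" % ((time.monotonic() - t0) * 1000))
    except requests.RequestException as exc:
        print("request failed:", exc)
    time.sleep(1)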

[screenshot: 1720788727244.png]

Mem: 672308K used, 357776K free, 3072K shrd, 95976K buff, 160324K cached
CPU:  77% usr  19% sys   0% nic   0% idle   0% io   3% irq   0% sirq
Load average: 5.35 4.38 3.68 4/313 28473
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 1124  1117 root     R    26032   3%  15% {localsettings.p} /usr/bin/python3 -u /opt/victronenergy/localsettings/localsettings.py --path=/data/conf
 1097  1082 root     R     148m  15%  13% /opt/victronenergy/gui/gui -nomouse -display Multi: LinuxFb: VNC:size=800x480:depth=32:passwordFile=/data/conf/vncpassword.txt:0
 1965  1110 nodered  R     243m  24%  10% node-red
  822   820 messageb R     4572   0%  10% dbus-daemon --system --nofork
 1088  1071 root     S    41668   4%   8% {vrmlogger.py} /usr/bin/python3 -u /opt/victronenergy/vrmlogger/vrmlogger.py
28468 26412 root     R     2688   0%   8% top
 2052  2049 root     S    32596   3%   5% python /data/dbus-shelly-3em-smartmeter/dbus-shelly-3em-smartmeter.py
 1153  1138 root     S    22044   2%   5% {dbus_systemcalc} /usr/bin/python3 -u /opt/victronenergy/dbus-systemcalc-py/dbus_systemcalc.py
 1174  1156 root     S    21240   2%   5% {dbus_generator.} /usr/bin/python3 -u /opt/victronenergy/dbus-generator-starter/dbus_generator.py
 1529  2034 root     S    21028   2%   5% {vesmart_server.} /usr/bin/python3 -u /opt/victronenergy/vesmart-server/vesmart_server.py -i hci0
 2044  2038 root     S    66828   6%   3% /usr/bin/flashmq
 2051  2050 root     R    32272   3%   3% python /data/dbus-shelly-3em-inverter/dbus-shelly-3em-inverter.py
 1091  1073 root     S    20292   2%   3% /opt/victronenergy/venus-platform/venus-platform
 1176  1159 root     S    11372   1%   3% /opt/victronenergy/dbus-fronius/dbus-fronius
 1149  1132 root     S     8864   1%   3% /opt/victronenergy/hub4control/hub4control
 1121  1106 root     S     3176   0%   3% {serial-starter.} /bin/bash /opt/victronenergy/serial-starter/serial-starter.sh
 1187  1181 root     S    64136   6%   0% python /data/SetupHelper/PackageManager.py
 1165  1144 root     S    48628   5%   0% {dbus-modbus-cli} /usr/bin/python3 -u /opt/victronenergy/dbus-modbus-client/dbus-modbus-client.py
 2043  2036 root     S    42732   4%   0% {mqtt-rpc.py} /usr/bin/python3 -u /opt/victronenergy/mqtt-rpc/mqtt-rpc.py
 1099  1078 root     S    34192   3%   0% {venus-button-ha} /usr/bin/python3 -u /opt/victronenergy/venus-button-handler/venus-button-handler -D
 1154  1140 root     S    27708   3%   0% {dbus_shelly.py} /usr/bin/python3 /opt/victronenergy/dbus-shelly/dbus_shelly.py
 1130  1115 root     S    23944   2%   0% {netmon} /usr/bin/python3 -u /opt/victronenergy/netmon/netmon
  897   896 www-data S    22804   2%   0% php-fpm: pool www
  898   896 www-data S    22804   2%   0% php-fpm: pool www
  896     1 root     S    22740   2%   0% php-fpm: master process (/etc/php-fpm.conf)
 1177  1161 root     S    21660   2%   0% {dbus_digitalinp} /usr/bin/python3 -u /opt/victronenergy/dbus-digitalinputs/dbus_digitalinputs.py --poll=poll /dev/gpio/digital_
 1151  1134 root     S    19620   2%   0% {dbus_vebus_to_p} /usr/bin/python3 -u /opt/victronenergy/dbus-vebus-to-pvinverter/dbus_vebus_to_pvinverter.py
 1100  1084 simple-u S    13056   1%   0% /bin/simple-upnpd --xml /var/run/simple-upnpd.xml -d
  115     1 root     S    11672   1%   0% /sbin/udevd -d
 1095  1080 root     S    10220   1%   0% /opt/victronenergy/venus-access/venus-access
  948     1 root     S     9072   1%   0% /usr/sbin/wpa_supplicant -u -O /var/run/wpa_supplicant -s
26412 26386 root     S     8304   1%   0% -sh
 1954     1 root     S     8288   1%   0% -sh
  825     1 root     S     7592   1%   0% /usr/sbin/haveged -w 1024 -v 1
 1280  1123 www-data S     7272   1%   0% nginx: worker process
 1123  1112 root     S     6640   1%   0% nginx: master process /usr/sbin/nginx
13272 12789 root     S     5648   1%   0% ssh -o ExitOnForwardFailure=yes -o ConnectTimeout=20 -o ServerAliveInterval=10 -o ServerAliveCountMax=3 -o TCPKeepAlive=yes -o S
26386  1800 root     S     5524   1%   0% sshd: root@pts/0


