pcurtis asked

Venus OS large periodically becomes unresponsive and needs restart when using Tethered connection

Over the last three weeks my system has become unresponsive four times. It was running Venus OS large v2.80~33-large-24 on a Raspberry Pi 3B+ during this period. The system is on a narrowboat and has a SmartSolar MPPT 100/20, a SmartShunt and a Phoenix Smart Inverter 24/1600; WiFi and internet access are provided by a dedicated tethered Android 4G phone. All the incidents were very similar in that:

  • The VRM portal stopped updating and all access to Node-RED was lost
  • SSH access to the Pi over the network was lost, whilst other devices connected and worked, including internet access
  • VictronConnect could see the Raspberry Pi using Bluetooth but could not connect
  • WiFi on the Raspberry Pi still seemed to be connected to the local network, but even a ping did not get a response
  • It may be a coincidence, but I previously had several months of operation without a single access problem with Venus OS 2.73 or 2.80~21-large-23

Initially I thought the OS was completely dead, but I found some log files in /data/log. The messages log files change markedly at the time the portal stopped updating, but it is not obvious to me what happened. There are no actual errors showing and the kernel still seems to be running. I am not sure where to look for further information and am not familiar with the log file structure in Venus OS.

I have attached the last two log files (I had to add the .txt extension before the forum would accept them). The last portal update was at 16:01 on 20/01/2022, which is in messages.0 at about line 991. I also have a clone of the SD card, taken on a separate machine before I restarted the Raspberry Pi 3B+.

The above was posted as a potential problem with the Venus OS large v2.80~33-large-24 here as I previously had several months operation without a single access problem with Venus OS 2.73 and then 2.80~21-large-23.

@mvader (Victron Energy) pointed out that the log file showed WiFi issues. For example:

Jan 16 13:13:48 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: Trying to associate with SSID 'Big Blue'
Jan 16 13:13:51 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: CTRL-EVENT-ASSOC-REJECT bssid=00:00:00:00:00:00 status_code=16
Jan 16 13:13:51 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: CTRL-EVENT-SSID-TEMP-DISABLED id=1 ssid="Big Blue" auth_failures=1 duration=10 reason=CONN_FAILED
Jan 16 13:13:51 raspberrypi2 daemon.warn connmand[673]: Skipping disconnect of 42696720426c7565_managed_psk, network is connecting.
Jan 16 13:14:21 raspberrypi2 daemon.info connmand[673]: Connection Manager version 1.33

shows a connection failure followed by a restart of the connection manager (connman) and had nothing to do with Venus OS large in itself. He suggested this should be moved to the community area for further discussion.
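
For anyone wanting to scan their own messages files for the same pattern, the following is a minimal Python sketch, not an official Victron tool. It assumes every line has the layout seen in the excerpt above (timestamp, host, facility.level, process[pid]: message). The marker strings are copied from that excerpt; the function name `wifi_events` and the labels are made up for illustration.

```python
import re

# Layout assumed from the excerpt above, e.g.
# "Jan 16 13:13:51 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<level>\S+)\s+"
    r"(?P<proc>[^\[\s:]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Substrings taken from the log excerpt; the labels are illustrative only.
MARKERS = {
    "CTRL-EVENT-ASSOC-REJECT": "association rejected",
    "CTRL-EVENT-SSID-TEMP-DISABLED": "SSID temporarily disabled",
    "Connection Manager version": "connmand (re)started",
}

# The five lines quoted in the post.
SAMPLE = [
    "Jan 16 13:13:48 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: Trying to associate with SSID 'Big Blue'",
    "Jan 16 13:13:51 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: CTRL-EVENT-ASSOC-REJECT bssid=00:00:00:00:00:00 status_code=16",
    'Jan 16 13:13:51 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: CTRL-EVENT-SSID-TEMP-DISABLED id=1 ssid="Big Blue" auth_failures=1 duration=10 reason=CONN_FAILED',
    "Jan 16 13:13:51 raspberrypi2 daemon.warn connmand[673]: Skipping disconnect of 42696720426c7565_managed_psk, network is connecting.",
    "Jan 16 13:14:21 raspberrypi2 daemon.info connmand[673]: Connection Manager version 1.33",
]


def wifi_events(lines):
    """Return (timestamp, process, label) for each line containing a marker."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        for needle, label in MARKERS.items():
            if needle in match.group("msg"):
                events.append((match.group("stamp"), match.group("proc"), label))
                break
    return events


if __name__ == "__main__":
    for stamp, proc, label in wifi_events(SAMPLE):
        print(stamp, proc, label)
```

To run it against a real file, pass the open file instead of `SAMPLE`, for example `wifi_events(open("/data/log/messages"))`.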

Venus OS
messages.txt (92.6 KiB)
messages0.txt (100.1 KiB)
12 comments

I'm not a network expert, but the first thing I would try is a different internet device (router, another phone etc.) and hook up your GX device to that. I can't recall where I saw it, but it's not the first time I have read that a tethered Android connection causes problems.

From your log:

Jan 16 13:13:51 raspberrypi2 daemon.notice wpa_supplicant[742]: wlan0: CTRL-EVENT-SSID-TEMP-DISABLED id=1 ssid="Big Blue" auth_failures=1 duration=10 reason=CONN_FAILED
mvader (Victron Energy) ♦♦ Stefanie (Victron Energy Staff) ♦♦ commented ·

Indeed, and preferably use a LAN cable rather than WiFi.

pcurtis Stefanie (Victron Energy Staff) ♦♦ commented ·

@Stefanie Thanks for the heads-up on tethering problems. I have searched the community for "tether" and there are some references to problems, but not much in common between them. I have also had weeks without issues before the recent problems.

There are also mentions of power-saving modes being activated and causing similar problems.

I hope I can find a way to do an automatic clean restart of WiFi without a full reboot although I do not understand why a WiFi problem has apparently locked up Victron services and Node-RED.
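
A WiFi-only restart of this kind could be sketched roughly as below. This is untested on Venus OS and rests on several assumptions: that `connmanctl` is present on the image, that toggling the `wifi` technology off and on is enough to clear the fault, and that the phone's hotspot answers pings at the placeholder address in `GATEWAY`. All names and thresholds are hypothetical.

```python
import subprocess
import time

GATEWAY = "192.168.43.1"  # placeholder: replace with the hotspot's real address
FAIL_LIMIT = 5            # consecutive failed pings before WiFi is toggled
INTERVAL_S = 60           # seconds between checks


def gateway_reachable(host=GATEWAY):
    """Send one ping and report whether a reply arrived within 2 seconds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def restart_wifi():
    """Toggle WiFi via connman's client (assumes connmanctl is installed)."""
    subprocess.run(["connmanctl", "disable", "wifi"])
    time.sleep(5)
    subprocess.run(["connmanctl", "enable", "wifi"])


def next_state(failures, reachable, limit=FAIL_LIMIT):
    """Return (new_failure_count, restart_now) after one reachability check."""
    if reachable:
        return 0, False
    failures += 1
    if failures >= limit:
        return 0, True  # reset the count so restarts are spaced out
    return failures, False


def run_forever():
    failures = 0
    while True:
        failures, restart = next_state(failures, gateway_reachable())
        if restart:
            restart_wifi()
        time.sleep(INTERVAL_S)


if __name__ == "__main__":
    run_forever()
```

Keeping the counting logic in `next_state` separate from the ping and restart calls means the threshold behaviour can be checked on a desktop machine before anything is installed on the boat.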

mvader (Victron Energy) ♦♦ pcurtis commented ·
Hi again, my advice: tethering from a phone is not designed to be reliable. I would simply skip that, get something serious (like the rutronik) and spend your time elsewhere.


With regards to the services locking up: I'm not convinced they lock up. I haven't seen any proof of that. The only thing I see in the logs are network issues.

My suspicion is they don't lock up; so still running; but unreachable because of network issues.

And with the combination being an RPi (so not an official Victron product) + WiFi + tethering from a phone + only one such complaint, I have to disappoint you: that is not something we'll be looking into from the Victron side.

mvader (Victron Energy) ♦♦ commented ·
Hi @pcurtis , thanks for moving it here. You write LAN, but I thought the GX was connected using WiFi?

To distinguish between a network issue and anything else, I recommend connecting either a screen + keyboard or a serial console cable.

pcurtis mvader (Victron Energy) ♦♦ commented ·
LAN corrected. Screen and keyboard not easy as system is on a boat which is unoccupied and problem only occurs every few days. I have to go to it for many tests. I have added an automatic reboot if node-RED stops which should help distinguish network and general problems. I will swap phones at some point.
mvader (Victron Energy) ♦♦ pcurtis commented ·
Try and make it forget a few wifi networks. Who knows it might solve your issue.


And if not that then at least it cleans up the system logs.

2 Answers
pcurtis answered ·

I think it is time to report the progress I have made before I update to Large 2.82 and add another variable. It will bring together the many helpful replies and comments I have received which ended up so deeply nested that they have become difficult to follow.

I have now followed up many of these suggestions as well as my own thoughts and I have added a lot of diagnostics to my Node-RED dashboard which have helped narrow down the problem considerably:

  • Both Venus OS and Node-RED keep running; only the WiFi network interface is down. All data is safely cached and uploaded to the portal after a restart.
  • I am using a tethered WiFi Connection from an Android Phone and there have been previous problems with such connections.
  • Other users have had rock stable mobile connections using dedicated mobile routers such as the Teltonika RUT240 connected locally via an Ethernet cable.
  • There is nothing to indicate it is related to use of Venus OS Large or particular versions, any differences I have seen are more likely to be related to the phones used.
  • I have switched phones but it is too early to say if that has an impact

This problem is unlikely to affect most other users but I have edited the title to contain Tethered to act as a warning to any investigating use of tethered phones.

Anybody with an installation that they cannot reach easily, or who needs high reliability, should avoid tethered phone connections. They are not designed for such use. An Ethernet cable is also likely to be much more reliable for local connections. Occasional use of a tethered phone to upload to the VRM Portal should be fine.

In my case I am using it to do my development via the portal to avoid frequent visits to a cold boat. If it fails I only have a short walk and even if I can not reach the boat due to floods I can always switch my shore power off and back on to reboot!

I have however not given up completely on investigating further but it is a slow job. I have been through the previous log files and found I had uninterrupted operation with Venus OS 2.73 for 19 days with an old 3g phone and typical periods of 5 days more recently with a 4g phone and 2.80 large. So it is very much a waiting game. I have also found at least one or two cases where a restart seems to have occurred that I did not initiate with any of my own software watchdog code which is interesting.
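
One possible way to measure such periods from the messages files is sketched below. It assumes that the "Connection Manager version" line (seen in the excerpt in the question when connmand starts) is a usable marker for a restart, which may not hold for every kind of anomaly. Syslog lines carry no year, so one has to be supplied; the function names are hypothetical.

```python
from datetime import datetime

START_MARKER = "Connection Manager version"  # logged when connmand starts


def start_times(lines, year=2022):
    """Timestamps of connmand start lines (the year is an assumption)."""
    times = []
    for line in lines:
        if START_MARKER in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Jan 16 13:14:21"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_hours(times):
    """Hours between consecutive start events, oldest first."""
    ordered = sorted(times)
    return [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]
```

Feeding it the lines of messages.0 followed by messages would give the hours between successive connmand starts, which can then be compared with the times the portal stopped updating.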

The problems may well be caused by the 'tethered' connection to Android, but I have to note that I have been using similar shared connections over cables, Bluetooth and more recently WiFi for internet connection sharing since 2003 and the days of USB modems, certainly long before 'tethering' was used as a description (2010??)! I have used 2G Sony phones, an XDA Exec running Windows Mobile, Blackberry phones, Edimax 6200n 3G wireless routers, various MiFi boxes and most recently several different tethered Android phones. I have done this for many months every year touring and sailing in NZ, on our narrowboat in the UK and on cruise ships. I have never had problems up to now, and a quick check back showed about 60 GBytes passed through my tethered phone during last summer's boating without a need to remake a connection, even when frequent mobile signal dropouts occurred due to tunnels or just poor signal. The client devices were, however, always Android or Linux Mint (Debian-based) rather than an OpenEmbedded-based distribution (Dunfell in 2.80) that uses connman and busybox.

I will report further when[if] I come up with anything useful and always welcome any comments or suggestions.

I am also happy to make my Node-RED diagnostic dashboard available - it is still under development but is getting steadily more useful. The Screenshot from 2022-02-09 09-31-39.png shows good margins on all parameters

pcurtis answered ·

Update: It is now coming up to 300 hours since the last anomaly, so it could be a long wait to see if all my extra diagnostics are useful. The only two changes that I can see are a change of phone for the tethering and much higher ambient temperatures. The phone may be significant, as it uses a much earlier version of Android and 3G connections, whilst the new phone can handle 5G and may have extra network capabilities which are not being handled properly (IPv6??).

I have also noticed that searches for tethering problems mainly turn up iPhones which randomly disconnect, whilst my hotspot has always remained live and usable by other devices. I have not updated to 2.82, to maintain continuity in the test.
