13 October 2015

How to Troubleshoot Random Flashing in E1.31 Controllers

From time to time, customers report "flashing", brief interruptions of light output, or stuck lights/pixels on the E1.31 controllers they are using.  While we can't give a single answer because there are simply too many variables, this article should give you some concrete troubleshooting steps for finding a resolution.

E1.31 (also called sACN), like Art-Net, is just one part of a complex stack of interconnected systems that carry the output from your sequencing application to your controller, so let's look at each layer in this stack and talk about possible causes and the troubleshooting steps you can take to eliminate the problem.

Lights (Pixels)

We are starting at the very bottom of this stack - the lights.  Most E1.31 controllers have test functions built into their firmware that output a test pattern (for AlphaPix controllers see this video) and can be invoked via a button or buttons on the controller.  This is useful because it isolates the two major parts of the stack, the controller and the PC, and shows whether the problem is in the hardware.  Start the hardware test function, ideally with a "full white" output, and watch for any signs of the flashing.  If flashing occurs at the hardware level, the problem is most likely with the controller, lights or power - look at the following as possible sources:

  • If flashing occurs only when a lot of lights are on, or during full white or other high-draw output, but does not occur with a single color, the likely cause is insufficient power for the number of lights running.
  • In minor cases of under-powered lights, individual pixels can receive insufficient power and once they drop below a certain voltage, they fail to function.  This most often will show up at the end of a string of lights.
  • In major cases of under-powered lights, the power supply will cut off due to an over-current situation, usually for a few seconds at a time.  All lights connected to that power supply will turn off.
  • If you suspect any of the above, set all pixels to all-white output and then, using a multimeter, measure the voltage at the end of the string of pixels.  If the reading is much lower than the voltage at the power supply, there may be a power issue.
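
The arithmetic behind these power checks can be sketched in a few lines of Python.  This is only a rough estimate, not a sizing tool: the 60 mA per pixel figure is an assumption (a commonly quoted full-white draw for 5V RGB pixels), and the function names are made up for illustration.  Check the datasheet for your own pixels before relying on any number here.

```python
# Rough power-budget sketch. ASSUMPTION: 60 mA per pixel at full white;
# the real figure depends on your pixel type and voltage.

def estimate_current_amps(pixel_count, ma_per_pixel=60.0):
    """Estimated total draw in amps with every pixel at full white."""
    return pixel_count * ma_per_pixel / 1000.0


def supply_headroom_amps(pixel_count, supply_amps, ma_per_pixel=60.0):
    """Positive result: spare capacity. Negative: the supply is overloaded."""
    return supply_amps - estimate_current_amps(pixel_count, ma_per_pixel)


def voltage_drop_percent(supply_volts, measured_volts):
    """Drop between the supply and the multimeter reading at the string end."""
    return (supply_volts - measured_volts) / supply_volts * 100.0
```

For example, `supply_headroom_amps(170, 10.0)` comes out negative, which suggests a 10 A supply would be overloaded by 170 such pixels at full white, while `voltage_drop_percent(5.0, 4.2)` puts a number on the multimeter reading from the last step above.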

Networking

Because E1.31 controllers are network devices on an Ethernet network, it is possible the network itself is inducing delays or lost packets of data.  Areas to consider are:
  • If using ANY wireless - remove the wireless components (bridges, extenders, access points, etc.) and move back to a wired connection for testing to ensure that the wireless network is not the source of the problem.  On a secondary note - we do not recommend using wireless networks unless there is no good alternative to good ol' CAT5 cable.  Wireless can introduce delays, suffer higher packet loss and be subject to other interference, more so than a wired network.
  • If the PC you are using has more than one interface / network, such as a wireless interface plus a wired interface, disable all other network interfaces on the PC.  We have seen cases where routing problems within the PC resulted in delays of traffic going to the network where the E1.31 controller was located.
  • If the controller is not directly connected to the PC - such as hooked to a local router/switch/hub - where possible, run the cable directly from the PC's Ethernet port to the Ethernet port of the E1.31 controller.
  • Run a continuous ping to the controller to confirm that no packets are being lost.
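
The ping check can also be scripted so that the loss figure is pulled out for you.  The following is a minimal sketch, assuming the system `ping` command is on the PATH and prints its summary in English; the function names are hypothetical.

```python
import platform
import re
import subprocess


def build_ping_command(host, count, system=None):
    """Build the ping command line; Windows uses -n for the count, others -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def parse_loss_percent(ping_output):
    """Pull the packet-loss percentage out of ping's summary, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)%\s*(?:packet\s+)?loss", ping_output)
    return float(match.group(1)) if match else None


def check_controller(host, count=100):
    """Ping the controller `count` times and return the loss percentage."""
    result = subprocess.run(
        build_ping_command(host, count), capture_output=True, text=True
    )
    return parse_loss_percent(result.stdout)
```

Calling `check_controller("192.168.0.50")` (a placeholder address; substitute your controller's IP) while a sequence is playing gives a quick read on whether packets are being dropped under load.  Anything other than 0 on a direct wired connection is worth investigating.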

PC / OS

This area is where we see a fair number of the problems with random flashing, mainly caused by one thing - firewall applications.  Most PCs today have a firewall (modern Windows PCs all have one) and often a third-party firewall such as Symantec or AVG as well.  These applications constantly monitor or "inspect" traffic going into and out of your PC, and when they sense something dangerous, rightly or not, they block that traffic or perform a "deep inspection" of the data to decide whether it is dangerous, which can introduce delays.  If an application is "tinkering" with or outright blocking the traffic, the data can arrive late or out of order, resulting in random output that shows up as blinking.  When troubleshooting, we recommend disabling your firewall - most have a "disable for 15 minutes" function that turns off the blocking and lets you take the firewall out of the "loop".

A note for customers using non-PC based devices such as the FPP - if you are experiencing problems when sourcing data from the FPP, switch to a PC and test from there.

A note for customers using virtualized environments such as Windows on a Mac - we recommend testing from a dedicated PC.

Additional E1.31 Applications

It is common for people sequencing E1.31 controllers to have several E1.31-based applications on their PC - applications like xLights and LOR.  What most people don't realise is that these applications, even when not running a sequence or test, are outputting E1.31 data to the controller.  The application we see the most issues with is LOR's 'Control Panel', which, if configured and open, is always outputting E1.31 data even with the LOR Sequence Editor closed.  So, if you are using more than one application, make sure to close or "unload" the others so that two streams of data are not sent to the controller at the same time, which confuses the controller.  To unload LOR, right-click the Light Bulb icon in the system tray, then select "Unload Light-o-Rama".
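
If you have raw E1.31 packets to hand (for example, UDP payloads exported from a packet capture), two applications sending to the same universe can be spotted by comparing the sender ID in each packet.  The sketch below assumes the standard E1.31 data packet layout (16-byte CID at offset 22, source name at offset 44, universe at offset 113) and that each application uses its own CID, as the standard expects; the function names are hypothetical.

```python
import struct
import uuid
from collections import defaultdict

ACN_PACKET_ID = b"ASC-E1.17\x00\x00\x00"  # identifier at bytes 4..15


def parse_e131_header(payload):
    """Return the sender and universe fields of an E1.31 packet, or None."""
    if len(payload) < 126 or bytes(payload[4:16]) != ACN_PACKET_ID:
        return None
    raw_name = bytes(payload[44:108]).split(b"\x00", 1)[0]
    return {
        "cid": uuid.UUID(bytes=bytes(payload[22:38])),
        "source_name": raw_name.decode("utf-8", errors="replace"),
        "sequence": payload[111],
        "universe": struct.unpack("!H", bytes(payload[113:115]))[0],
    }


def find_conflicting_universes(payloads):
    """Map each universe that has more than one sender to {cid: source_name}."""
    senders = defaultdict(dict)
    for payload in payloads:
        header = parse_e131_header(payload)
        if header is not None:
            senders[header["universe"]][header["cid"]] = header["source_name"]
    return {u: s for u, s in senders.items() if len(s) > 1}
```

An empty result means only one sender was seen per universe in the packets you supplied.  Any universe that does appear lists the source names of the applications competing for it.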

Sequencing Application

The final and highest layer in the stack is of course - the sequencing application.  This is where the data for your show originates.  There is a wide range of possible issues to consider with sequencing applications:
  • Application flaw - Some open source applications are updated on a weekly basis, and this can introduce bugs that result in unpredictable output from something that appears fine on screen.  If a sequence worked last week and now exhibits a problem and the hardware configuration has not changed, you may wish to roll back to the old version of the application and re-test.
  • Incorrectly configured sequence - If you experience a problem with a sequence, "fall back" and create a new, simple sequence that contains no imported or migrated information and only a limited amount of sequencing necessary to replicate the problem being experienced.
  • Sequence overloading PC - Since sequencing applications and displays are now in the tens of thousands of channels, it is possible that your sequencing application is generating more data than can be properly output to your network going to the E1.31 controller(s) from the PC you are using.  Falling back to a simple test sequence or removing controllers from a sequence can help to narrow this problem down.
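
To get a feel for how much data a large sequence generates, here is a back-of-the-envelope estimate.  It assumes every universe is sent as a full 638-byte E1.31 packet on every frame, plus UDP, IPv4 and Ethernet overhead; the 40 frames per second default is an assumption (many sequences run at 20), and the function names are made up for illustration.

```python
import math

E131_PACKET_BYTES = 638          # full 512-channel E1.31 data packet
OVERHEAD_BYTES = 8 + 20 + 18     # UDP + IPv4 + Ethernet header and FCS


def universes_needed(channels, channels_per_universe=512):
    """Number of universes required to carry the given channel count."""
    return math.ceil(channels / channels_per_universe)


def estimate_bandwidth_mbps(channels, frames_per_second=40):
    """Approximate network load in megabits per second (ASSUMED frame rate)."""
    packets_per_second = universes_needed(channels) * frames_per_second
    bits_per_second = packets_per_second * (E131_PACKET_BYTES + OVERHEAD_BYTES) * 8
    return bits_per_second / 1_000_000
```

For example, `estimate_bandwidth_mbps(30000)` works out to roughly 13 Mbps across 59 universes, which is well within a wired 100 Mbps link but is also a few thousand packets every second that the PC has to generate on time.
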
Where possible, you may wish to try a completely different application for E1.31 data output for testing.  Our 100% flaw-free, go-to application after all these years is still xLights 2012b - no, not the new "xLights/Nutcracker" 4.x but the older, simpler and super clean application from years back.  The 2012b version can be downloaded for free from this site:  http://sourceforge.net/projects/xlights/ and here is a video that shows how to use it for testing:


If your problem goes away with xLights or another application but still occurs in your original sequencing application, likely the problem is with that application.
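
Another way to take the sequencing application out of the picture is a tiny stand-alone sender.  The following is a minimal sketch, not a replacement for a real test application: it builds a standard E1.31 data packet by hand and unicasts "full white" to one universe.  It assumes your controller accepts unicast E1.31 on the standard UDP port 5568; the function names, source name and default values are hypothetical.

```python
import socket
import struct
import time
import uuid

E131_PORT = 5568  # standard E1.31 / sACN UDP port


def build_e131_packet(universe, dmx_data, sequence, cid, source_name="test-sender"):
    """Assemble one E1.31 data packet (root, framing and DMP layers)."""
    n = len(dmx_data)
    if not 1 <= n <= 512:
        raise ValueError("dmx_data must hold 1 to 512 channel values")
    dmp_len = 10 + 1 + n          # DMP header + start code + channel data
    framing_len = 77 + dmp_len    # framing header is 77 bytes
    root_len = 22 + framing_len   # counted from the root flags/length field
    name = source_name.encode("utf-8")[:63]  # struct pads with NULs to 64
    root = struct.pack("!HH12sHI16s", 0x0010, 0x0000,
                       b"ASC-E1.17\x00\x00\x00", 0x7000 | root_len, 0x00000004, cid)
    framing = struct.pack("!HI64sBHBBH", 0x7000 | framing_len, 0x00000002,
                          name, 100, 0, sequence & 0xFF, 0, universe)
    dmp = struct.pack("!HBBHHH", 0x7000 | dmp_len, 0x02, 0xA1, 0x0000, 0x0001, n + 1)
    return root + framing + dmp + b"\x00" + bytes(dmx_data)


def send_full_white(controller_ip, universe, channels=512, seconds=5.0, fps=20):
    """Unicast full-white frames to one universe for a few seconds."""
    cid = uuid.uuid4().bytes          # random sender ID for this run
    data = bytes([255]) * channels
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for frame in range(int(seconds * fps)):
            packet = build_e131_packet(universe, data, frame, cid)
            sock.sendto(packet, (controller_ip, E131_PORT))
            time.sleep(1.0 / fps)
    finally:
        sock.close()
```

Calling `send_full_white("192.168.0.50", 1)` (a placeholder address) should light universe 1 solid white for five seconds.  If the output is steady here but flashes under your sequencer, that points at the sequencer or something else on the PC rather than at the controller.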

Last Ditch...

In some very rare cases where we at HolidayCoro need to determine whether a problem is a controller and/or network issue or a PC/software issue, we have to pull out Wireshark.  Wireshark is a free application that lets you look directly at the data being sent onto the network and on to the E1.31 controller.  From here you can see any malformed packets or other errors going to the controller.  With enough effort, you can then compare the packets and see whether the problem is in the controller or the software.  In one case where our AlphaPix controller was blinking with LOR software while the LOR control panel was running, this was necessary, and it shows the value of such low-level troubleshooting:


We should note that it is exceedingly rare to have to use tools like this, and most of the time your vendor should be able to assist you before you get to this level.
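
One thing that is easy to check in a capture is the sequence number.  Every E1.31 data packet carries a one-byte sequence number that counts up per universe and wraps from 255 back to 0, so gaps or reversals hint at lost or re-ordered packets.  The sketch below assumes you have already pulled the sequence numbers for a single universe from a single sender, in capture order, into a list; the function name is hypothetical.

```python
def find_sequence_problems(sequence_numbers):
    """Return (index, expected, got, kind, count) for each irregularity.

    kind is "lost" when packets appear to be missing (count = how many),
    or "out_of_order" when a packet is a repeat or arrives late (count = 0).
    """
    problems = []
    for i in range(1, len(sequence_numbers)):
        expected = (sequence_numbers[i - 1] + 1) % 256  # wraps 255 -> 0
        got = sequence_numbers[i]
        if got == expected:
            continue
        jump = (got - expected) % 256
        if jump < 128:
            problems.append((i, expected, got, "lost", jump))
        else:
            problems.append((i, expected, got, "out_of_order", 0))
    return problems
```

A clean capture returns an empty list.  Note that two applications sending to the same universe each keep their own counter, so filter the capture by sender before running a check like this.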

Firmware

While the E1.31 protocol is a fairly simple thing, it is always possible that there is a flaw in the vendor's firmware.  If you suspect this, first update to the latest version of the firmware provided by the vendor.  If that does not resolve the problem, try general troubleshooting such as moving outputs around - map universes to different outputs to see if the problem "follows" the output or the universe.

If you suspect a controller issue, we recommend providing your vendor with the following:
  • A detailed list of steps taken to narrow the problem down.  Where possible, always use specific references such as "output 1" instead of "the output".
  • Provide a clear overview of how the controller is configured (channels, universes, etc.) in addition to what items are connected to the controller (pixel counts, etc.).
  • Provide a copy of the sequences used to test and reproduce the output.
  • Provide details of your network configuration.
Remember, the vendor has zero understanding of your specific setup or of the testing that has been completed.  For them to provide a solution in all but the most basic of situations, detailed information will be required.


Conclusion

We won't kid you - E1.31, while quite simple in itself, is made much, much more complex by the large number of variables and layers involved in getting data from one end of the stack to the other.  When troubleshooting these problems, it always helps to start with the basics, then work your way up from there.  Never assume anything is "ok" until testing has proven that it is working.