[pvrusb2] 24xx hardware instability.

Fri Mar 31 19:49:36 CST 2006

On Fri, 31 Mar 2006, Martin Andrew Galese wrote:

> Hi Mike,
>
> I wanted to let you know that I'm still seeing occasions where the
> cx25840 gets wedged. The timing seems to be random, but definitely is
> related to the number of "shows" recording by mythtv. The recording time
> doesn't seem to matter. As in, I can reliably record a continuous 6 hour
> show, but I can only actually record a series of 6, 1 hour shows, 10-15%
> of the time. After 3 distinct recordings (and almost never before) there
> seems to be a good chance that the cx25840 wedges.
>
> I had thought this was fixed with the mpeg2 garbage filter, but that
> only seemed to make the device somewhat more stable.
>
> Has anyone else had this issue? Have you seen it?
>

Martin:

The only remaining problem I have seen to date in my testing with the new
hardware happens when the hardware is first initialized.  In my case
there's a definite probability that the cx25840 module fails to detect the 
cx25843 chip.  Unfortunately after that happens (and if you have the 
msp3400 module in your system) then msp3400 might come along and falsely 
detect the cx25840 as an msp3400.  Then the msp3400 module goes batty 
(because obviously this is not an msp3400 chip), generates lots of noise 
in the log and then fails.  And when it fails, frequently the kernel gets 
corrupted - in fact at that point the outward behavior is the same as what 
we just had to deal with involving the old hardware.  (In the "old" 
hardware case, the msp3400 failure cause was different but the endgame was 
the same.)  And before anyone (i.e. Hans) asks: No, I do _NOT_ know if 
msp3400 is the cause for the kernel corruption.  That needs to be chased. 
All I can say is that the only times I have found the kernel corrupted 
here have been after the msp3400 module fails.

Normally msp3400 should never falsely detect a cx25840 as an msp3400. 
The msp3400 module does its detection by looking for revision info from 
the chip, and under normal circumstances it won't get that info from the 
cx25843 chip and we're fine.  However, in the scenario I'm seeing, the
cx25840 module fails to detect the cx25843 chip because the chip is
spewing garbage data back to the host (usually 0x04 or 0x0a) for any
subaddress that is probed.  Unfortunately that garbage looks like a valid
revision to msp3400, so msp3400 then joins the fracas and really screws
things up.
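
To make the failure mode concrete, here is a toy model in shell.  It is
NOT the actual msp3400 detection code; the function names, the two probe
subaddresses and the "anything non-zero is a revision" rule are all
assumptions made up for illustration.  It only shows why a chip that
answers every subaddress with the same byte can slip past a check that
just looks for plausible revision data.

```shell
# Toy model only: none of this is taken from the real msp3400 source.

# A wedged chip that returns the same byte no matter which subaddress
# is read.  GARBAGE stands in for the 0x04 / 0x0a values seen on the bus.
wedged_read() {
    echo "$GARBAGE"    # the subaddress argument ($1) is ignored
}

# Hypothetical, simplified revision check: read two (made-up) revision
# subaddresses and accept the chip if either value is non-zero.
naive_rev_check() {
    rev1=$(wedged_read 0x1e)
    rev2=$(wedged_read 0x1f)
    [ $(($rev1)) -ne 0 ] || [ $(($rev2)) -ne 0 ]
}

# Both garbage values reported above pass the naive check.
for GARBAGE in 0x04 0x0a; do
    if naive_rev_check; then
        echo "garbage $GARBAGE accepted as a revision"
    fi
done
```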

The behavior of msp3400 going nuts and (apparently) corrupting the system 
is collateral damage after the initial problem has happened (and it's the 
same collateral damage from the bug I just fixed for the old hardware). 
The root cause here involves figuring out why the cx25843 chip is spewing 
garbage.  I've already spent several days chasing that so far without 
success.  This is the last real problem I know of involving the new 
hardware - everything else I understand.  I will revisit this problem 
after I finish dealing with issues surrounding getting the driver into the 
kernel.  FYI, there is also an issue getting wm8775 to detect correctly 
(which is why that "force=-1,27" option is needed) but I already know a 
clean way to fix that; I just need to implement the fix.

Anyway, I'm stating all this so you can examine the problem you're seeing 
and maybe find some common ground.  Note: If you manually modprobe cx25840
into the kernel with the option "debug=1" then you'll get useful info in
the log reporting that module's status with respect to the hardware.
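
Something along these lines should do it.  This is only a sketch: the
function name is made up, it has to run as root, and it assumes nothing
else is holding cx25840 at the time (you may need to unplug the device
or unload pvrusb2 first, otherwise the removal step will simply fail
and the option won't take effect until the module is next loaded).

```shell
# Sketch: reload cx25840 with debug output enabled, then show what it
# logged.  reload_cx25840_debug is a hypothetical helper name.
reload_cx25840_debug() {
    # Remove any loaded copy; ignore the error if it is not loaded
    # or is still in use.
    modprobe -r cx25840 2>/dev/null || true

    # Load the module with the debug option mentioned above.
    modprobe cx25840 debug=1 || return 1

    # Show the most recent kernel log lines that mention the module.
    dmesg | grep -i cx25840 | tail -n 20
}

# Usage (as root):  reload_cx25840_debug
```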

Another trick you can do with the driver to help diagnose problems is
simply to do this:

   cat /sys/class/pvrusb2/sn-xxxx/debuginfo

(Replace "xxxx" with your device serial number.)  When you issue that cat 
command two things will take place.  First, you'll get a compact dump to 
stdout reporting information about each I2C client module (e.g. 
cx25840.ko, msp3400.ko, saa7115.ko, tuner.ko, etc) that has attached to 
the driver.  Second, this action will also trigger a LOG_STATUS request to 
all attached I2C modules, and typically modules will respond by dumping a 
blob of status info into the kernel log.  You can do that cat command at 
_any_ time; it is a non-destructive action.  It's a quick 'n easy way to 
get the pulse of the driver.
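
If you don't want to look up the serial number each time, a small
wrapper can walk every pvrusb2 device it finds.  This is just a
convenience sketch; the function name is invented here and the only
path it relies on is the one shown above.

```shell
# Sketch: dump debuginfo for every pvrusb2 device under the class
# directory.  pvrusb2_debuginfo is a hypothetical helper name; the
# optional argument overrides the default directory.
pvrusb2_debuginfo() {
    class_dir="${1:-/sys/class/pvrusb2}"
    found=1
    for f in "$class_dir"/sn-*/debuginfo; do
        # Skip the unexpanded pattern when no device is present.
        [ -r "$f" ] || continue
        echo "== $f =="
        cat "$f"
        found=0
    done
    # Returns 0 if at least one device was dumped, 1 otherwise.
    return "$found"
}

# Usage: dump the compact report, then look at what the I2C modules
# wrote to the kernel log in response:
#   pvrusb2_debuginfo && dmesg | tail -n 40
```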

So far the problem I describe _only_ happens when the hardware is first 
initialized.  Once it is successfully initialized then it is stable from 
that point forward (until you replug the device, power cycle, reinsert the 
driver, etc).  Sounds like the problem you are describing can happen long 
after the hardware has been initialized.  That's new behavior. 
Admittedly I haven't tried any long duration burn-in tests with the new 
hardware yet so maybe I just haven't seen this.  I'll keep my eyes open 
for it though.

It would be valuable information to learn if someone else is seeing this.
However don't anyone treat this as a "request" yet since I actually 
haven't yet gotten around to officially documenting how to use the driver 
with the new hardware :-)

   -Mike

-- 
                         |         Mike Isely          |     PGP fingerprint
      Spammers Die!!     |                             | 03 54 43 4D 75 E5 CC 92
                         |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                         |                             |