Great information!

 

With the exception of the custom code, that is exactly how we are planning to set it up for our future systems.

It’s reassuring to hear that it seems to work well for you.

 

Regards,

Einar

 

Not a solution per se, but I can give you some info on how we solve the reliability issue in our product that uses RAUC.

 

* We store the env at a raw offset in the eMMC (this should work for SD as well) rather than on a FAT partition as a file. You will need to set your partition table up to leave room for this and modify the U-Boot config.

* We use redundant U-Boot environments placed in different sectors of the eMMC. This is a built-in feature of U-Boot that can be enabled in the config. If one copy gets corrupted, U-Boot gracefully falls back to the other one.

* We have custom code both in U-Boot and in Linux that checks for corrupt or inconsistent RAUC U-Boot environment vars. If they are totally out of whack we will boot into our fail-safe recovery mode where the env vars are reset to a sane default and an update can be performed (no RMA needed).
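
As an illustration only (this is not the custom code referred to above), here is a minimal Python sketch of what such a Linux-side check could look like. It assumes the usual redundant-environment layout of a 4-byte CRC32, a 1-byte flags counter, and then NUL-separated `name=value` strings, with the CRC stored little-endian. It also assumes the slot names A and B and the `BOOT_ORDER` / `BOOT_<slot>_LEFT` variable names from a typical RAUC U-Boot integration. The function names are hypothetical.

```python
import struct
import zlib


def parse_copy(blob):
    """Return (flags, env_dict) if the stored CRC matches, else None."""
    if len(blob) < 5:
        return None
    # Assumed header: little-endian CRC32 followed by a one-byte flags counter.
    stored_crc, flags = struct.unpack_from("<IB", blob, 0)
    data = blob[5:]
    if zlib.crc32(data) & 0xFFFFFFFF != stored_crc:
        return None
    env = {}
    for entry in data.split(b"\0"):
        if not entry:
            break  # an empty string terminates the variable list
        key, _, value = entry.partition(b"=")
        env[key.decode("ascii", "replace")] = value.decode("ascii", "replace")
    return flags, env


def choose_env(blob1, blob2):
    """Pick the newer valid copy, or None if both copies are corrupt."""
    a, b = parse_copy(blob1), parse_copy(blob2)
    if a is None and b is None:
        return None
    if b is None:
        return a[1]
    if a is None:
        return b[1]
    fa, fb = a[0], b[0]
    # The flags byte is a counter that wraps from 255 back to 0.
    if fa == 255 and fb == 0:
        return b[1]
    if fb == 255 and fa == 0:
        return a[1]
    return b[1] if fb > fa else a[1]


def rauc_vars_sane(env, slots=("A", "B")):
    """Check that the assumed RAUC bootchooser variables look plausible."""
    order = env.get("BOOT_ORDER", "").split()
    if not order or any(slot not in slots for slot in order):
        return False
    return all(env.get(f"BOOT_{slot}_LEFT", "").isdigit() for slot in slots)
```

On a real device the two blobs would be read from the raw offsets configured for the environment, each with the configured environment size. If `choose_env` returns None or `rauc_vars_sane` returns False, that would be the point to fall back to a recovery path.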

 

In the year we've had this setup, I haven't once seen or heard of anyone actually hitting a corrupt U-Boot env on any of our development units. We unfortunately don't have analytics around this event in the field.

 

I know this isn't exactly an answer to your question, but hopefully some of this helps you arrive at a robust solution for your setup.

 

Best,

~Matt

 

On Sun, Mar 28, 2021 at 6:11 AM Einar Vading <Einar.Vading@rhimagnesita.com> wrote:

> Hi,

>

> On Fri, 2021-03-26 at 05:48 +0000, Einar Vading wrote:

> > > > Hi,

> > > >

> > > > On Thu, 2021-03-25 at 15:22 +0000, Einar Vading wrote:

> > > > > We have a Raspberry Pi 4 system set up using RAUC for updates and u-boot

> > > > > for

> > > > > booting. For some systems in the field we have the u-boot environment on

> > > > > the

> > > > > FAT boot partition and we mount that in fstab so that RAUC can access it

> > > > > with

> > > > > the fw_print/setenv commands.

> > > > >

> > > > > One issue we have seen is that the env-file gets corrupted every now and

> > > > > then.

> > > > > After corruption we can't RAUC update. The only solution we have to this

> > > > > problem now is to delete the corrupted env-file and reboot, then we can

> > > > > perform the upgrade.

> > > > >

> > > > > I have no idea how to track down whatever corrupts the file and I was

> > > > > wondering if anyone has any input.

> > > >

> > > > You could try placing the environment on a separate partition to avoid any

> > > > potential issues in the FAT implementation. Also, I think U-Boot has a way

> > > > to

> > > > support redundant environments.

> >

> > I have just done this for our newer systems. I moved the GPT partitions back

> > 4MB and placed two redundant environments between the GPT and the first GPT

> > partition.

> >

> > It is my understanding though that redundant environments are not supported

> > when storing the env on FAT?

>

> That's probably a question for the U-Boot mailing list. :)

>

> > > Exactly. This should also be documented in the U-Boot integration guideline

> > > for eMMC:

> > >

> > >

> > >

> > > When writing to the FAT very shortly before hard rebooting, I could imagine

> > > this

> > > can lead to failures. Do you see the corruption only after updates, or also

> > > suddenly after n boots?

> >

> > Yes, this is something we have been able to test. If we cut the power

> > precisely when the env is written to FAT we can corrupt the entire boot

> > partition.

> > Super scary but this is not the problem we're seeing in the field. That

> > problem is more subtle.

>

> It should be possible to mount fat with the 'sync' option, but I'm not sure if

> that would help in this case. I'd recommend avoiding mounting FAT filesystems

> R/W if possible.

 

Maybe it could help with the problem I'm investigating. Don't think it would help with

the total corruption on power loss when writing the U-Boot env, since that is in U-Boot and

the fs is not "mounted" yet.

 

> > > How does the system report the corruption?

> >

> > fw_printenv and fw_setenv stop working and say that the env is corrupted.

> > That also means that RAUC update fails, that is usually when we notice it.

> >

> > Is there a way to watch a file and record any process that modifies it?

>

> There is blktrace, but you don't see the contents that way. It still may be

> enough detail to understand what's happening here.

 

Great, I'll check that out.

 

> Regards,

> Jan

 

Thanks for all the help.

 

Regards,

Einar

 

_______________________________________________
RAUC mailing list


 

--

Matthew Campbell

Principal Engineer

 

iZotope, Inc.