From mboxrd@z Thu Jan 1 00:00:00 1970
Return-path:
MIME-Version: 1.0
References: <7a2fc0a9cb6bb54455d4cb69403a469e2fe832d8.camel@pengutronix.de>
In-Reply-To:
From: Matt Campbell
Date: Sun, 28 Mar 2021 10:11:01 -0400
Message-ID:
Content-Type: multipart/alternative; boundary="00000000000067829405be995640"
Subject: Re: [RAUC] [NEWSLETTER]Re: Robust u-boot environment with RAUC
To: Einar Vading
Cc: =?UTF-8?Q?Enrico_J=C3=B6rns?= , "rauc@pengutronix.de" , "jlu@pengutronix.de"
List-ID:

--00000000000067829405be995640
Content-Type: text/plain; charset="UTF-8"

Not a solution per se, but I can give you some info on how we solve the reliability issue in our product that uses RAUC.

* We store the env at a raw offset in the eMMC (this should work for SD as well) rather than as a file on a FAT partition. You will need to set up your partition table to leave room for this and modify the U-Boot config.
* We use redundant U-Boot environments placed in different sectors of the eMMC. This is a built-in feature of U-Boot that can be enabled in the config. If one copy gets corrupted, U-Boot falls back on the previous one gracefully.
* We have custom code in both U-Boot and Linux that checks for corrupt or inconsistent RAUC U-Boot environment vars. If they are totally out of whack, we boot into our fail-safe recovery mode, where the env vars are reset to a sane default and an update can be performed (no RMA needed).

We've had this setup for the past year, and I haven't once seen or heard of anyone actually hitting a corrupt U-Boot env on any of our development units. Unfortunately, we don't have analytics around this event in the field.

I know this isn't exactly an answer to your question, but hopefully some of this helps you arrive at a robust solution for your setup.

Best,
~Matt

On Sun, Mar 28, 2021 at 6:11 AM Einar Vading wrote:

> > Hi,
> >
> > On Fri, 2021-03-26 at 05:48 +0000, Einar Vading wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, 2021-03-25 at 15:22 +0000, Einar Vading wrote:
> > > > > > We have a Raspberry Pi 4 system set up using RAUC for updates and u-boot for
> > > > > > booting. For some systems in the field we have the u-boot environment on the
> > > > > > FAT boot partition and we mount that in fstab so that RAUC can access it with
> > > > > > the fw_print/setenv commands.
> > > > > >
> > > > > > One issue we have seen is that the env-file gets corrupted every now and then.
> > > > > > After corruption we can't RAUC update. The only solution we have to this
> > > > > > problem now is to delete the corrupted env-file and reboot, then we can
> > > > > > perform the upgrade.
> > > > > >
> > > > > > I have no idea how to track down whatever corrupts the file and I was
> > > > > > wondering if anyone has any input.
> > > > >
> > > > > You could try placing the environment on a separate partition to avoid any
> > > > > potential issues in the FAT implementation. Also, I think U-Boot has a way to
> > > > > support redundant environments.
> > >
> > > I have just done this for our newer systems. I moved the GPT partitions back
> > > 4MB and placed two redundant environments between the GPT and the first GPT
> > > partition.
> > >
> > > It is my understanding though that redundant environments are not supported
> > > when storing the env on FAT?
> >
> > That's probably a question for the U-Boot mailing list. :)
> >
> > > > Exactly. This should also be documented in the U-Boot integration guideline
> > > > for eMMC:
> > > >
> > > > https://rauc.readthedocs.io/en/latest/integration.html#example-setting-up-u-boot-environment-on-emmc-sd-card
> > > >
> > > > When writing to the FAT very short before hard rebooting, I could imagine this
> > > > can lead to failures. Do you see the corruption only after updates, or also
> > > > suddenly after n boots?
> > >
> > > Yes, this is something we have been able to test. If we cut the power
> > > precisely when the env is written to FAT we can corrupt the entire boot
> > > partition.
> > > Super scary but this is not the problem we're seeing in the field. That
> > > problem is more subtle.
> >
> > It should be possible to mount fat with the 'sync' option, but I'm not sure if
> > that would help in this case. I'd recommend avoiding mounting FAT filesystems
> > R/W if possible.
>
> Maybe it could help with the problem I'm investigating. Don't think it would help with
> the total corruption on powerloss when writing u-boot env, since that is in u-boot and
> the fs is not "mounted" yet.
>
> > > > How does the system report the corruption?
> > >
> > > fw_printenv and fw_setenv stops working and says that the env is corrupted.
> > > That also means that RAUC update fails, that is usually when we notice it.
> > >
> > > Is there a way to watch a file and record any process that modifies it?
> >
> > There is blktrace, but you don't see the contents that way. It still may be
> > enough detail to understand what's happening here.
>
> Great, I'll check that out.
>
> > Regards,
> > Jan
>
> Thanks for all the help.
>
> Regards,
> Einar
>
> _______________________________________________
> RAUC mailing list
>

--
Matthew Campbell
Principal Engineer
mcampbell@izotope.com
iZotope, Inc.
www.izotope.com

--00000000000067829405be995640
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Not a solution per se, but I can give you some info on how we solve the reliability issue in our product that uses RAUC.

* We store the env at a raw offset in the eMMC (this should work for SD as well) rather than as a file on a FAT partition. You will need to set up your partition table to leave room for this and modify the U-Boot config.
* We use redundant U-Boot environments placed in different sectors of the eMMC. This is a built-in feature of U-Boot that can be enabled in the config. If one copy gets corrupted, U-Boot falls back on the previous one gracefully.
* We have custom code in both U-Boot and Linux that checks for corrupt or inconsistent RAUC U-Boot environment vars. If they are totally out of whack, we boot into our fail-safe recovery mode, where the env vars are reset to a sane default and an update can be performed (no RMA needed).

We've had this setup for the past year, and I haven't once seen or heard of anyone actually hitting a corrupt U-Boot env on any of our development units. Unfortunately, we don't have analytics around this event in the field.

I know this isn't exactly an answer to your question, but hopefully some of this helps you arrive at a robust solution for your setup.
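
A minimal sketch of the kind of check described in the third point above; it is an illustration, not the code referred to there. It assumes the usual U-Boot env layout (a 32-bit CRC, then one flags byte when redundancy is enabled, then NUL-separated `key=value` strings), a little-endian target, and the variable names from RAUC's example U-Boot integration (`BOOT_ORDER`, `BOOT_A_LEFT`, `BOOT_B_LEFT`). `ENV_SIZE`, `build_env`, `parse_env` and `rauc_vars_sane` are hypothetical names chosen for this example.

```python
import struct
import zlib

ENV_SIZE = 0x4000  # assumption: must match CONFIG_ENV_SIZE of the U-Boot build


def build_env(variables, size=ENV_SIZE, flags=1):
    """Build one redundant-style env copy (used here only to exercise the parser)."""
    body = b"".join(f"{k}={v}".encode() + b"\0" for k, v in variables.items()) + b"\0"
    data = body.ljust(size - 5, b"\xff")  # pad the data area to its full size
    return struct.pack("<IB", zlib.crc32(data) & 0xFFFFFFFF, flags) + data


def parse_env(blob, redundant=True):
    """Return the variables of one env copy; raise ValueError if the CRC is wrong."""
    header = 5 if redundant else 4  # CRC32, plus a flags byte when redundant
    (stored_crc,) = struct.unpack_from("<I", blob, 0)  # little-endian assumed
    data = blob[header:]
    if zlib.crc32(data) & 0xFFFFFFFF != stored_crc:
        raise ValueError("U-Boot env CRC mismatch")
    env = {}
    for entry in data.split(b"\0"):
        if not entry:  # an empty string terminates the variable list
            break
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def rauc_vars_sane(env, slots=("A", "B")):
    """Policy check (an assumption): known slots in BOOT_ORDER, numeric counters."""
    order = env.get("BOOT_ORDER", "").split()
    if not order or any(slot not in slots for slot in order):
        return False
    return all(env.get(f"BOOT_{slot}_LEFT", "").isdigit() for slot in slots)


if __name__ == "__main__":
    blob = build_env({"BOOT_ORDER": "A B", "BOOT_A_LEFT": "3", "BOOT_B_LEFT": "3"})
    print(rauc_vars_sane(parse_env(blob)))
```

On a real device the blob would be read from the raw env offset configured in U-Boot rather than built in memory, and a recovery path would be triggered when `parse_env` raises for both copies or `rauc_vars_sane` returns false. Choosing which of the two redundant copies is the active one (via the flags byte) is left out of this sketch.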

Best,
~Matt

On Sun, Mar 28, 2021 at 6:11 AM Einar Vading <Einar.Vading@rhimagnesita.com> wrote:
> Hi,
>
> On Fri, 2021-03-26 at 05:48 +0000, Einar Vading wrote:
> > > > Hi,
> > > >
> > > > On Thu, 2021-03-25 at 15:22 +0000, Einar Vading wrote:
> > > > > We have a Raspberry Pi 4 system set up using RAUC for updates and u-boot for
> > > > > booting. For some systems in the field we have the u-boot environment on the
> > > > > FAT boot partition and we mount that in fstab so that RAUC can access it with
> > > > > the fw_print/setenv commands.
> > > > >
> > > > > One issue we have seen is that the env-file gets corrupted every now and then.
> > > > > After corruption we can't RAUC update. The only solution we have to this
> > > > > problem now is to delete the corrupted env-file and reboot, then we can
> > > > > perform the upgrade.
> > > > >
> > > > > I have no idea how to track down whatever corrupts the file and I was
> > > > > wondering if anyone has any input.
> > > >
> > > > You could try placing the environment on a separate partition to avoid any
> > > > potential issues in the FAT implementation. Also, I think U-Boot has a way to
> > > > support redundant environments.
> >
> > I have just done this for our newer systems. I moved the GPT partitions back
> > 4MB and placed two redundant environments between the GPT and the first GPT
> > partition.
> >
> > It is my understanding though that redundant environments are not supported
> > when storing the env on FAT?
>
> That's probably a question for the U-Boot mailing list. :)
>
> > > Exactly. This should also be documented in the U-Boot integration guideline
> > > for eMMC:
> > >
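> > > https://rauc.readthedocs.io/en/latest/integration.html#example-setting-up-u-boot-environment-on-emmc-sd-card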
> > >
> > > When writing to the FAT very short before hard rebooting, I could imagine this
> > > can lead to failures. Do you see the corruption only after updates, or also
> > > suddenly after n boots?
> >
> > Yes, this is something we have been able to test. If we cut the power
> > precisely when the env is written to FAT we can corrupt the entire boot
> > partition.
> > Super scary but this is not the problem we're seeing in the field. That
> > problem is more subtle.
>
> It should be possible to mount fat with the 'sync' option, but I'm not sure if
> that would help in this case. I'd recommend avoiding mounting FAT filesystems
> R/W if possible.

Maybe it could help with the problem I'm investigating. Don't think it would help with
the total corruption on powerloss when writing u-boot env, since that is in u-boot and
the fs is not "mounted" yet.

> > > How does the system report the corruption?
> >
> > fw_printenv and fw_setenv stops working and says that the env is corrupted.
> > That also means that RAUC update fails, that is usually when we notice it.
> >
> > Is there a way to watch a file and record any process that modifies it?
>
> There is blktrace, but you don't see the contents that way. It still may be
> enough detail to understand what's happening here.

Great, I'll check that out.

> Regards,
> Jan

Thanks for all the help.

Regards,
Einar

_______________________________________________
RAUC mailing list


--
Matthew Campbell
Principal Engineer

iZotope, Inc.
--00000000000067829405be995640--