From mboxrd@z Thu Jan 1 00:00:00 1970
Return-path:
Received: from de-smtp-delivery-102.mimecast.com ([62.140.7.102]) by metis.ext.pengutronix.de with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lQkEO-0003mN-Cw for rauc@pengutronix.de; Mon, 29 Mar 2021 07:14:30 +0200
From: Einar Vading
Date: Mon, 29 Mar 2021 05:14:22 +0000
Message-ID:
References: <7a2fc0a9cb6bb54455d4cb69403a469e2fe832d8.camel@pengutronix.de>
In-Reply-To:
MIME-Version: 1.0
Content-Language: en-US
Subject: Re: [RAUC] [NEWSLETTER]Re: [NEWSLETTER]Re: Robust u-boot environment with RAUC
List-Id: RAUC Project - Discussion List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: multipart/mixed; boundary="===============0592601626=="
Errors-To: rauc-bounces@pengutronix.de
Sender: "RAUC"
To: Matt Campbell
Cc: "rauc@pengutronix.de" , =?Windows-1252?Q?Enrico_J=F6rns?= , "jlu@pengutronix.de"

--===============0592601626==
Content-Language: en-US
Content-Type: multipart/alternative; boundary="_000_AM0PR08MB4580282B727C41779681EB73E57F9AM0PR08MB4580eurp_"

--_000_AM0PR08MB4580282B727C41779681EB73E57F9AM0PR08MB4580eurp_
Content-Type: text/plain; charset=WINDOWS-1252
Content-Transfer-Encoding: quoted-printable

Great information!

With the exception of the custom code, that is exactly how we are planning to set it up for our future systems.
It's reassuring to hear that it seems to work well for you.

Regards,
Einar

Not a solution per se, but I can give you some info on how we solve the reliability issue in our product that uses RAUC.

* We store the env at a raw offset in the eMMC (this should work for SD as well) rather than as a file on a FAT partition. You will need to set your partition table up to leave room for this and modify the U-Boot config.
* We use redundant U-Boot environments placed in different sectors of the eMMC. This is a built-in feature of U-Boot that can be enabled in the config. If one copy gets corrupted, U-Boot gracefully falls back to the previous one.
* We have custom code both in U-Boot and in Linux that checks for corrupt or inconsistent RAUC U-Boot environment vars. If they are totally out of whack we boot into our fail-safe recovery mode, where the env vars are reset to a sane default and an update can be performed (no RMA needed).

We've had this setup for the past year, and I haven't once seen or heard of any of our development units actually hitting a corrupt U-Boot env. Unfortunately, we don't have analytics around this event in the field.

I know this isn't exactly an answer to your question, but hopefully some of this helps you arrive at a robust solution for your setup.

Best,
~Matt

On Sun, Mar 28, 2021 at 6:11 AM Einar Vading wrote:
> Hi,
>
> On Fri, 2021-03-26 at 05:48 +0000, Einar Vading wrote:
> > > > Hi,
> > > >
> > > > On Thu, 2021-03-25 at 15:22 +0000, Einar Vading wrote:
> > > > > We have a Raspberry Pi 4 system set up using RAUC for updates and u-boot for
> > > > > booting. For some systems in the field we have the u-boot environment on the
> > > > > FAT boot partition and we mount that in fstab so that RAUC can access it with
> > > > > the fw_print/setenv commands.
> > > > >
> > > > > One issue we have seen is that the env-file gets corrupted every now and then.
> > > > > After corruption we can't RAUC update. The only solution we have to this
> > > > > problem now is to delete the corrupted env-file and reboot, then we can
> > > > > perform the upgrade.
> > > > >
> > > > > I have no idea how to track down whatever corrupts the file and I was
> > > > > wondering if anyone has any input.
> > > >
> > > > You could try placing the environment on a separate partition to avoid any
> > > > potential issues in the FAT implementation. Also, I think U-Boot has a way to
> > > > support redundant environments.
> >
> > I have just done this for our newer systems. I moved the GPT partitions back
> > 4MB and placed two redundant environments between the GPT and the first GPT
> > partition.
> >
> > It is my understanding though that redundant environments are not supported
> > when storing the env on FAT?
>
> That's probably a question for the U-Boot mailing list. :)
>
> > > Exactly. This should also be documented in the U-Boot integration guideline
> > > for eMMC:
> > >
> > > https://rauc.readthedocs.io/en/latest/integration.html#example-setting-up-u-boot-environment-on-emmc-sd-card
> > >
> > > When writing to the FAT very shortly before a hard reboot, I could imagine this
> > > can lead to failures. Do you see the corruption only after updates, or also
> > > suddenly after n boots?
> >
> > Yes, this is something we have been able to test. If we cut the power
> > precisely when the env is written to FAT we can corrupt the entire boot
> > partition.
> > Super scary but this is not the problem we're seeing in the field. That
> > problem is more subtle.
>
> It should be possible to mount FAT with the 'sync' option, but I'm not sure if
> that would help in this case. I'd recommend avoiding mounting FAT filesystems
> R/W if possible.

Maybe it could help with the problem I'm investigating. I don't think it would help with
the total corruption on power loss when writing the u-boot env, since that happens in u-boot and
the fs is not "mounted" yet.

> > > How does the system report the corruption?
> >
> > fw_printenv and fw_setenv stop working and say that the env is corrupted.
> > That also means that the RAUC update fails, which is usually when we notice it.
> >
> > Is there a way to watch a file and record any process that modifies it?
>
> There is blktrace, but you don't see the contents that way. It still may be
> enough detail to understand what's happening here.

Great, I'll check that out.

> Regards,
> Jan

Thanks for all the help.

Regards,
Einar

_______________________________________________
RAUC mailing list

--
Matthew Campbell
Principal Engineer
mcampbell@izotope.com
iZotope, Inc.
www.izotope.com

--_000_AM0PR08MB4580282B727C41779681EB73E57F9AM0PR08MB4580eurp_
Content-Type: text/html; charset=WINDOWS-1252
Content-Transfer-Encoding: quoted-printable

Great information!

 

With the exception of the custom code, that is exactly how we are planning to set it up for our future systems.

It's reassuring to hear that it seems to work well for you.

 

Regards,

Einar

 

Not a solution per se, but I can give you some info on how we solve the reliability issue in our product that uses RAUC.

 

* We store the env at a raw offset in the eMMC (this should work for SD as well) rather than as a file on a FAT partition. You will need to set your partition table up to leave room for this and modify the U-Boot config.

* We use redundant U-Boot environments placed in different sectors of the eMMC. This is a built-in feature of U-Boot that can be enabled in the config. If one copy gets corrupted, U-Boot gracefully falls back to the previous one.

* We have custom code both in U-Boot and in Linux that checks for corrupt or inconsistent RAUC U-Boot environment vars. If they are totally out of whack we boot into our fail-safe recovery mode, where the env vars are reset to a sane default and an update can be performed (no RMA needed).
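
To illustrate the kind of consistency check described in the last bullet, here is a minimal Python sketch. It is not the code used in the product above. `BOOT_ORDER` and `BOOT_<slot>_LEFT` are the variables RAUC's U-Boot integration uses; the slot names `A`/`B`, the attempt budget of 3, and the function names are assumptions made for the example.

```python
"""Sketch: sanity-check the U-Boot variables used by RAUC's U-Boot backend."""

# Assumptions for illustration: two slots named A and B, three boot attempts each.
SLOTS = ("A", "B")
DEFAULT_ATTEMPTS = 3


def sane_defaults():
    """Return the variable set that a recovery mode could write back."""
    env = {"BOOT_ORDER": " ".join(SLOTS)}
    for slot in SLOTS:
        env[f"BOOT_{slot}_LEFT"] = str(DEFAULT_ATTEMPTS)
    return env


def find_problems(env):
    """Return a list of problems; an empty list means the variables look sane."""
    problems = []
    order = env.get("BOOT_ORDER", "").split()
    if not order:
        problems.append("BOOT_ORDER is missing or empty")
    for name in order:
        if name not in SLOTS:
            problems.append(f"BOOT_ORDER names unknown slot {name!r}")
    if len(set(order)) != len(order):
        problems.append("BOOT_ORDER lists a slot twice")
    for slot in SLOTS:
        key = f"BOOT_{slot}_LEFT"
        raw = env.get(key)
        if raw is None or not (raw.isascii() and raw.isdigit()):
            problems.append(f"{key} is missing or not a number")
        elif int(raw) > DEFAULT_ATTEMPTS:
            problems.append(f"{key} exceeds the attempt budget")
    return problems


def repair(env):
    """Return (env, was_reset); reset the RAUC variables if anything is off."""
    if find_problems(env):
        fixed = dict(env)
        fixed.update(sane_defaults())
        return fixed, True
    return env, False
```

On a real system the dictionary would be filled from `fw_printenv` output and written back with `fw_setenv`; the sketch leaves that out so the check itself stays easy to test.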

 

We've had this setup for the past year, and I haven't once seen or heard of any of our development units actually hitting a corrupt U-Boot env. Unfortunately, we don't have analytics around this event in the field.

 

I know this isn't exactly an answer to your question, but hopefully some of this helps you arrive at a robust solution for your setup.

 

Best,

~Matt

 

On Sun, Mar 28, 2021 at 6:11 AM Einar Vading <Einar.Vading@rhimagnesita.com> wrote:

> Hi,

>

> On Fri, 2021-03-26 at 05:48 +0000, Einar Vading wrote:

> > > > Hi,

> > > >

> > > > On Thu, 2021-03-25 at 15:22 +0000, Einar Vading wrote:

> > > > > We have a Raspberry Pi 4 system set up using RAUC for updates and u-boot for

> > > > > booting. For some systems in the field we have the u-boot environment on the

> > > > > FAT boot partition and we mount that in fstab so that RAUC can access it with

> > > > > the fw_print/setenv commands.

> > > > >

> > > > > One issue we have seen is that the env-file gets corrupted every now and then.

> > > > > After corruption we can't RAUC update. The only solution we have to this

> > > > > problem now is to delete the corrupted env-file and reboot, then we can

> > > > > perform the upgrade.

> > > > >

> > > > > I have no idea how to track down whatever corrupts the file and I was

> > > > > wondering if anyone has any input.

> > > >

> > > > You could try placing the environment on a separate partition to avoid any

> > > > potential issues in the FAT implementation. Also, I think U-Boot has a way to

> > > > support redundant environments.

> >

> > I have just done this for our newer systems. I moved the GPT partitions back

> > 4MB and placed two redundant environments between the GPT and the first GPT

> > partition.

> >

> > It is my understanding though that redundant environments are not supported

> > when storing the env on FAT?
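
As an illustration of the layout described above (two environment copies in the gap between the GPT and the first partition), here is a minimal Python sketch that checks such a layout and prints matching `fw_env.config` lines. The 4 MiB partition start comes from the message; the offsets (1 MiB and 2 MiB), the 16 KiB environment size, and the device name are assumptions. The values would have to match the U-Boot options `CONFIG_ENV_OFFSET`, `CONFIG_ENV_OFFSET_REDUND`, and `CONFIG_ENV_SIZE` of the actual build.

```python
"""Sketch: validate a raw redundant-env layout and emit fw_env.config lines."""

SECTOR = 512
# Protective MBR + GPT header + 128 entries of 128 bytes end at LBA 34
# (512-byte sectors, default entry count).
GPT_END = 34 * SECTOR
FIRST_PARTITION = 4 * 1024 * 1024  # partitions moved back to 4 MiB

# Assumed values, not taken from the thread.
ENV_SIZE = 0x4000
ENV_OFFSET = 0x100000
ENV_OFFSET_REDUND = 0x200000


def check_layout(offsets, size, lo=GPT_END, hi=FIRST_PARTITION):
    """Raise ValueError if a copy leaves the gap, is unaligned, or overlaps another."""
    spans = sorted((off, off + size) for off in offsets)
    for start, end in spans:
        if start % SECTOR:
            raise ValueError(f"offset {start:#x} is not sector aligned")
        if start < lo or end > hi:
            raise ValueError(f"copy at {start:#x} does not fit in the gap")
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start < prev_end:
            raise ValueError(f"copy at {next_start:#x} overlaps the previous one")


def fw_env_config(device, offsets, size):
    """One 'device offset size' line per copy; two lines mean a redundant env."""
    return "\n".join(f"{device} {off:#x} {size:#x}" for off in offsets)


if __name__ == "__main__":
    check_layout([ENV_OFFSET, ENV_OFFSET_REDUND], ENV_SIZE)
    print(fw_env_config("/dev/mmcblk0", [ENV_OFFSET, ENV_OFFSET_REDUND], ENV_SIZE))
```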

>

> That's probably a question for the U-Boot mailing list. :)

>

> > > Exactly. This should also be documented in the U-Boot integration guideline

> > > for eMMC:

> > >

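> > > https://rauc.readthedocs.io/en/latest/integration.html#example-setting-up-u-boot-environment-on-emmc-sd-card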

> > >

> > > When writing to the FAT very shortly before a hard reboot, I could imagine this

> > > can lead to failures. Do you see the corruption only after updates, or also

> > > suddenly after n boots?

> >

> > Yes, this is something we have been able to test. If we cut the power

> > precisely when the env is written to FAT we can corrupt the entire boot

> > partition.

> > Super scary but this is not the problem we're seeing in the field. That

> > problem is more subtle.

>

> It should be possible to mount FAT with the 'sync' option, but I'm not sure if

> that would help in this case. I'd recommend avoiding mounting FAT filesystems

> R/W if possible.

 

Maybe it could help with the problem I'm investigating. I don't think it would help with

the total corruption on power loss when writing the u-boot env, since that happens in u-boot and

the fs is not "mounted" yet.

 

> > > How does the system report the corruption?

> >

> > fw_printenv and fw_setenv stop working and say that the env is corrupted.

> > That also means that the RAUC update fails, which is usually when we notice it.
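
For context on what "corrupted" means here: a U-Boot environment image starts with a CRC32 over the data that follows it, and the tools reject the image when the stored and computed values differ. A minimal Python sketch of that check follows. The little-endian CRC (as on a Raspberry Pi), the 0xFF padding, and the function names are assumptions for the example; a redundant environment carries one extra flags byte after the CRC.

```python
"""Sketch: build and verify a U-Boot environment image (CRC32 header + data)."""
import struct
import zlib


def header_size(redundant):
    # 4-byte CRC32, plus a 1-byte flags field for redundant environments.
    return 5 if redundant else 4


def build_env(variables, size, redundant=False, flags=1):
    """Serialize variables as NUL-terminated 'key=value' strings with a CRC header."""
    room = size - header_size(redundant)
    data = b"".join(f"{k}={v}".encode() + b"\0" for k, v in variables.items()) + b"\0"
    if len(data) > room:
        raise ValueError("environment does not fit into the given size")
    data = data.ljust(room, b"\xff")  # padding byte is an assumption
    crc = struct.pack("<I", zlib.crc32(data))  # little-endian assumed
    return crc + (bytes([flags]) if redundant else b"") + data


def env_is_valid(image, redundant=False):
    """Return True when the stored CRC32 matches the CRC32 of the data area."""
    offset = header_size(redundant)
    if len(image) <= offset:
        return False
    (stored,) = struct.unpack_from("<I", image)
    return stored == zlib.crc32(image[offset:])
```

Running `env_is_valid` over a copy of the env file taken right after a failed `fw_printenv` would confirm whether the failure is a plain checksum mismatch or something else, such as a truncated file.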

> >

> > Is there a way to watch a file and record any process that modifies it?

>

> There is blktrace, but you don't see the contents that way. It still may be

> enough detail to understand what's happening here.

 

Great, I'll check that out.
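
A block-level trace does not show contents, so one complementary option is a small poller that timestamps every change to the env file. The minimal Python sketch below does that. It cannot tell which process wrote the file; it only narrows down when the content changed, so the timestamps can be lined up with a trace. The function names and the example path are assumptions.

```python
"""Sketch: poll a file and log when its size, mtime, or content hash changes."""
import hashlib
import os
import time


def snapshot(path):
    """Record size, mtime and a content hash of the file at path."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha256": digest}


def changed_fields(old, new):
    """Return the sorted names of the snapshot fields that differ."""
    return sorted(key for key in old if old[key] != new[key])


def watch(path, interval=1.0, rounds=None, log=print):
    """Poll path and log every change; rounds=None polls until interrupted."""
    last = snapshot(path)
    events = []
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        done += 1
        current = snapshot(path)
        fields = changed_fields(last, current)
        if fields:
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            events.append((stamp, fields))
            log(f"{stamp} {path} changed: {', '.join(fields)}")
            last = current
    return events
```

It could be started as `watch("/boot/uboot.env")`, where the path is a placeholder for wherever the FAT partition is mounted.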

 

> Regards,

> Jan

 

Thanks for all the help.

 

Regards,

Einar

 

_______________________________________________
RAUC mailing list


 

--

Matthew Campbell

Principal Engineer

 

iZotope, Inc.

--_000_AM0PR08MB4580282B727C41779681EB73E57F9AM0PR08MB4580eurp_-- --===============0592601626== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ RAUC mailing list --===============0592601626==--