go-xsd-validate

Memory usage while validating

Open fabianem opened this issue 6 months ago • 5 comments

I am seeing very high memory usage while validating large XML files (about 90 MB). I might be doing something wrong, but it looks like a memory leak: the used memory grows after every file and is never released until I restart the server.

My setup is an HTTP server with a handler that validates an uploaded XML file against a provided XSD.

My main func looks something like this:

func main() {
  // init xsdvalidate
  // xsdvalidate.InitWithGc(2 * time.Minute) // didn't help
  err := xsdvalidate.Init()
  if err != nil {
    log.WithError(err).Fatalf("could not init xsdvalidate")
  }
  // register cleanup only after Init has succeeded
  defer xsdvalidate.Cleanup()
  ...
}

and inside my handler I am doing something like this:

func (it *service) handler(file io.ReadCloser, filesize int64) error {
  defer file.Close()

  // it.meteringObjectsXSD is a small XSD file (3KB) read into a byte slice
  xsdHandler, err := xsdvalidate.NewXsdHandlerMem(it.meteringObjectsXSD, xsdvalidate.ParsErrDefault)
  if err != nil {
    return err
  }
  // free the handler only once we know it was created
  defer xsdHandler.Free()

  // read the whole upload into memory
  fileContent := make([]byte, filesize)
  _, err = io.ReadFull(file, fileContent)
  if err != nil {
    return err
  }

  err = xsdHandler.ValidateMem(fileContent, xsdvalidate.ParsErrDefault)
  if err != nil {
    return err
  }
  ...
}

(I also tried creating the XSD handler only once inside main and injecting it into my service, but that didn't help either.)

When I upload a 90MB XML file and validate it against the XSD, the memory usage looks like this: (image)

After uploading it again, the memory usage grows to almost twice that size: (image)

Unfortunately, the memory never frees up.

When I run the same service with only the validation call commented out (err = xsdHandler.ValidateMem(fileContent, xsdvalidate.ParsErrDefault)), the memory usage looks like this:

After the first time uploading the 90MB XML file: image

After the second time uploading the file: image

The file is still read into memory, but once the GC kicks in it is all freed.

Any idea why the memory usage is so high and why it's only growing and never being freed up?
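
One check that might help narrow this down is to compare what the Go runtime itself reports with the process memory shown in the graphs. The sketch below uses only the standard library (runtime.ReadMemStats); takeSnapshot and memSnapshot are hypothetical helper names for illustration, not part of go-xsd-validate, and the 90 MB buffer just stands in for the uploaded file. If the Go heap drops back to its baseline after a validation while the process memory stays high, the memory is probably held outside the Go heap (for example by the C library behind the validator), where the Go GC cannot reclaim it.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSnapshot holds the Go-runtime memory figures of interest, in MiB.
// Hypothetical helper type, for illustration only.
type memSnapshot struct {
	HeapAllocMiB float64 // bytes of live (reachable or not yet swept) heap objects
	SysMiB       float64 // total memory the Go runtime obtained from the OS
}

// takeSnapshot forces a GC cycle and then reads the runtime statistics,
// so that garbage not yet collected does not inflate HeapAlloc.
func takeSnapshot() memSnapshot {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	const mib = 1024 * 1024
	return memSnapshot{
		HeapAllocMiB: float64(m.HeapAlloc) / mib,
		SysMiB:       float64(m.Sys) / mib,
	}
}

func main() {
	before := takeSnapshot()

	// Stand-in for the uploaded file held in memory by the handler.
	buf := make([]byte, 90<<20)
	for i := range buf {
		buf[i] = 1 // touch the pages so they are really backed by memory
	}
	during := takeSnapshot()
	runtime.KeepAlive(buf) // keep buf reachable until after the snapshot

	buf = nil // drop the only reference, as happens when the handler returns
	after := takeSnapshot()

	fmt.Printf("before: heap=%.1f MiB sys=%.1f MiB\n", before.HeapAllocMiB, before.SysMiB)
	fmt.Printf("during: heap=%.1f MiB sys=%.1f MiB\n", during.HeapAllocMiB, during.SysMiB)
	fmt.Printf("after:  heap=%.1f MiB sys=%.1f MiB\n", after.HeapAllocMiB, after.SysMiB)
}
```

In the real service, takeSnapshot could be called before and after the ValidateMem call and logged next to the process memory reported by the OS. These numbers only cover memory managed by the Go runtime, so memory allocated through cgo will not show up in them.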

fabianem · Dec 20 '23 23:12