
Reading TFlite model metadata

phlash opened this issue 3 years ago • 10 comments

Originally posted by @phlash in https://github.com/floe/backscrub/issues/77#issuecomment-835513517

Sounds like a nice solution, but is the metadata normally included in models? Sounds like models with metadata should essentially be a zip file?

Not in the models we're currently using, but only because we didn't need it. Models with metadata are available from the Google model zoo (https://tfhub.dev/). [edit] I lied, that only contains Deeplabv3 with metadata. Looks like the MediaPipe team haven't added any yet, although they do have model cards. Oh well. [/edit]

I'm currently looking at sane ways to read the metadata without adding a new dependency and build pain (it needs Bazel) for the officially required tf_lite_support library. So far, getting a blob of metadata out works OK (see snippet below), since TFLite already supports arbitrary named data blobs in a model file; however, that blob then needs parsing (it's in FlatBuffers format, schema here: https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/metadata/metadata_schema.fbs) to pull out the input normalization constants. Currently poking through this: https://google.github.io/flatbuffers/md__internals.html and a hex dump of the raw buffer.

int init_tensorflow(...) {
	//...
	// look for metadata entries attached to the loaded model
	auto model = flatmodel->GetModel();
	auto *md = model->metadata();
	if (md) {
		for (uint32_t mid = 0; mid < md->size(); ++mid) {
			const auto meta = md->Get(mid);
			printf("found: %s\n", meta->name()->c_str());
			if (meta->name()->str() != "TFLITE_METADATA")
				continue;
			// grab the raw buffer referenced by this metadata entry and dump it
			const flatbuffers::Vector<uint8_t> *pvec = model->buffers()->Get(meta->buffer())->data();
			printf("metadata dump (size=0x%X)\n", pvec->size());
			parse_metadata(pvec->data(), pvec->size());  // currently just a hex dump :)
		}
	}
	//...
}
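
For reference, parse_metadata is nothing clever yet; here's a minimal sketch of the sort of hex dump it does (illustration only, not code from the branch):

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Illustrative only: dump the raw TFLITE_METADATA buffer as hex,
// 16 bytes per line, to compare against the flatbuffers internals docs.
static void parse_metadata(const uint8_t *buf, size_t len) {
	for (size_t off = 0; off < len; off += 16) {
		printf("%08zX:", off);
		for (size_t i = off; i < off + 16 && i < len; ++i)
			printf(" %02X", buf[i]);
		printf("\n");
	}
}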

This might turn out to be horribly fragile though!

Question: What do we think about using Bazel for builds (standard Tensorflow tooling)?

phlash avatar May 08 '21 22:05 phlash

Question: What do we think about using Bazel for builds (standard Tensorflow tooling)?

On first glance Bazel looks like some hipster tooling, written because somebody didn't understand the existing solutions. If we change build systems then CMake is the furthest I'd like to go. The PR we had on that subject wasn't perfect, but still better than introducing some uncommon build dependencies (never heard of Bazel before, TBH).

BenBE avatar May 09 '21 01:05 BenBE

[offtopic] +1 for "hipster tooling" ;-) [/offtopic]

floe avatar May 10 '21 07:05 floe

Got a prototype metadata reader working in my tree: https://github.com/phlash/backscrub/tree/tflite-metadata

Not pretty but avoids pulling in a whole new library and 'hipster' build system :wink:

phlash avatar May 10 '21 09:05 phlash

Maybe also a bit off topic with regard to metadata reading: I tried to use that hipster build system for a standalone lib that uses TensorFlow. It turned out to be a huge time sink...

Goal: just make some quick experiments with post-processing of model output in Python

  • Problem: Not possible to set custom ops in the current (released) tflite Python API [1]
  • Solution: Try to create a small wrapper lib to do the inference and use it from Python with ctypes (a rough sketch of such a wrapper is below)
  • Reality: Spent way too much time fighting TensorFlow building and linking errors

Got it to build for my use case in the end. Tried to make a quick adaptation of Bazel for backscrub but never managed to find a good way to integrate the OpenCV dependency.
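
For illustration (a rough sketch of the idea, not my actual code), the wrapper boils down to a handful of extern "C" functions over the TFLite C API (tensorflow/lite/c/c_api.h) that ctypes can then call:

#include <cstddef>
#include <tensorflow/lite/c/c_api.h>

// Minimal inference wrapper with C linkage so Python/ctypes can load it.
extern "C" {

typedef struct {
	TfLiteModel *model;
	TfLiteInterpreter *interp;
} wrapper_ctx;

wrapper_ctx *wrapper_init(const char *model_path) {
	wrapper_ctx *ctx = new wrapper_ctx();
	ctx->model = TfLiteModelCreateFromFile(model_path);
	if (!ctx->model) { delete ctx; return nullptr; }
	TfLiteInterpreterOptions *opts = TfLiteInterpreterOptionsCreate();
	// custom op registration would go here (the bit missing from the Python API)
	ctx->interp = TfLiteInterpreterCreate(ctx->model, opts);
	TfLiteInterpreterOptionsDelete(opts);
	TfLiteInterpreterAllocateTensors(ctx->interp);
	return ctx;
}

int wrapper_infer(wrapper_ctx *ctx, const void *in, size_t in_bytes,
                  void *out, size_t out_bytes) {
	TfLiteTensor *input = TfLiteInterpreterGetInputTensor(ctx->interp, 0);
	if (TfLiteTensorCopyFromBuffer(input, in, in_bytes) != kTfLiteOk) return -1;
	if (TfLiteInterpreterInvoke(ctx->interp) != kTfLiteOk) return -1;
	const TfLiteTensor *output = TfLiteInterpreterGetOutputTensor(ctx->interp, 0);
	return TfLiteTensorCopyToBuffer(output, out, out_bytes) == kTfLiteOk ? 0 : -1;
}

void wrapper_free(wrapper_ctx *ctx) {
	TfLiteInterpreterDelete(ctx->interp);
	TfLiteModelDelete(ctx->model);
	delete ctx;
}

}  // extern "C"

On the Python side it is then just ctypes.CDLL on the resulting shared library and calling wrapper_init/wrapper_infer with buffers from NumPy.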

Some take home messages:

  • Building TensorFlow as a Bazel external dependency does not seem officially supported, but is doable [2], [3]
  • The C API seems to be recommended over C++ by TF devs [4]
  • TF devs indicate CMake is no longer supported? (old comment) [5]
  • Bazel has nothing like pkg-config or .cmake files. External deps that are not built by Bazel can be added by listing paths and globs for which headers and libs to look for

1: https://github.com/tensorflow/tensorflow/issues/44043
2: https://github.com/tensorflow/tensorflow/issues/12761
3: https://stackoverflow.com/questions/48497006/how-to-add-tensorflow-to-existing-bazel-project-as-external-dependencies
4: https://github.com/tensorflow/tensorflow/issues/35689#issuecomment-642903301
5: https://github.com/tensorflow/tensorflow/issues/30183#issuecomment-506279570

vekkt0r avatar May 13 '21 12:05 vekkt0r

[still slightly OT] Thanks for trying the 'hipster way' :smile:. I'm not sure CMake is going away now (it may have been then), as it's properly documented and marked as 'experimental since 2.4' here: https://www.tensorflow.org/lite/guide/build_cmake. I for one would rather use CMake for its good documentation, popularity and capability (even if I don't like the mess it spews out!). It took me just a few minutes to get a basic 'backscrub + tensorflow' combined build working, so I could enable XNNPACK and double the CPU-based performance: https://github.com/phlash/backscrub/tree/xnnpack-test
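
Roughly, that combined build amounts to something like the snippet below (a sketch following the CMake guide linked above, not the exact contents of the xnnpack-test branch; the tensorflow checkout location and source file names are illustrative assumptions):

cmake_minimum_required(VERSION 3.16)
project(backscrub C CXX)

# Where the tensorflow checkout lives (illustrative path).
set(TENSORFLOW_SOURCE_DIR "${CMAKE_CURRENT_LIST_DIR}/tensorflow" CACHE PATH
    "TensorFlow source tree")
# Turn on the XNNPACK delegate option in the TFLite CMake build.
set(TFLITE_ENABLE_XNNPACK ON CACHE BOOL "" FORCE)
add_subdirectory(
    "${TENSORFLOW_SOURCE_DIR}/tensorflow/lite"
    "${CMAKE_CURRENT_BINARY_DIR}/tensorflow-lite" EXCLUDE_FROM_ALL)

find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(deepseg deepseg.cc)  # illustrative: the existing backscrub sources
target_link_libraries(deepseg tensorflow-lite ${OpenCV_LIBS})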

[back on topic] Thoughts on my rough metadata extraction hack? It could/should probably use [de]serialisation code generated by flatc rather than hacked up by hand (but then we have to build & run flatc - which has CMake and Bazel support). It should read the 'associated files' metadata rather than assume 'labelmap.txt' exists. Is there other metadata we would want? It might be better to rebase on experimental rather than my tflite-extract branch...
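
To make that concrete, the hand-rolled parse_metadata would turn into something like this with the flatc-generated header for metadata_schema.fbs (a sketch only: the accessor names are my reading of the schema, so check them against the generated code), including reading the 'associated files' entry instead of assuming labelmap.txt:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include "metadata_schema_generated.h"  // produced by flatc from metadata_schema.fbs

// buf/len: the raw TFLITE_METADATA buffer pulled out in the earlier snippet.
static void parse_metadata(const uint8_t *buf, size_t len) {
	(void)len;  // sketch only: no flatbuffers verifier run here
	const tflite::ModelMetadata *meta = tflite::GetModelMetadata(buf);
	if (!meta || !meta->subgraph_metadata())
		return;
	const auto *sg = meta->subgraph_metadata()->Get(0);
	// input normalization constants (mean/std) from the first input tensor
	if (sg->input_tensor_metadata() && sg->input_tensor_metadata()->Get(0)->process_units()) {
		const auto *pus = sg->input_tensor_metadata()->Get(0)->process_units();
		for (flatbuffers::uoffset_t i = 0; i < pus->size(); ++i) {
			const auto *norm = pus->Get(i)->options_as_NormalizationOptions();
			if (norm && norm->mean() && norm->std())
				printf("norm: mean=%f std=%f\n", norm->mean()->Get(0), norm->std()->Get(0));
		}
	}
	// associated files, e.g. the label map attached to the first output tensor
	if (sg->output_tensor_metadata() && sg->output_tensor_metadata()->Get(0)->associated_files())
		printf("labels file: %s\n",
		       sg->output_tensor_metadata()->Get(0)->associated_files()->Get(0)->name()->c_str());
}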

phlash avatar May 13 '21 15:05 phlash

Does the flatc stuff always need the full spec to be built? Could we try to build a reduced version of it and freeze it for our purposes? I haven't looked at the generated flatc source code (and I'm too afraid of the nightmares to do that), but how sane is it? Would interfacing with just that reduced part work? Or is the data structure sane enough that we could whip up our own reduced parser for it?

BenBE avatar May 13 '21 18:05 BenBE

Google do exactly that for parts of the build already: pre-built headers from flatc (resulting in schema_generated.h) to avoid the pain of building and running it, but only for the parts that they support in the Make and CMake builds, not the metadata library (nor, it turns out, the GPU delegate). I'll have a go at building/running flatc...

phlash avatar May 13 '21 21:05 phlash

There seems to be a flatbuffers-compiler package (at least on Ubuntu 20.04 LTS and Debian Bullseye) containing flatc, so this might come with an acceptable level of PITA … It could even be documented as an optional build dependency then …

BenBE avatar May 14 '21 07:05 BenBE

Ah-ha! It's also in buster-backports, so this might be an easy way out. That said, flatc builds easily enough with CMake; unfortunately it's a struggle to do that via the main tensorflow-lite build (the dependency mechanism using FetchContent is fiddly and doesn't pass options through). I'll install the packaged version and compile the metadata schema, and see how ugly it looks...

phlash avatar May 14 '21 08:05 phlash

OK, it looks neater with the compiled metadata serializer: https://github.com/phlash/backscrub/tree/tflite-metadata

This branch is now:

  • based on main
  • uses flatbuffers-compiler package (stock Ubuntu or backport for Debian buster)
  • reads the output labels file name from metadata

Awaiting somewhere to use the metadata values (coming in https://github.com/floe/backscrub/pull/77), then I'll PR this work.

phlash avatar May 14 '21 11:05 phlash