
Pure component and material labels - suggestion

Open dobkeratops opened this issue 5 years ago • 61 comments

Imagine if it allowed part and material names (wheel, head, metal, wood, etc.) as standalone labels - without requiring an object:

  • if you can browse free labelling you will find some examples I've done like this already, to see what it looks like (search for wheel, head, hand...)

  • it decreases the amount of typing to make a label, or list navigation to select a label (e.g. currently I think the full list must show every combination of component and object). In the tablet+pen use case (drawing outlines is much faster but typing is a bit slower) this has been extra comfortable. Even when using a keyboard, writing out "wheel of car, headlight of car, car, windscreen of car..." means a lot of repetition of typing "car".

  • it would allow annotating the parts of objects we don't yet have labels for, e.g. obscure types of vehicle or device (I've seen a cleaning bucket on wheels; "tricycles" share components with "bicycles"; and there are many types of animal yet to include).

  • I would still suggest allowing the specific combinations as well

  • regarding training, there would be parallels between the components of many distinct objects - e.g. the similarity between "wing of aircraft" and "wing of bird", "fin of fish", "fin of rocket"; whilst I would still say "specific object components" are more important, I think you could still train a detector for standalone part labels

  • sometimes those components are visible in isolation, either detached, or with the rest of the object invisible

  • regarding materials, some objects are comprised of multiple materials, e.g. a tool with a metal blade and wooden handle, a box with a metal frame and wooden panels, a plastic bucket with a metal wire handle; buildings with stone or wooden walls, and tiled or thatched roofs, etc.

  • again with materials, sometimes you can see components, bits of debris etc. without knowing what they are

Layering could be used to retroactively add the "object name" information, or possibly (harder) a hierarchy feature (like LabelMe) - e.g. imagine if you could select several polygons and combine them to make the parent shape. I'll say that when annotating with a pen, doing an extra polygon for the whole object is usually fast enough already. I would suggest either (i) trying this at the pixel level (i.e. "man, woman, car, truck" are distinct image channels from "wheel, hand, head, headlight" etc. - so they all just get painted in, allowing overlap, and the existing specific part labels just activate both the object and part channels), or (ii) assigning an object to the part based on greatest polygon overlap (harder, and you'd probably have to code the pixel overlay first anyway to figure this out). Objects might overlap already so I suspect we'd have to use a many-channel image anyway; there would be various ways to handle this reasonably efficiently. (attached sketch)
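To make option (i) a bit more concrete, here's a rough Python sketch of the multi-channel idea; the channel list and helper are invented for illustration and are not part of ImageMonkey:

```python
import numpy as np

# Hypothetical channel layout: one boolean plane per label, with objects and
# parts in separate channels so they can overlap freely.
CHANNELS = ["car", "truck", "person", "wheel", "head", "headlight"]
IDX = {name: i for i, name in enumerate(CHANNELS)}

def blank_mask(height, width):
    return np.zeros((len(CHANNELS), height, width), dtype=bool)

def paint(mask, label, polygon_mask):
    """OR a rasterised polygon (bool HxW) into each channel named by the label.
    A combined label like 'wheel/car' simply activates both channels."""
    for part in label.split("/"):
        if part in IDX:
            mask[IDX[part]] |= polygon_mask

# Example: a standalone 'wheel' annotation vs a 'wheel/car' annotation.
mask = blank_mask(480, 640)
poly = np.zeros((480, 640), dtype=bool)
poly[300:360, 100:160] = True        # stand-in for a rasterised outline
paint(mask, "wheel", poly)           # part only
paint(mask, "wheel/car", poly)       # part + parent object
print(mask[IDX["wheel"]].sum(), mask[IDX["car"]].sum())
```

Option (ii) could then reuse the same masks: assign each part polygon to whichever object channel it overlaps the most.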

Some suggestions for part labels: wheel, tire, hub, axle, lid, handle (many hand-held objects, also doors), hole, saddle, seat, roof, head, tail, neck, arm, leg, foreleg, hindleg (some animals... "foreleg" marks the transition between quadruped and biped), wing, fin, flipper (divers and some animals), tailfin (the best name for the rear fins of an aircraft vs calling it tail - tailfin of fish, tailfin of aeroplane...)

Some labels could be a part or genuinely standalone object: wall

dobkeratops avatar Jan 22 '20 10:01 dobkeratops

there is of course a problem with some words being ambiguous between part, object or material: fork and glass are two examples that spring to mind. Maybe combinations or aliases could disambiguate? Making specific aliases would be safest. (There might be ways to guess based on overlap and context, but that would be more complex to code and would need testing.)

fork: "table fork" (cutlery); "pitch fork" (tool); "fork of road" might be useful for road layouts - imagine a conversation between an intelligent taxi and its passenger: "turn left at the next fork in the road..."; "fork of bicycle" or "bicycle front fork", "rigid bicycle fork", "suspension fork" - a bicycle component holding the front wheel

glass: .. possible hints?

glass (material)?; glass cup; glassware (broad label covering jars, jugs etc. just made of glass); glass of water (must be a cup); empty glass (must be a cup); glass panel; glass tabletop; glass window; glass building; glass shards
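Purely as an illustration of the explicit-alias idea, the disambiguation could be as simple as a lookup table; the alias spellings and targets below are just guesses:

```python
# Ambiguous words mapped to explicit aliases (illustrative only).
ALIASES = {
    "table fork": "fork (cutlery)",
    "pitch fork": "fork (tool)",
    "fork of road": "road fork",
    "bicycle front fork": "fork (bicycle component)",
    "glass of water": "glass (cup)",
    "empty glass": "glass (cup)",
    "glass panel": "glass (material)",
    "glass shards": "glass (material)",
}

AMBIGUOUS = {"fork", "glass"}

def disambiguate(label):
    """Return the explicit alias, or None if the bare word is too ambiguous."""
    if label in ALIASES:
        return ALIASES[label]
    if label in AMBIGUOUS:
        return None  # ask the annotator for a more specific alias
    return label
```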

dobkeratops avatar Jan 22 '20 11:01 dobkeratops

interesting ideas - thanks for bringing those up!

I think we could indeed promote component labels to "normal" labels. I have to check the backend code, but I think there are no restrictions that would prevent that. I'll have a look at that later this week.

The only downside I can think of is that we might end up with a few duplicates, e.g. someone adds a wheel/car label and the next person a wheel label, both annotating the same object. But I guess that's something we can only solve with moderation, no? (I think it's a bit similar to content moderation on Wikipedia. While there are tools in place that try to detect spam, abusive behavior and duplicates, there's still a significant amount of manual moderation work needed for house cleaning.)

I also like your layering/grouping idea. Together with the browse-based mode, I think we could later easily add the possibility to rearrange component labels in bulk. e.g: imagine a browse-based view where you can search for component labels (e.g: wheel) and it returns all the existing polygons for that label and the images they belong to. By scrolling through the images you can e.g. select all the wheel polygons that belong to a car and click "add to parent label". This transitions the wheel annotations to wheel/car annotations.

bbernhard avatar Jan 22 '20 18:01 bbernhard

Right, that might take some fine tuning regarding label sorting and visibility while annotating... maybe there could be some assist regarding how the current label overlaps (e.g. if you selected "wheel" perhaps it could show all other wheel-related labels (wheel of car etc.) at 50% transparency as a guide). The same sort of thing might help with person, man, woman.

dobkeratops avatar Jan 22 '20 18:01 dobkeratops

Just curious: I saw some more uploads on the activity chart (90k now... looks like 100k images is within reach). Is that you, or another contributor? Are they more scrapes or original photos? (I realise the usual sources would have kept getting updates.)

I think I'm seeing some of them in unlabelled image searches (i.e. images I don't think I remember uploading or seeing in my searches), which is great.

It might be nice to have a search option for "recent uploads" (e.g. here I'm curious to browse these new images)... but the best default order is open to question. Pure random stops it lingering on one class of image, but sometimes if you personally upload something you'd want to annotate it whilst it's fresh in your mind.

As it stands the default seems to work ok (ie I do see a mix of old and new)

dobkeratops avatar Jan 23 '20 14:01 dobkeratops

Just curious: I saw some more uploads on the activity chart (90k now... looks like 100k images is within reach)

yeah, that was mostly me. The last few days I was scraping flickr for CC0 licensed images. It's a big mixture of stock photos, panorama photos, still photos, etc. Due to the inbuilt duplicate image detection, only new images should have made it through (when I bulk upload scraped images from flickr a significant portion of the images are rejected as duplicates, so I think the duplicate detection should be working fine :))
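(For anyone curious what such a duplicate check can look like in general: a common technique is perceptual hashing. This is just an illustrative sketch using the third-party imagehash package, not the actual backend code:)

```python
from PIL import Image
import imagehash  # pip install ImageHash

def phash_of(path):
    return imagehash.phash(Image.open(path))

def is_probable_duplicate(candidate_path, known_hashes, threshold=5):
    """A small Hamming distance between perceptual hashes means the images are
    visually near-identical (e.g. resized or re-encoded copies)."""
    candidate = phash_of(candidate_path)
    return any(candidate - known <= threshold for known in known_hashes)

# usage: keep a running list of hashes for already accepted images, e.g.
#   known = [phash_of(p) for p in already_uploaded_paths]
#   if not is_probable_duplicate("new_upload.jpg", known): accept the upload
```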

It might be nice to have a search option for “recent uploads” (eg here I’m curious to browse these new images)

good idea, I'll put that on my todo list :). Thanks!

My new year's resolution regarding ImageMonkey is to be more active in relevant communities in order to attract more contributors. We've made a lot of progress over the last two years and I think we've now reached a point featurewise where it makes sense to reach out to the community again for some help.

I recently also added the possibility to support the project financially (either via Paypal or via Patreon). One big (far-fetched) goal is to experiment a bit with micro payments, e.g. users who like the service but lack the time to contribute to the dataset can support the project financially. The collected money is then used to reward (power-)users via micro payments. But yeah, I guess the whole ML/data collecting sector is probably only interesting to a small niche... so not sure if we will ever get more than maybe a handful of backers.

bbernhard avatar Jan 23 '20 18:01 bbernhard

Ah yes, I noticed that (the option to donate). It could be kind of like open source bounties. We just need to get the message out there and find ways to connect with other projects and services. There are so many potential uses for a truly open source image database. I contribute because I can imagine getting some of these done myself in the future, and I see the big picture (if we want "democratised" robots, we need democratised training data), but people who aren't developers probably want closer, tangible benefits. "Interesting to a small niche" - a small niche would see the use and contribute, but ML, and hence training data, is relevant to almost every activity: food, transport, entertainment, medicine...

dobkeratops avatar Jan 23 '20 19:01 dobkeratops

I finally motivated myself to write another blog post: https://imagemonkey.io/blog/general/2020/02/09/ImageMonkey-100k-0.html (I am not a good writer, so I hope it's not too bad).

As already mentioned in the article, thanks a lot for your help @dobkeratops.

bbernhard avatar Feb 14 '20 16:02 bbernhard

nice summary. I just started getting back into a graphics project. I'm keeping in mind the long term goal: AI assists for 3D graphics pipelines... but there's a big gap between that and any current work; many steps remain. I'm hoping that imagemonkey will be a suitable resource.

I'm sure you've seen those impressive nvidia demos (painting with named textures, even rendering environments).. there is clearly scope for more

I do have a reasonable spare GPU lying around that I could leave training (gtx1080) - I've never actually got much done with AI, down to the training times, but I note you talk about the shortcut of using pre-trained nets.

I also note, reading your summary, that you have training integrated with the site - that seems rather useful.

I think there are some tricks like over-estimating what's needed, then cutting net subsets out afterward.

I've got a few other ideas midway between graphics programming and data labelling... like labelling with a library of approximate pieces of geometry. But I'll pause before dumping all that, and get some examples together.

dobkeratops avatar Feb 14 '20 21:02 dobkeratops

thanks!

That sounds really interesting - can't wait to see some examples :)

Yeah, I've been playing a bit more with AI + neural nets recently. Personally I find neural nets really interesting, but for me it's also one of those topics that burns me out pretty quickly (as it's quite challenging and sometimes pretty frustrating). So, I've got a bit of a love/hate relationship with machine learning :D

The gtx1080 is pretty cool - I am using the same model on my rented bare metal server.

For me the most frustrating part with machine learning has always been the instability of the machine learning frameworks. More than once my code broke because Google changed something in Tensorflow. That was mainly the reason why I've created the ML ImageMonkey docker container for training. With the container it's possible to spin up the exact same training environment without worrying that some updated package breaks everything. That makes the whole machine learning stuff much more fun.

bbernhard avatar Feb 16 '20 19:02 bbernhard

it's just an engine for my own amusement, not going to compete with unreal/unity. Got it flying around a landscape at the moment. I've bounced in recent years between Rust and C++ - this year I've gone back to C++ where I can get things done quicker. I might end up trying to sync parts of it - coming back from Rust my C++ style changes. Sadly, whilst I like all the ideas in Rust, the better IDEs and familiarity still make me 4x as productive in C++* (* if it's a personal project where I'm using my own preferred subset... everyone has their own)

At the same time I'm trying to improvise some "programmer graphics" in Blender and GIMP, as I don't currently have a professional artist to work with. That's where the interest in neural nets comes in. The sky is the limit but it's also hard to get something useful done (with the long training times).

I've got my spare PC back up - I could put my bigger GPU in that and leave it training in the background

"big goals"..

the simplest application of nets would be guessing material assignments (things like "rusty metal", "brushed metal", "plastic", "stone", "brick", "gravel", ...), slapping microtextures + appropriate specular/reflection etc. onto things. I've seen some people generate normal maps and displacement maps.

..but the more difficult challenge is the assist of building textured art, which could come from 2 angles:-

  • (i) assists for texturing simple lowpoly models... could be enhanced by recognising what the things approximately look like (which brings us back to the question of recognising things like "(toy) car", "(cartoon drawing of) fish", ...)
  • (ii) or the exact opposite way round: assists for figuring out the geometry behind a photo.

I wonder if we could mix low-poly (and cartoon/toy representation) art with labelling?... "this lowpoly/cartoon/toy is supposed to represent this photograph" - almost like a form of visual label. "The closest lowpoly model to this photographed object is..."

A bit of a moonshot however... with a few 100k's to millions of examples, plus some as-yet unknown variation of GANs (and wavefunction collapse synthesis?) -> an AI art assist that can improve a novice's attempts.

one side note, I just had a bash at implementing Delaunay triangulation (there are probably JS libs at hand for that) and remembered the idea of 'point labels' (imagine an intermediate between just labelling and full annotation where you just give a few example points, then a first guess is to just triangulate and interpolate label probability between the keypoints). That could be really intuitive for casual users?
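A rough sketch of that first guess using scipy (which triangulates the points with Delaunay and interpolates linearly inside each triangle); the points and probabilities here are invented for illustration:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# A few user-clicked example points (x, y) with a probability that the pixel
# belongs to the label (1.0 = definitely inside, 0.0 = definitely outside).
points = np.array([[50, 60], [200, 80], [120, 220], [300, 240], [30, 280]])
probs = np.array([1.0, 1.0, 0.8, 0.0, 0.0])

# LinearNDInterpolator builds a Delaunay triangulation of the points and
# interpolates linearly inside each triangle - a cheap first-guess label map.
interp = LinearNDInterpolator(points, probs, fill_value=0.0)

h, w = 320, 320
ys, xs = np.mgrid[0:h, 0:w]
label_map = interp(np.stack([xs.ravel(), ys.ravel()], axis=1)).reshape(h, w)
```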

dobkeratops avatar Feb 22 '20 21:02 dobkeratops

(attached sketch)

dobkeratops avatar Feb 22 '20 22:02 dobkeratops

Thanks for the heads up - this project of yours sounds really interesting! Can't wait to see the first prototype in action :)

That's where the interest in neural nets comes in. The sky is the limit but it's also hard to get something useful done (with the long training times).

I am wondering if you can iterate faster by focusing on a really small dataset first and then ramping up the dataset size once you get decent results with the small dataset.

on a related note: I am currently in the middle of fully automating the training of neural nets. It basically works like this:

  • the necessary data is downloaded from ImageMonkey
  • the neural net will be trained on the data
  • after the training is finished, the model will be automatically uploaded to a public Github repo

The whole thing runs headless without any user interaction on a regular basis. So every x days a new trained model will be uploaded to github. As this runs automatically without any user interaction it's quite hard to tell whether a neural net has improved between training runs or has become worse.
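In pseudo-Python the whole loop is roughly the following (the helper names are placeholders, not the real script's functions):

```python
import datetime
import time

def download_dataset():
    """Placeholder: pull the latest labelled data from ImageMonkey."""

def train_model():
    """Placeholder: (re-)train the neural net on the fresh snapshot."""

def upload_model(tag):
    """Placeholder: push the trained model + stats to the public GitHub repo."""

def run_training_cycle():
    download_dataset()
    train_model()
    upload_model(tag=datetime.date.today().isoformat())

if __name__ == "__main__":
    while True:                    # headless, no user interaction
        run_training_cycle()
        time.sleep(24 * 60 * 60)   # "every x days" - here, once per day
```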

In order to track the health of the model, I've implemented two additional mechanisms (which I find quite useful):

  • after every training run, tensorboard gets started and a screenshot of the model's parameters is created automatically. (e.g: https://github.com/ImageMonkey/imagemonkey-models-test/blob/master/image-classification/2020-02-22%2016:20/graphs.png). I am using this project for creating the tensorboard screenshots.

  • after the model is trained, I run some test data through it and automatically create screenshots from the model's output. (e.g: I feed the neural net a picture of a dog and expect it to also identify it as a dog. If the model identified it with 99.9% probability as a dog two weeks ago and now the probability is only 80%, something has gotten worse). Going through the visual output really helps to get a feeling for what type of images the neural net is struggling with.

I am a novice when it comes to machine learning, and I am pretty sure there are better techniques out there, but those two measures really helped me to get a better understanding of the neural net and made the training results somewhat comparable.
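A minimal sketch of how such a run-to-run comparison could be scripted, assuming each run dumps its predictions to a simple JSON file (the format here is made up for illustration, not what actually gets written):

```python
import json

def load_predictions(path):
    """Hypothetical format: {"dog.jpg": {"dog": 0.999, "cat": 0.001}, ...}"""
    with open(path) as f:
        return json.load(f)

def report_regressions(previous_path, current_path, drop_threshold=0.1):
    prev, curr = load_predictions(previous_path), load_predictions(current_path)
    for image, expected in prev.items():
        for label, old_prob in expected.items():
            new_prob = curr.get(image, {}).get(label, 0.0)
            if old_prob - new_prob > drop_threshold:
                print(f"{image}: '{label}' dropped {old_prob:.2f} -> {new_prob:.2f}")

# e.g. report_regressions("run_2020-02-08.json", "run_2020-02-22.json")
```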

one side note, I just had a bash at implementing Delaunay triangulation (there are probably JS libs at hand for that) and remembered the idea of 'point labels' (imagine an intermediate between just labelling and full annotation where you just give a few example points, then a first guess is to just triangulate and interpolate label probability between the keypoints). That could be really intuitive for casual users?

Haven't heard of Delaunay triangulation before, but I will definitely look it up. Thanks for sharing! :)

Out of interest: Which ML framework are you using? Haven't done anything ML related in C++, so I am really curious about it :)

bbernhard avatar Feb 23 '20 19:02 bbernhard

The automated training sounds great... I suppose users could eventually configure a label set to train on through a UI (maybe overkill). That does sound like an extra dimension to the site... I've heard others talk of "democratising machine learning" in a similar way - UIs for casual users, much like we have artist and designer tools that can do some impressive things without writing code. Perhaps the observed errors of the trained nets could be used to suggest tasks (finding difficult examples).

Last time I actually tried anything I implemented convnets myself in OpenCL, but I didn't do anything useful with them... I think I would just pick up tensorflow. There is one more itch for something custom - I suspect there is overlap between capsule nets ("routing by agreement") and the texture synthesis technique called "wavefunction collapse" (selecting potential patterns based on overlap, i.e. agreement); there might be a technique between GANs and these other ideas waiting to be discovered.

dobkeratops avatar Feb 23 '20 19:02 dobkeratops

(attached screenshot) This is my current renderer: heightfield + some random objects. As I mentioned, most people say it's pointless to write one because you can just get unreal or unity... they have a huge feature set refined by a team of experts for 10+ years, cheap because there are so many customers. Unfortunately it's so much fun to write, so I'll go full NIH.

So I'm looking for ways to connect "writing a renderer" with AI. It could generate procedural meshes and spit out the intermediate channels (normals, depth, correlated with their generated image) to train a net to get a sense of 3D, whatever. (I gather this is called "domain randomization".)

I'm a looong way from being able to do any of the things I mentioned (AI texturing assist etc.). At the same time I'm trying to learn a bit more 3D modelling (usually I just did code and professional artists built everything)

you probably know that with the tensor cores in the latest nvidia hardware there are ways to use AI directly in rendering (AI denoisers for raytracing, AI materials... you could do an expensive calculation offline, then train a net to recover the end result from the input channels)

dobkeratops avatar Feb 25 '20 06:02 dobkeratops

WOW, that looks really cool!

As I mentioned, most people say it's pointless to write one because you can just get unreal or unity... they have a huge feature set refined by a team of experts for 10+ years, cheap because there are so many customers. Unfortunately it's so much fun to write, so I'll go full NIH.

I can definitely relate to that. ;-) I think it's a great idea to connect AI with writing a renderer. That way you get the best of both worlds. I am already looking forward to the point where AI meets the renderer and you put it all together - really excited to see how that all works out. :D Great work!

bbernhard avatar Feb 26 '20 20:02 bbernhard

another screenshot, added water.

the random objects have random textures stuffed on them. I was thinking of using some general 'common' shapes (bits of furniture, architectural elements, car outlines, barrels...) with random textures (brick, stone, gravel...) and light -> then train a net to infer (shape_index, orientation, texture_index) - but would it have any synergy with real images? I would at least need the objects to have 2-3 pieces of surface separation (e.g. car: body, windscreen, wheels); a vision system needs to know when things with differently textured parts are connected as one whole

does training to recognise a cartoon fish help recognise a real fish? How much do you get from the outline alone?

in building the cartoon or lowpoly representation we've somehow done some work distilling the essence of the salient features, but there's still a huge amount of complexity in extracting that from the real world. (A game renderer does handle some of the light and shadowing, but my own renderer doesn't have all the global illumination of the latest engines.)

I was wondering if filters applied to real photos (npr, toon shaders etc) could put them at the 'same level' as artificial objects from the POV of a NN.

Something else to experiment with would be 'CSG' boolean operations (cut/union etc.) - perhaps there'd be value in getting a net to figure out how objects can be assembled from components

I know some people are using driving game engines to train SDCs - but these usually have very expensive artwork (large teams of 3D artists painstakingly building textured meshes)

I haven't got an RTX card yet. I figured if I get something non-trivial done I might treat myself to an upgrade. The latest raytracing tech opens doors for more realistic lighting, which would help in trying to get synthesised images closer to the real world

dobkeratops avatar Mar 01 '20 04:03 dobkeratops

Looks really cool - thanks for the update!

does training to recognise a cartoon fish help recognise a real fish? How much do you get from the outline alone? I was wondering if filters applied to real photos (npr, toon shaders etc) could put them at the 'same level' as artificial objects from the POV of a NN.

that's indeed a good question. I guess that would be something worth trying out. Especially your idea of using a toon shader sounds very interesting. Would really love to see how/if that works.

Another idea would probably be to collect some images from games, annotate the objects in there and close the gap this way. But I guess it would be a pretty time consuming task to collect a decent amount of game images, so not sure if this approach really scales?

I haven't got an RTX card yet. I figured if I get something non-trivia done I might treat myself to an upgrade. the latest raytracing tech opens doors for more realistic lighting which would help trying to get synthesised images more like the real world

Out of interest: Which one are you interested in buying? (Haven't followed the graphics card market in years, so no clue which model is currently the best one out there for that task.)

bbernhard avatar Mar 01 '20 18:03 bbernhard

But I guess it would be a pretty time consuming task to collect a decent amount of game images, so not sure if this approach really scales?

right... if manually annotating, I guess we still get more from real photos (despite the potential for closing the gap with a continuum of real to synthetic data). But if we can get at the source for existing games (maybe modifying emulators or something) perhaps there'd be a way to auto-label (although 3D games are quite complex internally; it's probably quite hard to pin down where the distinct meshes are and how they're submitted).

There is of course 3D content available online, some free models and some purchasable... all the 3D printing websites. Some meshes are ripped from old games. By the PS2 era (early 2000s... shocking to me to think that's retro-game territory now!) the 3D models were quite sophisticated.

There's a need for matching photos to 'hand-built' meshes. Raw photogrammetry scans work, but they're very inefficient compared to meshes that have been fine-tuned by a human artist who knows how to distribute detail around the salient features. (In turn such a database could be used to make a more efficient scanner.)

Out of interest: Which one are you interested in buying? (Haven't followed the graphics card market in years, so no clue which model is currently the best one out there for that task.)

the midrange 2070. it's a little beyond the 1080. of course the best for the task is the 2080ti .. but the midrange cards usually have the best price:performance. Having a daily card and a spare for training is more useful overall..

dobkeratops avatar Mar 01 '20 23:03 dobkeratops

minor news - the new GPU is on the way (EDIT: it turned up). So, with the 1080 as a spare and hopefully within 24 hours the 2070 as the main card, and a spare old PC beside me capable of at least driving one... my excuses for NOT doing AI experiments are dwindling :)

Trying to juggle my focus, I started looking for starting points... I can offset my NIH on graphics with code re-use for training... it occurred to me you've got this service for automatically training on your data from imagemonkey already.

So let's look at options?

If I make my engine spit out pairs of images/arrays - 'inputs' being random rendered objects (+random textures*, random lights) , 'outputs' being {{object ID, texture ID} /corresponding to labels & material properties in your database/, object orientation, depth maps} - perhaps I can make use of your code to do the training?

  • What would I need regarding setting up an instance? What's involved in 'deploying' your code?

  • What would be the best way to format that data, i.e. matching what you already expect from the human-labelled data? Do you turn the drawn outlines into pixel assignments? Is there some existing pipeline for taking bounding boxes/polys, and can it be bypassed with a raw image? (EDIT: whilst I've been keen on boundaries in contextual photos, I would still generate centred images from a renderer - a rectangle that really should predominantly be 'car', 'house' etc. Context for the random images is less meaningful... it's artificial. So it could be as simple as {image->label} pairs - see the sketch after this list.)

  • is your code in a state where it would be useable like this? (I would be nervous about handing someone else my renderer right now, although I am trying to package it up for someone else already. It was actually someone else asking for engine help with an AGI project - another case of 'training AI on games' - that spurred me to write this, but his focus is very different)

  • is there a way a bunch of procedural training pairs and/or a net trained on them can contribute to (i) the 'activity chart' - reassuring other potential users that this project is active and worth joining - and (ii) the overall 'level of visual intelligence' stored in the imagemonkey database?

  • would it be useful to validate the synthesiser by getting a human opinion on what its examples actually look like, within the same label vocabulary as the photos? (throw some synthetic examples into the database, perhaps grouped into a Collection, so when you judge one a fraction of that assignment goes to all?)

  • would this all just be overkill, or is there food for thought for future modifications? What would be the best way to package up a renderer for AI experimenters? You could look at additional features in your pipeline for plugging in procedural data?

  • would there be mileage in training using on-the-fly synthesis, or should we just submit a directory? 100k images could be a 10GB dataset; 1 million images starts to look prohibitive for throwing data around, whereas slotting in some code to generate them using OpenGL is very compact (but what about security?). Generating 1 million images is 4 hours for a 60fps renderer - probably less GPU overhead than the actual AI training, but still a factor to consider.
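To make the data-format question above concrete, here's a rough sketch of how the renderer could dump simple {image -> label} pairs plus a depth channel as sidecar files; the layout is just a guess at something a training script could consume, not your pipeline's actual expected format:

```python
import json
import os

import numpy as np
from PIL import Image

def save_example(out_dir, index, rgb, depth, object_label, texture_label, orientation):
    """Write one synthetic training example: an RGB image (HxWx3 uint8 array),
    a depth map (HxW float array) and a JSON sidecar with the labels the
    renderer already knows."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.join(out_dir, f"{index:07d}")
    Image.fromarray(rgb).save(stem + ".png")
    np.save(stem + "_depth.npy", depth.astype(np.float32))
    with open(stem + ".json", "w") as f:
        json.dump({
            "label": object_label,           # e.g. "car"
            "texture": texture_label,        # e.g. "brick" - possibly on the "wrong" object
            "orientation_deg": orientation,  # yaw of the rendered object
        }, f)

# e.g. save_example("synthetic/", 0, rendered_rgb, rendered_depth, "car", "brick", 35.0)
```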

failing that I'm sure I can find something else lying around - but you know my big-picture interest here. Getting some overlap going between different open source AI efforts would be awesome. I've always thought this whole idea has big potential with many future uses.

(* my hope is that the texture can still drive material recognition, even if it's on the wrong object. So: if it can spot "a brick texture, on a truck", "a gravel/soil texture, on a table", ... those random samples are still contributing knowledge about table shapes, truck shapes, and those surfaces. A human can make those distinctions - and it might save it from making too many assumptions from texture alone.

this is also thinking about how to leverage my current minimal art database... I don't have an army of modellers to work with - but I can at least separate the surfaces out into major pieces: vehicles split into body, windows, wheels; humanoid mannequins split into head, hands, upper/lower body. Those separate parts could be re-textured independently.

I should also keep referencing our past discussions on the picture/toy issue. Lowpoly, synthesised images are in the same bracket. I remember your initial take was that the raw labels suffice - you're actually happy to see the picture of a fish labelled as a fish - and indeed my hope for this use case is that something useful falls out of that similarity. I think I've seen a few photoshopped and rendered images in your database already, so you might already have an idea for a 'property' to assign them.)

dobkeratops avatar Mar 03 '20 07:03 dobkeratops

minor news - the new GPU is on the way. So, with the 1080 as a spare and hopefully within 24 hours the 2070 as the main card, and a spare old PC beside me capable of at least driving one... my excuses for NOT doing AI experiments are dwindling :)

awesome! Can't wait to hear how the 2070 performs, especially compared to the 1080. (my rented GPU instance still has the 1080).

Trying to juggle my focus, I started looking for starting points... I can offset my NIH on graphics with code re-use for training... it occurred to me you've got this service for automatically training on your data from imagemonkey already.

  • What would I need regarding setting up an instance? What's involved in 'deploying' your code?

I've recently written a short blog post about it on Medium: https://medium.com/@imagemonkey/in-this-short-blog-post-i-would-like-to-show-you-how-to-export-data-from-the-imagemonkey-dataset-68ea5cc171a (I am experimenting a bit with new social platforms like Medium and Twitter in order to spread the news about ImageMonkey). The above blog post briefly describes how an image classifier can be trained via transfer learning using ImageMonkey's dataset.

Besides image classification, the monkey script is also capable of doing object detection (via tensorflow) and object segmentation using MaskRCNN. The different training modes can be selected via the --type parameter (--type="image-classification", --type="object-detection", --type="object-segmentation").

In case you give it a try and it fails, please let me know (that's most probably a bug). The whole monkey script is written in Python, so it should be (fairly) easily extendable. In case you want to extend it, here's the code. As Google doesn't care much about backwards compatibility of their software, I had to pin tensorflow and tensorflow-gpu to a specific version in the Dockerfile, otherwise the script was breaking with every new tensorflow release. I haven't given it a try, but maybe the newest tensorflow releases are already mature enough to upgrade (it would be great to move from tensorflow 1.x to 2.x).

is there a way a bunch of procedural training pairs and/or a net trained on it can contribute to (i) the 'activity chart' - re-assuring other potential users that this project is active and worth joining and (ii) the overall 'level of visual intelligence' stored in the imagemonkey database.

Something I've been playing with lately is the possibility to periodically train a neural net and then upload the trained model to github - fully automated, without any user input. At the moment I am using this repo here for beta testing. On my GPU instance there's a script running which starts an image classification training once a day (using the above docker container) and then uploads the model, with some statistics, to the github repo afterwards. Since 2020-02-27 it seems to be running quite smoothly :)

I am still working on the code, so it's not yet in a state where I am comfortable sharing it with others. But once I am done, it should run on any system (the only requirements are an internet connection and a docker daemon).

Maybe we can go in a similar direction with the stuff you are working on? That is, you also upload the generated artifacts (whether it is an ML model, images, etc.) somewhere (github, an s3 bucket, some other storage, etc.) and then we aggregate the information in the activity chart/on the landing page (e.g. displaying a link to your latest model on the landing page, or a model activity chart, etc.).

I think at least in a first iteration it's easier that way, as it gives you the flexibility to break stuff (which happens naturally during development) without worrying that ImageMonkey services are affected too. At a later point, we can decide whether we want a tighter integration (e.g. feeding data directly back into the service/database) or if the loose coupling is maybe even an advantage (we avoid single points of failure). What do you think about that?

would this all just be overkill, or is there food for thought in future modifications. What would be the best way to package up a renderer for AI experimenters? you could look at additional features in your pipeline for plugging in procedural data?

Not sure if you want to go that route, but I am personally a fan of docker images. Docker obviously has some flaws and I am not sure if I would use it for something security critical, but I really like that it's self-contained and can run on almost any OS + architecture (arm, x86-64, i386, etc). I also think that it solves the "it works on my machine" problem quite nicely. And it's really user friendly (usually a docker pull ... followed by a docker run ... is everything you need to run any docker container)

(I have to run some errands. I'll get back to the remaining discussion points later today/tomorrow :))

bbernhard avatar Mar 03 '20 17:03 bbernhard

(Some points from the above ideas dump - attached sketch)

dobkeratops avatar Mar 04 '20 05:03 dobkeratops

(* my hope is the texture can still drive material recognition, even if its on the wrong object. So: if it can spot "a brick texture, on a truck", "a gravel/soil texture, on a table", .. those random samples are still contributing knowledge about table shapes, truck shapes, and those surfaces

totally agreed!

I really like your visual brainstorming approach - your drawings always remind me a bit of Mythbusters (the TV series). They always had these blueprint drawings which sketched out the experiment before they started working on it. Man, I really miss that series.

Wow, your renderer offers quite a lot of possibilities - is there something that you find particularly interesting/promising?

Feeding data back to the ImageMonkey database shouldn't be a problem - there already exist REST API endpoints for almost every functionality. A developer-friendly library is still missing (I've started working on Javascript and Python libraries, but they are far from complete), but it shouldn't be hard to extend the existing ones (or write a small REST API wrapper in a different language, e.g. C++) - I can take care of that :)

Depending on which point you want to implement, there are maybe some database changes needed (e.g to mark the data as "autogenerated" in the database) - but I think we can look at that in detail at a later point.

What would be really great is if your renderer could output the result artifacts locally (e.g. via the command line) or store them somewhere (e.g. in a file, a github repository, etc.). That way you could iterate very quickly without worrying that you break something in the ImageMonkey backend. And it gives other users the possibility to fork your project and play with it on their local machine. That would be really awesome!

bbernhard avatar Mar 04 '20 16:03 bbernhard

Right, I don't want to flood your database immediately with 1 million rendered examples, hah. I was thinking of throwing up batches of about 1000 examples to see how it goes. We need to establish the label; your suggestion there is "autogenerated". That could work. I would also add the more specific "low poly" for hand-built models in the 100-2000 polygon range (the kind I have a chance of personally building in Blender). We need to make sure people can easily exclude generated models from searches (and that should probably be the default?). Talking to an artist friend, they recommended splashing out on some meshes, specifically the "evermotion" "archmodels" collections, but I'd be looking at €500+ for a decent selection, which has me thinking "buy another GPU for training...". I'm also not sure about the licensing. I'll think about that trade-off. They do look really good. But it has me wondering how many of these mesh-selling websites are already writing this exact system, given their database head start.

dobkeratops avatar Mar 05 '20 10:03 dobkeratops

Right I don’t want to flood your database immediately with 1million rendered examples, hah. I was thinking of throwing up batches of about 1000 examples to see how it goes. We need to establish the label , your suggestion there is “autogenerated”.

yeah, I would like to tag those generated artifacts somehow. Not sure yet what's the best way to do this... I guess that also depends on which one of the above points you want to tackle first.

But I am sure we will find a way to feed the data back. :) As soon as you have a working prototype, please let me know. I'll then have a look at how we can feed the data back and prepare the necessary backend changes.

specifically:“evermotion” “archmodels” collections but I’d be looking at €500+ for a decent selection, which has me thinking “buy another GPU for training..”. I’m also not sure about the licensing.

you are right, they look really good. But as you said, the license could indeed be a problem. My gut feeling is that we would need something less restrictive (at least Creative Commons 3.0 licensed, and even better CC0 (public domain) licensed) in order to get wide adoption. I am also a bit worried about the legal consequences of using commercial models. And I guess this would then also mean that users would have to buy the same collections when they want to play with your renderer, right?

Unfortunately, I do not know the community much (I played around with Cinema 4D and 3ds Max for about a year when I was ~18 years old), but maybe there are some volunteers out there who would contribute some CC0 licensed models for that purpose? My gut feeling is that once there's something to showcase (even if it's based on some really low poly models), people will come.

In case you need APIs to fetch data from ImageMonkey, please let me know.

bbernhard avatar Mar 05 '20 16:03 bbernhard

(FYI I got a spare PC running with a spare GPU; initially I'm using it for folding@home as a little response to "current events".

I was just thinking how with these lockdowns many people should have time on their hands for labelling.. and how suddenly the need for automation, delivery bots etc is more pressing.

If I can figure out a case/cable mismatch issue I'm actually going to have 2 GPUs useable this way. I've actually got a gtx970, gtx1080 and the rtx2070, and an even older core2 quad machine lying around, which does still work and should surely be enough to just throw work at a GPU. The 970 remains enough for daily work. I'll have to think about the power draw before I go too crazy setting up a farm.

core i7 4790 + gtx970, daily driver             4TFlops
core i7 860 + rtx2070                          8-9 TFlops
core2quad Q6600 <2 potential GPU slots>
                  gtx1080   <awaiting PSU adapter> 8TFlops

I could just timeshare a spare machine between folding and training.

I gather it's possible to downclock to reduce strain; I don't think consumer parts are rated to be run continuously like servers, hence the premium that you pay for Quadros etc.)

dobkeratops avatar Mar 24 '20 11:03 dobkeratops

Awesome!

I am still experimenting a bit with neural nets..mostly trying to figure out if we can already do something useful with the collected data we have. Unfortunately, this is pretty time consuming (I am usually kicking off a training job every 2-3 days on my GPU instance) and involves a lot of trial & error.

Up to now, I've only played a bit with image classification (which is rather "simple"), but for me object detection/segmentation is way more interesting. My goal here is to get something done that shows that we are "on the right track"...this can be as simple as a cat/dog object detector. A small little project where we can "show off" a little bit (and which hopefully gets us more traction).

If you want to help with this...any help here is really appreciated! :)

bbernhard avatar Mar 24 '20 18:03 bbernhard

definitely useful... and I agree it's hard because the experimentation takes so long. Can't iterate on ideas so fast if it takes 2-3 days to try something out.

as it happens the folding seems to be intermittent - I think they've been swamped with volunteers so the servers aren't always able to give "work units" out. I had it running all night but it's been sat idle all afternoon. Timesharing should go ok. I just need to get this one adapter ordered (6-to-8-pin) and then I'll have 2 spare GPUs as well (it would at least let me alternate kicking off 2 experiments in parallel)

So many experiments I would like to run..

  • Pavement vs Road (for SDCs)
  • some set of materials, e.g. wood vs metal vs plastic... gravel, grass, soil, sand
  • recognising human state (let's see how many examples of sitting vs standing vs running vs walking etc. we have)
  • even just training to predict the whole label list, without annotations (e.g. telling the difference between parks/forests, room types etc.)

and sure "cat vs dog" is the classic one but it might be nice to show we're collecting potentially new data. some of the ideas above will require writing some code to parse the labels (and probably writing a bunch of aliases - that shouldn't take too long if focussed on one goal , although doing them all probably would..)

dobkeratops avatar Mar 24 '20 18:03 dobkeratops

as it happens the folding seems to be intermittent - I think they've been swamped with volunteers so the servers aren't always able to give "work units" out. I had it running all night but it's been sat idle all afternoon. Timesharing should go ok. I just need to get this one adapter ordered (6-to-8-pin) and then I'll have 2 spare GPUs as well (it would at least let me alternate kicking off 2 experiments in parallel)

Yeah, I think there are currently many people out there contributing their computing power to folding@home. While I find the project awesome, I am not sure if this project can really help us with the current crisis in a timely manner. But nevertheless, it's an awesome project and it's definitely worth contributing to (and maybe we are lucky and it really helps in fighting corona) :)

and sure "cat vs dog" is the classic one but it might be nice to show we're collecting potentially new data.

totally agreed. :)

your list of experiments looks really interesting - can't wait to see some progress on those. As we have quite a bit of data collected now, I hope that we can already get something useful out of it :)

bbernhard avatar Mar 26 '20 18:03 bbernhard

Just grabbing the current label suggestions endpoint... it's going to take some effort to organise this. Doing it all in one go is probably too daunting. Maybe I can write something to parse some of my more recent conventions, and perhaps manually flag some of the typos in there while I'm at it. Could probably approach that from both ends: start with everything in a grey list, and gradually write a "white list" of confirmed suggestions and a "black list" of typing errors with their best replacement... I'll see how far I get.
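A small sketch of that triage flow (file names and format made up); everything starts in the grey list and is gradually promoted to the white list or mapped to a correction in the black list:

```python
import json

def load_json(path, default):
    """Load a list/dict from disk, falling back to an empty default."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default

def triage(all_suggestions):
    whitelist = set(load_json("whitelist.json", []))  # confirmed good labels
    blacklist = load_json("blacklist.json", {})       # typo -> best replacement, e.g. "credit car": "credit card"
    greylist = [s for s in all_suggestions
                if s not in whitelist and s not in blacklist]
    return whitelist, blacklist, greylist

# e.g. whitelist, blacklist, greylist = triage(labels_from_suggestions_endpoint)
```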

dobkeratops avatar Mar 26 '20 19:03 dobkeratops

these are just a few greps through the label list: https://github.com/dobkeratops/imagemonkeylabelextraction. Trying to glance through to check the theory "anything containing man/woman/car/person is reducible to that", I'm looking for counterexamples... there are also the example uses of "of" and "or".

grep for car.. https://github.com/dobkeratops/imagemonkeylabelextraction/blob/master/car.txt Some labels containing "car" that are not a car:-

  • cable car
  • car jack - type of tool?
  • car showroom
  • car carrying truck - a type of truck with a trailer for carrying cars
  • multistorey car park
  • open air car park
    • perhaps "open_air/" and "multistorey/" could be properties? (multistorey/building.. most are, a few aren't) Having said that, "multistorey car park" is a type of building, whilst "open air car park" is really a type of surface, or area. not sure what you should do there..
  • passenger car/train
    • this is unfortunate :( "passenger car" could mean either a road car or part of a train. I'd want to swap this label for a more explicit alias, i.e. "train passenger car" or "railway passenger car", add an alias "road passenger car", and flag "passenger car" as too ambiguous (swap it for the explicit aliases)
  • car key
    • really a "key", not a car .. Given the ambiguity of 'key' , maybe we could have explicit aliases "door key","car keys","lock key","keyboard key","computer keyboard key","piano key","synthesiser keyboard key" , and flag "key" as an ambiguous word (use an explicit alias.)
  • car park ticket machine - probably wants to be a big alias, not a "part of car park". but we certainly want a "ticket machine" aswell ("machine"->"ticket machine"->"car park ticket machine") *shadow of car - there's a few of these.. shadow_of_man, shadow_of_truck,shadow_of_fighter_jet. general pupose label for "shadow", or prefix "shadow_of/" perhaps?. it's definitely a shadow.

Variations that are definitely a car:-

  • vintage car - older than 'classic'
  • classic car - not quite as old as 'vintage'
  • sports car,sportscar
  • luxury car
  • prestige car - might be alias for 'luxury car'
  • derelict car
  • parked car
  • supercar (extreme type of sports car)
  • convertible sports car
  • racing car
  • open wheel racing car
  • f1 car,formula 1 car
  • lexus car
  • porsche sports car
  • hatchback car
  • fastback car - any with a sloping rear, contrasts with hatchback, saloon
  • coupe car - 2-door, often also 'fastback'
  • saloon car,sedan car - (british and american variations of the same thing)
  • rally car
  • police car
  • estate car (word 'estate' also relates to property so explicit aliases will help)
  • crashed car - variation of derelict.. not all derelict cars were crashed, but "derelict or crashed" could usually be grouped.
  • crumpled car - might be from a scrapyard? "scrap metal" would be a good label
  • luxury sports car (not all sports cars are luxury cars)
  • parked hatchback car (parked could be a property-prefix for all car variants?)

labels containing car that are interior car parts

  • car radio
  • car dashboard
  • car seat
  • car engine
  • car chasis

typos..

  • credit car - should be credit card :)
  • eel/car
  • ferris wheel passenger car - must have missed a comma
  • sporys car
  • windscreen car - missed "/" or "of" in the middle. "car windscreen" makes more sense
  • packed car - parked car
  • passnger car interior
  • novelty car
  • custom car
  • vintage racing car - property vintage, also car->racing_car
  • vintage landspeed record car - unusual variation at a motorshow, vintage/car->racing_car->landspeed_record_car
  • windceren/car

noteworthy/unusual combinations

  • butterfly door/car - specific type of exotic car door. not a "butterfly" :) https://en.wikipedia.org/wiki/Butterfly_doors
  • mini (car) - given that "mini" is a common prefix (mini excavator), and a name of a certain famous car.. an explicit alias is probably useful
  • car park entrance - "entrance" could be a common part? entrance/car_park, entrance/building, although sometimes entrance is an alias of doorway... for a car park it isn't.
  • muscle car - not a "muscle", rather an americanism
  • toy car - definitely a "car shape"; property - toy?
  • remote control car - type of toy
  • driverless racing car - maybe a property for "driverless" or "autonomous" ?

combinations with words that are always properties (as far as I know)

  • luxury car - (luxury/car? luxury/watch? luxury/hotel?)
  • derelict car - (derelict/car, derelict/building, derelict/aeroplane, etc)
  • empty car park - (property empty, but occurs as a disambiguator in the alias suggestion "empty glass")
  • burning car (also burning building, property "burning")

dobkeratops avatar Mar 26 '20 22:03 dobkeratops