
it is simply very difficult to save the progress while annotating a large amount of data

Open vishwajeet1361 opened this issue 2 years ago • 10 comments

vishwajeet1361 · Aug 14 '22 02:08

👋 Hello @vishwajeet1361, thank you for your interest in make-sense - a free-to-use online tool for labelling photos! 🏷️

šŸž Bug reports

If you noticed that make-sense is not working properly, please provide us with as much information as possible. To make your life easier, we have prepared a bug report template containing all the relevant details. We know, we ask for a lot... However, please believe that knowing all that extra information - like the type of browser you use or the version of node you have installed - really helps us to solve your problems faster and more efficiently. 😉

💬 Get in touch

If you've been trying to contact us but for some reason we haven't responded to your issue yet, don't hesitate to get back to us on Gitter or Twitter.

💻 Local setup

# clone repository
git clone https://github.com/SkalskiP/make-sense.git

# navigate to main dir
cd make-sense

# install dependencies
npm install

# serve with hot reload at localhost:3000
npm start

To ensure proper functionality of the application locally, npm 6.x.x and Node.js v12.x.x are required. More information about this problem is available in issue #16.
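If your machine defaults to newer versions, one way to pin them (assuming nvm is installed; any Node.js 12.x release bundles npm 6.x):

# install and activate a Node.js 12.x release (bundles npm 6.x)
nvm install 12
nvm use 12

# confirm the active versions
node --version
npm --version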

github-actions[bot] · Aug 14 '22 02:08

Hi, @vishwajeet1361 👋! This is definitely on our radar.

When you say a large amount of data, how many images are we talking about?

Also, it would be great if you would upvote "Allow for saving and loading your project from hard drive" in our poll for the next BIG feature.

SkalskiP · Aug 14 '22 07:08

Also, it would be great to learn a little bit more about the annotation task that you are trying to perform. All in all: how many images are you trying to annotate, what is the typical resolution of those images, and most importantly, what label type are you using - boxes or polygons?

SkalskiP · Aug 14 '22 08:08

> Also, it would be great to learn a little bit more about the annotation task that you are trying to perform. All in all: how many images are you trying to annotate, what is the typical resolution of those images, and most importantly, what label type are you using - boxes or polygons?

I have loaded ~44,000 images and imported a COCO file with polygons.

My goal was to adjust the polygons in every image if they were incorrect (because they were auto-generated from a pre-trained model), and assign correct species labels.

While all the images and labels load perfectly fine, there is no option to save progress. When I periodically export the labels, the export only includes labels for the images I've opened or edited. The original polygons in all the remaining images are left out of the export file.

Because it is not practical to label ~44,000 images in one go, I'm left with multiple export files; I have to separately track which images have updated labels in each one, and use other scripts to eventually merge them (the kind of merge script involved is sketched below). I can usually last about a week before the browser crashes and I need to start a new labelling project and import all the images/labels again.
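For reference, a minimal sketch of such a merge script, written in TypeScript for Node. It assumes the COCO exports share one category list, matches images across exports by file_name, and lets later exports win for any image they contain; the file paths come from the command line and the type shapes are simplified guesses at the export format.

import * as fs from 'fs';

// Minimal COCO shapes; real exports carry more fields (width, height,
// segmentation, ...) which survive via the object spreads below.
interface CocoImage { id: number; file_name: string; }
interface CocoAnnotation { id: number; image_id: number; }
interface CocoDataset {
  images: CocoImage[];
  annotations: CocoAnnotation[];
  categories: unknown[];
}

// Merge exports in the order given; for any image present in a later
// export, that export's annotations replace the earlier ones.
function mergeCocoExports(paths: string[]): CocoDataset {
  const imagesByName = new Map<string, CocoImage>();
  const annsByName = new Map<string, CocoAnnotation[]>();
  let categories: unknown[] = [];

  for (const p of paths) {
    const ds: CocoDataset = JSON.parse(fs.readFileSync(p, 'utf8'));
    categories = ds.categories; // assumes identical label lists
    const nameById = new Map<number, string>();
    for (const img of ds.images) {
      nameById.set(img.id, img.file_name);
      imagesByName.set(img.file_name, img);
      annsByName.set(img.file_name, []); // reset: this export wins
    }
    for (const ann of ds.annotations) {
      const name = nameById.get(ann.image_id);
      if (name !== undefined) annsByName.get(name)!.push(ann);
    }
  }

  // Re-number ids so the merged file is internally consistent.
  const images: CocoImage[] = [];
  const annotations: CocoAnnotation[] = [];
  let nextImageId = 1;
  let nextAnnId = 1;
  for (const [name, img] of imagesByName) {
    const merged = { ...img, id: nextImageId++ };
    images.push(merged);
    for (const ann of annsByName.get(name) ?? []) {
      annotations.push({ ...ann, id: nextAnnId++, image_id: merged.id });
    }
  }
  return { images, annotations, categories };
}

// Usage: node merge.js export1.json export2.json > merged.json
process.stdout.write(JSON.stringify(mergeCocoExports(process.argv.slice(2))));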

It would be great if this could be improved.

ajansenn · Oct 21 '22 10:10

Hi, @ajansenn! 👋 Thank you very much for your interest in makesense.ai, as well as for sharing details of your workflow. Let me try to answer your questions and concerns.

I must admit that makesense.ai was not originally intended to solve this kind of problem. I didn't expect that a potential user would try to upload tens of thousands of images into the editor at once. And I confess that the advice I usually give users in similar situations is to work out a system where you write a Python script to divide the dataset into batches, annotate the batches in makesense.ai, and merge them back into a single dataset with another script. I'm aware it is not a perfect setup, and it requires a lot of manual media management, but it still largely protects you from the accidental data loss that you described.
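The splitting half of that workflow can stay very small; here is a minimal Node/TypeScript version of it (any language works equally well; the directory names, extension filter, and batch size are all placeholders):

import * as fs from 'fs';
import * as path from 'path';

// Copy images from srcDir into numbered batch folders of batchSize
// files each, ready to load into makesense.ai one batch at a time.
function splitIntoBatches(srcDir: string, outDir: string, batchSize = 2000): void {
  const files = fs.readdirSync(srcDir)
    .filter(f => /\.(jpe?g|png)$/i.test(f))
    .sort();
  files.forEach((file, i) => {
    const batch = String(Math.floor(i / batchSize)).padStart(3, '0');
    const batchDir = path.join(outDir, `batch_${batch}`);
    fs.mkdirSync(batchDir, { recursive: true });
    fs.copyFileSync(path.join(srcDir, file), path.join(batchDir, file));
  });
}

splitIntoBatches('./images', './batches');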

That being said, over time I started to notice the need to offer some way of saving a project's progress. Obviously, the typical way to solve problems like this is to build a backend that would connect to makesense.ai and manage your images and annotations. However, I try to be realistic here and not promise things that I most likely won't deliver; time and cost are the main reasons. I'm thinking, however, about some middle ground. Here are the things that are coming to the app in the near future. Please let me know what you think, and whether you have other alternative ideas.

  • Integration with Roboflow Universe. That will certainly not be the solution for everyone. It won't cost you anything (I guess 😅 - @yeldarby), but it will make your images and labels accessible to other users of Universe. It simply makes your data public.
  • Allow saving your progress directly to your hard drive (a sketch of this idea follows the list). Your data would stay private, and the labeling free - as it would never leave your computer - but it may heavily slow down the whole process.
  • Integration with a self-hosted AWS bucket. In this scenario, you'll be forced to host your own AWS infrastructure, and most likely pay a monthly bill.
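To make the hard-drive option concrete, a rough sketch using the browser's File System Access API (Chromium-only, and absent from TypeScript's default DOM typings, hence the cast; the function and file names are illustrative):

// Ask the user where to store the project, then write the current
// labels state out as JSON. Nothing ever leaves the machine.
async function saveProjectToDisk(labelsState: unknown): Promise<void> {
  const handle = await (window as any).showSaveFilePicker({
    suggestedName: 'make-sense-project.json',
    types: [{ description: 'JSON', accept: { 'application/json': ['.json'] } }]
  });
  const writable = await handle.createWritable();
  await writable.write(JSON.stringify(labelsState));
  await writable.close();
}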

I'm super curious about your opinion. 🧐🙏

SkalskiP · Oct 21 '22 16:10

Thanks for your reply! I appreciate that what I'm trying to do is pushing the limits of the tool. Saving to local storage would be a great solution, or connecting to a cloud container (Azure or AWS) would also work.

Having that functionality in a web-based annotation tool like makesense, with no heavy setup or install, would be a big boost to labeling productivity!

ajansenn · Nov 11 '22 10:11

No worries @ajansenn 😎, I understand that. Like I said, makesense.ai was originally written as a simple front-end-only annotation tool. With time it outgrew its original purpose, and now we need to invest time and effort into building new, more complicated functionality.

Right now we are adding the ability to use an external inference server, and we are even building our own inference Docker image.

I'm sure that with time we will also add some backend to store annotations ;) I just need to find people who are willing to help me out with it.

SkalskiP · Nov 12 '22 00:11

@SkalskiP I am willing to help out with all of those things, especially the server-side stuff. Storing LabelsState in localStorage may be low-hanging fruit.
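A rough sketch of that low-hanging fruit, assuming a Redux store with a labels slice (the storage key and state shape are guesses, not make-sense's actual internals):

import { Middleware } from 'redux';

const STORAGE_KEY = 'make-sense:labels';

// After every action, mirror the labels slice into localStorage.
export const persistLabels: Middleware = store => next => action => {
  const result = next(action);
  try {
    const { labels } = store.getState() as { labels: unknown };
    localStorage.setItem(STORAGE_KEY, JSON.stringify(labels));
  } catch {
    // localStorage tops out around 5 MB per origin, so very large
    // projects will overflow it; a real fix likely needs IndexedDB.
  }
  return result;
};

// Loader helper, called once when the app boots.
export const loadPersistedLabels = (): unknown | undefined => {
  const raw = localStorage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : undefined;
};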

bherbruck · Dec 13 '22 22:12

Hi, @bherbruck 👋🏻! Would you be willing to create a PoC of that solution?

SkalskiP · Dec 14 '22 01:12

This commit is a very quick-and-dirty way of doing it, with middleware and a loader action called when the app loads.

The problem is with the image UUIDs: since they are generated fresh on every load, we would have to either:

  • check if the file name is the same (boo 👎)
  • hash the image content and use that as the id (sketched below)
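The hashing option could look like this, using the Web Crypto API, which is available in all modern browsers (the function name is illustrative):

// SHA-256 the file's contents so the same image always gets the same
// id, no matter when or how often it is loaded.
async function stableImageId(file: File): Promise<string> {
  const bytes = await file.arrayBuffer();
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

Hashing tens of thousands of images has a cost, but it only has to happen once per file load, and it would let annotations survive re-imports.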

bherbruck · Dec 14 '22 03:12