
Some clarifications about using v2e events

Open gwgknudayanga opened this issue 1 year ago • 4 comments

Hi,

We are planning to use v2e to generate events for static images and videos (collected as 6 Hz frames by a drone). I am requesting your support to understand the following things related to v2e.

  1. An event camera normally has a temporal resolution of 1 us (microsecond). Can we achieve 1 us temporal resolution for a 1-second video with the v2e event synthesizer?

  2. If the timestamp step is 1 ms, does that mean we have 1000 time bins/steps for a one-second video? If so, can we get at most 1000 events per pixel for a 1 s video?

  3. In the v2e paper, it is mentioned that "the upsampled frames correspond to timestamps". So if the timestamp step is 1 ms, there should be 1000 timestamps for a 1-second video, and therefore 1000 upsampled frames. But according to the example given in the paper (a 60 Hz video with a DVS timestamp step of 1 ms), the number of upsampled frames is 17, not 1000 as I thought it should be. Could you please explain where my understanding went wrong?

  4. Is it still possible to generate events when we have only one (still) image, by simulating some kind of virtual camera movement with v2e? Could you please share the code showing how still images are converted to a short video using saccade motion, so that it can be fed to the event generation?

  5. Is there any way we can refer to the code you used to generate events for the N-Caltech 101 and MVSEC datasets with v2e? With that we could more easily understand how to access the v2e event database, how to compose voxel grids by organizing the v2e events accordingly, and how to synthesize events for a static image with v2e. Thank you so much.

Rgds, Udayanga

gwgknudayanga avatar Aug 24 '22 15:08 gwgknudayanga

  1. You could in principle generate 1 us timestamp resolution v2e output, but it would be terribly slow because you would be upsampling to a 1 MHz frame rate; it would take forever. More realistic is either 100 us or 1 ms for real input. The timestamp jitter under most lighting conditions is on the order of 100 us to 1 ms (see the plot in Fig. 10 of the original DVS128 paper: Lichtsteiner, Patrick, Christoph Posch, and Tobi Delbruck. 2008. "A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor." IEEE Journal of Solid-State Circuits 43 (2): 566–76. https://doi.org/10.1109/jssc.2007.914337).
  2. No, you can get more than 1 event per timestamp-resolution interval if the contrast change is large enough. But note that the refractory period also limits the number of events, and a 1 kHz event rate per pixel is unrealistically high. Typically we get at most a few hundred events per second from individual pixels when small parts of the sensor are stimulated.
  3. The repeated events will have timestamps linearly interpolated between the upsampling times; see the code for how this works. v2e computes the maximum number of events from any one pixel in each frame, then divides the interframe interval into that many timestamps for that frame. I.e., if the upsampled timestamp interval is T and the maximum number of events from any pixel is N, then the event timestamps within that frame are spaced T/N apart (a minimal sketch of this interpolation is shown after this list). But note there may also be some interaction with the refractory period; see the code for how that works.
  4. You could write such a saccade generator pretty easily using the synthetic input module, but we don't have one now. If I were to do it, I would copy one of the synthetic input classes and modify it to take an image as an input argument (plus some other possible parameters), then transform this image using OpenCV methods and supply these frames to v2e along with their times. I could try to write this but don't have time now. I suggest starting from moving_dot.py since it already takes arguments: https://github.com/SensorsINI/v2e/blob/master/scripts/moving_dot.py
  5. We can look for the MVSEC and Caltech 101 code, but Yuhuang would be the guy for this.
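
To make item 3 more concrete, here is a minimal sketch of the per-frame timestamp interpolation described above. It is not the actual v2e implementation; the function name and arguments (e.g. `interpolate_event_timestamps`, `n_events`) are made up for illustration only.

```python
import numpy as np

def interpolate_event_timestamps(t0, T, n_events):
    """Spread events within one interframe interval of duration T starting at t0.

    n_events: 2D integer array of per-pixel event counts for this frame.
    The interval is divided into N = max(n_events) sub-steps of length T/N,
    and the k-th event of a pixel is stamped at t0 + (k + 1) * T / N.
    Returns a list of (timestamp, y, x) tuples.
    """
    N = int(n_events.max())
    if N == 0:
        return []
    dt = T / N  # sub-timestamp spacing for this frame
    events = []
    for k in range(N):
        # pixels that still have more than k events to emit in this interval
        ys, xs = np.nonzero(n_events > k)
        t = t0 + (k + 1) * dt
        events.extend((t, y, x) for y, x in zip(ys, xs))
    return events
```

In words: the busiest pixel in a frame sets the number of sub-timestamps N, and every pixel's k-th event in that frame is stamped on the k-th of those sub-timestamps.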

hope this helps.

tobidelbruck avatar Aug 26 '22 07:08 tobidelbruck

A saccade generator synthetic input class could be useful for reproducing datasets like N-Caltech101. I'll take a look next time I work on v2e. In the meantime, I think someone else could write such a class and submit a pull request with it. A rough sketch of what it might do is below.
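
As a rough illustration, here is a minimal, self-contained sketch of a frame generator that emulates the N-Caltech101-style recording (three 100 ms micro-saccades tracing a triangle) from one still image. The class name `SaccadeFrames`, its parameters, and the triangle geometry are all illustrative, not v2e API; wiring it into v2e's synthetic input interface (e.g. following moving_dot.py) is left out.

```python
import cv2
import numpy as np

class SaccadeFrames:
    """Hypothetical generator of frames from one still image, emulating three
    100 ms micro-saccades tracing an isosceles triangle (names and defaults
    are illustrative only)."""

    def __init__(self, image_path, fps=1000, amplitude_px=10, duration_s=0.3):
        self.img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        self.fps = fps
        self.n_frames = int(duration_s * fps)
        # triangle vertices (pixel offsets) visited by the three saccades
        a = amplitude_px
        self.vertices = np.array([[-a, a], [a, a], [0, -a], [-a, a]], dtype=float)

    def frames(self):
        """Yield (time_s, frame) pairs; the image is translated along the triangle."""
        for i in range(self.n_frames):
            t = i / self.fps
            s = 3 * i / self.n_frames           # position along the three saccades, 0..3
            seg = min(int(s), 2)                # which saccade we are in
            frac = s - seg                      # progress within this saccade
            offset = (1 - frac) * self.vertices[seg] + frac * self.vertices[seg + 1]
            M = np.float32([[1, 0, offset[0]], [0, 1, offset[1]]])
            frame = cv2.warpAffine(self.img, M, (self.img.shape[1], self.img.shape[0]))
            yield t, frame
```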

tobidelbruck avatar Sep 05 '22 06:09 tobidelbruck

Thank you so much for all the valuable information and guidelines!!

gwgknudayanga avatar Sep 05 '22 09:09 gwgknudayanga

Hi, further, could you please help me to understand the following concerns regarding the saccade camera trajectory?

  • In the v2e paper it is mentioned that "Each image in the Caltech 101 dataset was recorded by a DVS for 300 ms using three 100 ms-long triangular saccades".

  • As I want to generate events for static images (related to structural damage), and since a set of frames is required for event generation, I checked out the code for ESIM (https://github.com/uzh-rpg/rpg_esim) and was able to get some frames for a static image with random camera motion. But following the paper, I also want to use saccade motion for the camera. The original paper likewise says that "three micro-saccades tracing out an isosceles triangle" were used on each (static) image to generate a set of frames. If I want to do this same thing in ESIM, how should I provide the corresponding camera trajectory for saccades tracing out an isosceles triangle? The three micro-saccades from the original paper are given in the table linked below. There it seems there are no translations of the camera, only rotations; the starting and ending orientations of the camera for one micro-saccade are, for example, (-0.5 deg, 0.5 deg, 0 deg) to (0.5 deg, 0.5 deg, 0 deg). https://www.frontiersin.org/files/Articles/159859/fnins-09-00437-HTML/image_m/fnins-09-00437-t001.jpg

  • I have provided the following trajectory to ESIM, where the first entry is the time, the next three entries are the position of the camera, and the last four entries are the quaternion giving the orientation. Is that correct? However, in this way I cannot provide the speed for each micro-saccade. Can the speed also be specified? If so, could you please tell me how I should do that? (A sketch of how such keyframes could be generated follows the listing below.)

#time      x y z  qx       qy       qz      qw
0          0 0 0  -0.0044   0.0044   0.0000  1
100000000  0 0 0  -0.0044   0        0       1
200000000  0 0 0   0.0044   0.0044  -0.0000  1.0000
300000000  0 0 0   0.0044  -0.0044   0.0000  1
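
For reference, a small script like the following could generate such keyframes by converting Euler angles (in degrees) to quaternions. The exact ESIM trajectory file format (time units, column order, quaternion convention) is an assumption here and should be checked against the ESIM documentation; the keyframe values are taken from the table linked above.

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Orientation keyframes of the three micro-saccades, as (roll, pitch, yaw) in
# degrees, following the N-Caltech101 table (values illustrative).
keyframes_deg = [(-0.5, 0.5, 0.0), (-0.5, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, -0.5, 0.0)]
dt_ns = 100_000_000  # 100 ms per saccade, assuming the file expects nanoseconds

lines = ["# time x y z qx qy qz qw"]
for i, euler in enumerate(keyframes_deg):
    qx, qy, qz, qw = Rotation.from_euler("xyz", euler, degrees=True).as_quat()
    lines.append(f"{i * dt_ns} 0 0 0 {qx:.4f} {qy:.4f} {qz:.4f} {qw:.4f}")

print("\n".join(lines))  # write this to the trajectory file expected by ESIM
```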

Thank you !!

gwgknudayanga avatar Sep 05 '22 09:09 gwgknudayanga

I'm sorry I can't help with this issue, hope you worked it out.

tobidelbruck avatar Jul 26 '23 16:07 tobidelbruck