asammdf
any efficient way to load big samples?
I have a big .mf4 file (more than 1 GB) which includes raw data for 4K video frames. I am using the MDF.get method to extract the raw data.
from asammdf import MDF
raw_samples=MDF("video_data.mf4").get("VideoRawdata_VC0",samples_only=True)
But this takes a very long time. To avoid unnecessary computing time, I want to load the .mf4 file as a binary byte array first and then extract the data starting from the offset of the raw data. Is it possible to get this offset (the start address of the raw data) using asammdf? Also, please advise me if you know another efficient way to extract the raw data.
Thank you.
Things are not as easy as pointing to the start of the data section. You need to take care of the signal position and size in the record.
Maybe this can help:
from asammdf import MDF
from asammdf.blocks.cutils import get_channel_raw_bytes

mdf = MDF("video_data.mf4")

group_index, channel_index = mdf.whereis("VideoRawdata_VC0")[0]
group = mdf.groups[group_index]

# channel position and size inside the record
info = group.record[channel_index]
dtype_, byte_size, byte_offset, bit_offset = info

record_size = group.channel_group.samples_byte_nr
invalidation_bytes_nr = group.channel_group.invalidation_bytes_nr

channel_parts = []

# extract only this channel's bytes from each data fragment
for data_bytes, *_ in mdf._load_data(group):
    VideoRawdata_VC0_part = get_channel_raw_bytes(
        data_bytes,
        record_size + invalidation_bytes_nr,
        byte_offset,
        byte_size,
    )
    channel_parts.append(VideoRawdata_VC0_part)

VideoRawdata_VC0_raw_data = b''.join(channel_parts)
Thank you so much. I have not fully understood your code yet, so I am not sure what VideoRawdata_VC0_raw_data contains, but I think VideoRawdata_VC0_raw_data is too short (1208 bytes). I printed some of the variables used in the proposed code.
print(group,info,record_size,invalidation_bytes_nr,len(VideoRawdata_VC0_raw_data))
<asammdf.blocks.utils.Group object at 0x7f0ff517c460> (dtype('uint64'), 8, 12, 0) 20 0 1208
I get 151 frames of 12463200 bytes of raw data with the get method.
from asammdf import MDF
raw_samples=MDF("video_data.mf4").get("VideoRawdata_VC0",samples_only=True)
print(raw_samples.shape,raw_samples.dtype)
(151, 12463200) uint8
I attach the signal data of "VideoRawdata_VC0": signal_VideoRawdata_VC0.txt
from asammdf import MDF
mdf=MDF("video_data.mf4")
signal_VideoRawdata_VC0=mdf.get("VideoRawdata_VC0")
with open('signal_VideoRawdata_VC0.txt', 'w') as f:
    print(signal_VideoRawdata_VC0, file=f)
I noticed that 1208 (the length of VideoRawdata_VC0_raw_data) = 151 (the number of frames) * 8. Can I get the raw data address from each 8-byte entry of VideoRawdata_VC0_raw_data?
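For reference, this is what I am assuming (a rough sketch, not verified): if the record only stores an 8-byte offset per sample that points into the signal data section, then the 1208 bytes should decode as 151 uint64 values.
import numpy as np

# assumption: the channel raw bytes are 151 little-endian uint64 offsets,
# one per frame, pointing into the signal data section
offsets = np.frombuffer(VideoRawdata_VC0_raw_data, dtype="<u8")
print(offsets.shape)  # expected (151,)
print(offsets[:5])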
from asammdf import MDF
from asammdf.blocks.cutils import get_channel_raw_bytes
mdf = MDF("video_data.mf4")
group_index, channel_index = mdf.whereis("VideoRawdata_VC0")[0]
group = mdf.groups[group_index]
VideoRawdata_VC0_raw_data = mdf._load_signal_data(group, channel_index)
@danielhrisca I found that the _load_signal_data method returns 4 bytes more data per frame compared to the get method. How can I get this offset information?
from asammdf import MDF
from asammdf.blocks.cutils import get_channel_raw_bytes
mdf = MDF("video_data.mf4")
group_index, channel_index = mdf.whereis("VideoRawdata_VC0")[0]
group = mdf.groups[group_index]
samples=mdf.get("VideoRawdata_VC0").samples#to compare debug
VideoRawdata_VC0_raw_data = mdf._load_signal_data(group, channel_index)
print("samples.shape",samples.shape)
num_of_samples=samples.shape[0]#151
signal_length=len(VideoRawdata_VC0_raw_data)/num_of_samples
print("signal_length",signal_length)
print("signal data:")
for i in range(30):
    print(VideoRawdata_VC0_raw_data[i], end=" ")  # what is the first 4 bytes?
print("")
print("samples[0] data:")
print(samples[0][:30])#reference data
Output of the above code:
samples.shape (151, 12463200)
signal_length 12463204.0
signal data:
96 44 190 0 0 0 0 0 44 128 22 35 12 12 255 13 11 166 13 16 17 13 15 136 12 14 249 14 19 100
samples[0] data:
[ 0 0 0 0 44 128 22 35 12 12 255 13 11 166 13 16 17 13
15 136 12 14 249 14 19 100 12 14 249 11]
And unfortunately, the _load_signal_data method is also not fast enough for my use case. So I want to build an address list of each sample's data in Python and process the binary data in C++. I hope I can do that with this library. Thank you for your support.
My bad, the data also includes a 4 byte header that holds the frame length, so you should skip the first 4 bytes of each frame.
In [3]: struct.unpack('<I', bytes([96,44,190,0]))
Out[3]: (12463200,)
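If it helps, a minimal sketch of splitting such a stream into frames, assuming each frame is stored as a 4-byte little-endian length header followed immediately by its payload (split_frames is just an illustrative helper, not part of asammdf):
import struct

def split_frames(raw_data):
    # walk the buffer: 4-byte little-endian length header, then the payload
    frames = []
    pos = 0
    while pos < len(raw_data):
        (frame_len,) = struct.unpack_from("<I", raw_data, pos)
        pos += 4
        frames.append(raw_data[pos:pos + frame_len])
        pos += frame_len
    return frames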
And unfortunately, _load_signal_data method also not fast enough for my use case.
It's the fastest method. Please share the output of this call:
for info in group.get_signal_data_blocks(channel_index):
    print(info)
the data includes also a 4 byte header that holds the frame length
I understand, thank you.
I tried the code below.
for i, info in enumerate(group.get_signal_data_blocks(channel_index)):
    print(i, info)
Output:
0 SignalDataBlockInfo(address=0x1608, original_size=37389612, compressed_size=37389612, block_type=0)
...
47 SignalDataBlockInfo(address=0x68BE8E40, original_size=37389612, compressed_size=37389612, block_type=0)
48 SignalDataBlockInfo(address=0x6AF91388, original_size=37389612, compressed_size=37389612, block_type=0)
49 SignalDataBlockInfo(address=0x6D3398D0, original_size=37389612, compressed_size=37389612, block_type=0)
50 SignalDataBlockInfo(address=0x6F6E2A40, original_size=12463204, compressed_size=12463204, block_type=0)
Then I checked some of the listed addresses.
All of them include the 4 byte header ([96, 44, 190, 0]), and SignalDataBlocks 0-49 each include 3 frames of signal data.
This gives enough speed and information.
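For reference, this is the address list I plan to build, a rough sketch assuming (as observed above) that each uncompressed block's address points directly at the first 4-byte frame-length header and that the frames are stored back to back inside the block (group and channel_index come from the code above):
import struct

frame_addresses = []  # (absolute file offset of payload, payload size) per frame
with open("video_data.mf4", "rb") as f:
    for info in group.get_signal_data_blocks(channel_index):
        pos = info.address
        end = info.address + info.original_size
        while pos < end:
            f.seek(pos)
            (frame_len,) = struct.unpack("<I", f.read(4))
            frame_addresses.append((pos + 4, frame_len))
            pos += 4 + frame_len
# frame_addresses could then be handed to the C++ side to read the payload
# bytes directly from the original file.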
I really appreciate your quick answer!
@danielhrisca
Sorry, I am still facing a problem. Please let me reopen this issue.
I have another very big MDF file (more than 30 GB). The constructor below takes more than 120 seconds with this file. Is it possible to reduce the time? (The previous 1.8 GB file takes less than 0.01 seconds; this difference looks strange.)
mdf = MDF("video_data_big.mf4")#need to wait 2min..
and the signal data block info looks like this:
SignalDataBlockInfo(address = 0x859367, original_size = 12463204, compressed_size = 8757417, block_type = 3)
I am struggling to decode this data. Below is my attempt; a runtime error occurs. I am not sure what mistake I am making.
from lz4.frame import decompress as lz_decompress
from zlib import decompress

compressed_data: bytes
# (address = 0x859367, original_size = 12463204, compressed_size = 8757417, block_type = 3)
address = 0x859367
compressed_size = 8757417

with open("video_data_big.mf4", "rb") as f:
    f.seek(address)
    compressed_data = f.read(compressed_size)

for i in range(20):
    print(compressed_data[i], end=" ")
print("")

decompressed_data = lz_decompress(compressed_data)
Output of the above code:
166 26 169 168 104 169 166 202 169 169 41 166 168 39 167 167 62 168 164 6
(omit some message)
---> decompressed_data=lz_decompress(compressed_data)
RuntimeError: LZ4F_getFrameInfo failed with code: ERROR_frameType_unknown
I guess I need to read the temporary object; the provided signal address is not for the original file. Is it possible to make an address list for the original .mf4 file even if the file is very big? https://github.com/danielhrisca/asammdf/blob/master/asammdf/blocks/mdf_v4.py#L1177
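For completeness, this is roughly how I would expect a standard MDF4 ##DZ block to be decoded according to the specification (zip_type 0 = deflate, zip_type 1 = transposition + deflate, with the column count in zip_parameter); it is only a sketch and may not apply to the block_type = 3 value reported here, which supports the guess that the address belongs to the temporary object rather than the original file:
import zlib
import numpy as np

def decode_dz_payload(compressed, original_size, zip_type, zip_parameter):
    # zip_type and zip_parameter would come from the ##DZ block header
    data = zlib.decompress(compressed)
    if zip_type == 1:
        # undo the transposition: the stored matrix has zip_parameter rows
        cols = zip_parameter
        lines = original_size // cols
        head = np.frombuffer(data[: lines * cols], dtype=np.uint8)
        data = head.reshape((cols, lines)).T.tobytes() + data[lines * cols:]
    return data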
@danielhrisca I think I need to change the library to access the temporary object from C++. https://github.com/danielhrisca/asammdf/blob/master/asammdf/blocks/mdf_v4.py#L312
#self._tempfile = TemporaryFile(dir=self.temporary_folder)
self._tempfile = NamedTemporaryFile(dir=self.temporary_folder)
If possible, I would like you to add an option to use NamedTemporaryFile. If the user sets "temporary_folder", asammdf should probably use NamedTemporaryFile, I think.
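To illustrate the idea (a sketch with standard library calls only, the folder path is hypothetical): NamedTemporaryFile exposes a real path on disk that an external C++ process could open, while a plain TemporaryFile may not have a usable name.
from tempfile import NamedTemporaryFile

# hypothetical folder, standing in for the temporary_folder setting mentioned above
tmp = NamedTemporaryFile(dir="/path/to/temporary_folder")
print(tmp.name)  # a concrete path that another process (e.g. C++) could open for reading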
@danielhrisca Could you please explain briefly the difference between the raw signal data and the samples? Also, in the SignalDataBlockInfo, what do the address and original_size mean? And what does it mean that there are 50 different signal data blocks here? Your help will be appreciated. Thank you.
96 44 190 0
@danielhrisca After those 4 bytes (the header), which hold the frame length, does the original data record, i.e. the payload, start right away? Or does the sample also contain a header from the data block, and then, if Ethernet is used, the Ethernet frame format, and then the record? Could you please describe the structure, or show the code where you applied this?
@danielhrisca While running samples=mdf.get("VideoRawdata_VC0").samples I am getting an error stating that the same channel name occurs multiple times in different channel groups. Any suggestions please?
The signal name occurs multiple times in the measurement for different channels. You can see the data group indexes and channel group indexes with this code:
print(mdf.whereis("VideoRawdata_VC0"))
This is a rough overview of the MDF internal structure https://www.asam.net/standards/detail/mdf/wiki/
@danielhrisca > Moreover while i the signalDatablockInfo what is the address and original size over here means?
Here there are 50 different signaldata block what does that mean too?
@danielhrisca as per documentation - how can we get the number of links that we have in my mdf4 file Thank you so much, but i am not very clear with that documentation. As i was asking that the samples that we are printing of that specific channel using select() method and .whereis(), Do this sample only have the payload data or it also have the different header bytes? Moreover while in the signalDatablockInfo what is the address and original size over here means? Here there are 50 different signaldata block what does that mean too?
Thank you
@elon1992 You need to study the MDF v4 specification; there are too many things to explain here.