asammdf
any efficient way to load big samples?
I have a big .mf4 file (more than 1 GB) which includes raw data for 4K video frames. I am using the MDF.get method to extract the raw data.
from asammdf import MDF
raw_samples=MDF("video_data.mf4").get("VideoRawdata_VC0",samples_only=True)
But this takes a very long time. To avoid unnecessary computing time, I want to load the .mf4 file as a binary byte array first and then extract the data starting from the offset of the raw data. Is it possible to get this offset (the start address of the raw data) using asammdf? Also, please advise me if you know another efficient way to extract the raw data.
Thank you.
Things are not as easy as pointing to the start of the data section. You need to take care of the signal position and size in the record.
Maybe this can help:
from asammdf import MDF
from asammdf.blocks.cutils import get_channel_raw_bytes

mdf = MDF("video_data.mf4")

group_index, channel_index = mdf.whereis("VideoRawdata_VC0")[0]
group = mdf.groups[group_index]

# channel position and size inside the record
info = group.record[channel_index]
dtype_, byte_size, byte_offset, bit_offset = info

record_size = group.channel_group.samples_byte_nr
invalidation_bytes_nr = group.channel_group.invalidation_bytes_nr

channel_parts = []

# extract only this channel's bytes from each data fragment
for data_bytes, *_ in mdf._load_data(group):
    VideoRawdata_VC0_part = get_channel_raw_bytes(
        data_bytes,
        record_size + invalidation_bytes_nr,
        byte_offset,
        byte_size,
    )
    channel_parts.append(VideoRawdata_VC0_part)

VideoRawdata_VC0_raw_data = b''.join(channel_parts)
Thank you so much. I have not fully understood your code yet, so I am not sure what VideoRawdata_VC0_raw_data contains, but I think VideoRawdata_VC0_raw_data is too short (1208 bytes). I printed some of the variables used in the proposed code.
print(group,info,record_size,invalidation_bytes_nr,len(VideoRawdata_VC0_raw_data))
<asammdf.blocks.utils.Group object at 0x7f0ff517c460> (dtype('uint64'), 8, 12, 0) 20 0 1208
I get 151 frames of 12463200 bytes of raw data with the get method.
from asammdf import MDF
raw_samples=MDF("video_data.mf4").get("VideoRawdata_VC0",samples_only=True)
print(raw_samples.shape,raw_samples.dtype)
(151, 12463200) uint8
I attach the signal data of "VideoRawdata_VC0": signal_VideoRawdata_VC0.txt
from asammdf import MDF
mdf=MDF("video_data.mf4")
signal_VideoRawdata_VC0=mdf.get("VideoRawdata_VC0")
with open('signal_VideoRawdata_VC0.txt', 'w') as f:
    print(signal_VideoRawdata_VC0, file=f)
I noticed that 1208 (the length of VideoRawdata_VC0_raw_data) = 151 (the number of frames) * 8. Can I get the raw data address from each 8-byte entry of VideoRawdata_VC0_raw_data?
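For reference, this is what I am assuming (a rough sketch, not verified): if the record only stores an 8-byte offset per sample that points into the signal data section, then the 1208 bytes should decode as 151 uint64 values.
import numpy as np

# assumption: the channel raw bytes are 151 little-endian uint64 offsets,
# one per frame, pointing into the signal data section
offsets = np.frombuffer(VideoRawdata_VC0_raw_data, dtype="<u8")
print(offsets.shape)  # expected (151,)
print(offsets[:5])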
from asammdf import MDF
from asammdf.blocks.cutils import get_channel_raw_bytes
mdf = MDF("video_data.mf4")
group_index, channel_index = mdf.whereis("VideoRawdata_VC0")[0]
group = mdf.groups[group_index]
VideoRawdata_VC0_raw_data = mdf._load_signal_data(group, channel_index)
@danielhrisca I found that the _load_signal_data method returns 4 bytes more data per frame compared to the get method. How can I get this offset information?
from asammdf import MDF
from asammdf.blocks.cutils import get_channel_raw_bytes
mdf = MDF("video_data.mf4")
group_index, channel_index = mdf.whereis("VideoRawdata_VC0")[0]
group = mdf.groups[group_index]
samples=mdf.get("VideoRawdata_VC0").samples#to compare debug
VideoRawdata_VC0_raw_data = mdf._load_signal_data(group, channel_index)
print("samples.shape",samples.shape)
num_of_samples=samples.shape[0]#151
signal_length=len(VideoRawdata_VC0_raw_data)/num_of_samples
print("signal_length",signal_length)
print("signal data:")
for i in range(30):
    print(VideoRawdata_VC0_raw_data[i], end=" ")  # what is the first 4 bytes?
print("")
print("samples[0] data:")
print(samples[0][:30])#reference data
Output of the above code:
samples.shape (151, 12463200)
signal_length 12463204.0
signal data:
96 44 190 0 0 0 0 0 44 128 22 35 12 12 255 13 11 166 13 16 17 13 15 136 12 14 249 14 19 100
samples[0] data:
[ 0 0 0 0 44 128 22 35 12 12 255 13 11 166 13 16 17 13
15 136 12 14 249 14 19 100 12 14 249 11]
And unfortunately, the _load_signal_data method is also not fast enough for my use case. So I want to build an address list of each sample's data in Python and process the binary data in C++. I hope I can do that with this library. Thank you for your support.
My bad, the data also includes a 4 byte header that holds the frame length, so you should skip the first 4 bytes of each frame.
In [3]: struct.unpack('<I', bytes([96,44,190,0]))
Out[3]: (12463200,)
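If it helps, a minimal sketch of splitting such a stream into frames, assuming each frame is stored as a 4-byte little-endian length header followed immediately by its payload (split_frames is just an illustrative helper, not part of asammdf):
import struct

def split_frames(raw_data):
    # walk the buffer: 4-byte little-endian length header, then the payload
    frames = []
    pos = 0
    while pos < len(raw_data):
        (frame_len,) = struct.unpack_from("<I", raw_data, pos)
        pos += 4
        frames.append(raw_data[pos:pos + frame_len])
        pos += frame_len
    return frames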
And unfortunately, _load_signal_data method also not fast enough for my use case.
It's the fastest method. Please share the output of this call:
for info in group.get_signal_data_blocks(channel_index):
    print(info)
the data includes also a 4 byte header that holds the frame length
I understand, thank you.
I tried the code below.
for i, info in enumerate(group.get_signal_data_blocks(channel_index)):
    print(i, info)
Output:
0 SignalDataBlockInfo(address=0x1608, original_size=37389612, compressed_size=37389612, block_type=0)
...
47 SignalDataBlockInfo(address=0x68BE8E40, original_size=37389612, compressed_size=37389612, block_type=0)
48 SignalDataBlockInfo(address=0x6AF91388, original_size=37389612, compressed_size=37389612, block_type=0)
49 SignalDataBlockInfo(address=0x6D3398D0, original_size=37389612, compressed_size=37389612, block_type=0)
50 SignalDataBlockInfo(address=0x6F6E2A40, original_size=12463204, compressed_size=12463204, block_type=0)
Then I checked some of the listed addresses.
All of them include the 4 byte header ([96, 44, 190, 0]), and SignalDataBlocks 0-49 each include 3 frames of signal data.
This gives enough speed and information.
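For reference, this is the address list I plan to build, a rough sketch assuming (as observed above) that each uncompressed block's address points directly at the first 4-byte frame-length header and that the frames are stored back to back inside the block (group and channel_index come from the code above):
import struct

frame_addresses = []  # (absolute file offset of payload, payload size) per frame
with open("video_data.mf4", "rb") as f:
    for info in group.get_signal_data_blocks(channel_index):
        pos = info.address
        end = info.address + info.original_size
        while pos < end:
            f.seek(pos)
            (frame_len,) = struct.unpack("<I", f.read(4))
            frame_addresses.append((pos + 4, frame_len))
            pos += 4 + frame_len
# frame_addresses could then be handed to the C++ side to read the payload
# bytes directly from the original file.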
I really appreciate your quick answer!
@danielhrisca
Sorry, I am still facing a problem. Please let me reopen this issue.
I have another very big MDF file (more than 30 GB). The constructor below takes more than 120 seconds with this file. Is it possible to reduce the time? (The previous 1.8 GB file takes less than 0.01 seconds; this difference looks strange.)
mdf = MDF("video_data_big.mf4")#need to wait 2min..
and the signal data block info looks like this:
SignalDataBlockInfo(address = 0x859367, original_size = 12463204, compressed_size = 8757417, block_type = 3)
I am struggling to decode this data. Below is my attempt; a runtime error occurs. I am not sure what mistake I am making.
from lz4.frame import decompress as lz_decompress
from zlib import decompress

compressed_data: bytes
# (address = 0x859367, original_size = 12463204, compressed_size = 8757417, block_type = 3)
address = 0x859367
compressed_size = 8757417

with open("video_data_big.mf4", "rb") as f:
    f.seek(address)
    compressed_data = f.read(compressed_size)

for i in range(20):
    print(compressed_data[i], end=" ")
print("")

decompressed_data = lz_decompress(compressed_data)
Output of the above code:
166 26 169 168 104 169 166 202 169 169 41 166 168 39 167 167 62 168 164 6
(omit some message)
---> decompressed_data=lz_decompress(compressed_data)
RuntimeError: LZ4F_getFrameInfo failed with code: ERROR_frameType_unknown
I guess I need to read the temporary object; the provided signal address is not for the original file. Is it possible to make an address list for the original .mf4 file even if the file is very big? https://github.com/danielhrisca/asammdf/blob/master/asammdf/blocks/mdf_v4.py#L1177
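For completeness, this is roughly how I would expect a standard MDF4 ##DZ block to be decoded according to the specification (zip_type 0 = deflate, zip_type 1 = transposition + deflate, with the column count in zip_parameter); it is only a sketch and may not apply to the block_type = 3 value reported here, which supports the guess that the address belongs to the temporary object rather than the original file:
import zlib
import numpy as np

def decode_dz_payload(compressed, original_size, zip_type, zip_parameter):
    # zip_type and zip_parameter would come from the ##DZ block header
    data = zlib.decompress(compressed)
    if zip_type == 1:
        # undo the transposition: the stored matrix has zip_parameter rows
        cols = zip_parameter
        lines = original_size // cols
        head = np.frombuffer(data[: lines * cols], dtype=np.uint8)
        data = head.reshape((cols, lines)).T.tobytes() + data[lines * cols:]
    return data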
@danielhrisca I think I need to change the library to access the temporary object from C++. https://github.com/danielhrisca/asammdf/blob/master/asammdf/blocks/mdf_v4.py#L312
#self._tempfile = TemporaryFile(dir=self.temporary_folder)
self._tempfile = NamedTemporaryFile(dir=self.temporary_folder)
If possible, I would like you to add an option to use NamedTemporaryFile. If the user sets "temporary_folder", asammdf should probably use NamedTemporaryFile, I think.
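To illustrate the idea (a sketch with standard library calls only, the folder path is hypothetical): NamedTemporaryFile exposes a real path on disk that an external C++ process could open, while a plain TemporaryFile may not have a usable name.
from tempfile import NamedTemporaryFile

# hypothetical folder, standing in for the temporary_folder setting mentioned above
tmp = NamedTemporaryFile(dir="/path/to/temporary_folder")
print(tmp.name)  # a concrete path that another process (e.g. C++) could open for reading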
@danielhrisca Could you please explain briefly the difference between the raw signal data and the samples? Also, in the SignalDataBlockInfo, what do the address and original_size mean? And what does it mean that there are 50 different signal data blocks here? Your help will be appreciated. Thank you.
96 44 190 0
@danielhrisca After those 4 bytes (the header), which hold the frame length, does the original data record, i.e. the payload, start right away? Or does the sample also contain a header from the data block, and then, if Ethernet is used, the Ethernet frame format, and then the record? Could you please describe the structure, or show the code where you applied this?
@danielhrisca While running samples=mdf.get("VideoRawdata_VC0").samples I am getting an error stating that the same channel name occurs multiple times in different channel groups. Any suggestions please?
The signal name occurs multiple times in the measurement for different channels. You can see the data group indexes and channel group indexes with this code:
print(mdf.whereis("VideoRawdata_VC0"))
This is a rough overview of the MDF internal structure https://www.asam.net/standards/detail/mdf/wiki/
@danielhrisca > Moreover while i the signalDatablockInfo what is the address and original size over here means?
Here there are 50 different signaldata block what does that mean too?
@danielhrisca as per documentation - how can we get the number of links that we have in my mdf4 file Thank you so much, but i am not very clear with that documentation. As i was asking that the samples that we are printing of that specific channel using select() method and .whereis(), Do this sample only have the payload data or it also have the different header bytes? Moreover while in the signalDatablockInfo what is the address and original size over here means? Here there are 50 different signaldata block what does that mean too?
Thank you
@elon1992 You need to study the MDF v4 specification; there are too many things to explain here.