Question on decoding large data structures
Hi there,
I'm trying to use ASN.1 to store a large data set (up to 200 MB or so), replacing a file that currently stores packed structs as raw bytes on disk. Do you have any hints on the best approach for this?
The data is an "ephemeris" that stores several "splines." Before encoding starts, I do not know how many splines I'll need to encode or decode, but it could be several thousand. The reference file I'm using defines that count up front, and libraries then access the correct spline directly: a specific algorithm computes the spline number from some input information, and seeking through the file is then sufficient to grab the data.
In an attempt to mimic this behavior, I'm trying the following structure:
pub struct Ephemeris<'a> {
    pub spline_duration_s: f64,
    pub splines: &'a [Spline<'a>], // We can't do SequenceOf<Spline<'a>, N>, because N would be too large to fit on the stack
}
And the following encoding and decoding implementations (the encoding works, but the decoding fails):
impl<'a> Encode for Ephemeris<'a> {
    fn encoded_len(&self) -> der::Result<der::Length> {
        self.spline_duration_s.encoded_len()?
            + self.splines.iter().fold(Length::new(2), |acc, spline| {
                (acc + spline.encoded_len().unwrap()).unwrap()
            })
    }

    fn encode(&self, encoder: &mut der::Encoder<'_>) -> der::Result<()> {
        encoder.encode(&self.spline_duration_s)?;
        encoder.sequence(Length::new(self.splines.len() as u16), |encoder| {
            for spline in self.splines {
                encoder.encode(spline)?;
            }
            Ok(())
        })
    }
}
impl<'a> Decode<'a> for Ephemeris<'a> {
    fn decode(decoder: &mut Decoder<'a>) -> der::Result<Self> {
        let spline_duration_s = decoder.decode()?;
        let expected_len: u32 = decoder.peek_header().unwrap().length.into();
        dbg!(expected_len);
        // XXX: how can I point each spline to the input buffer? I can't perform an alloc, nor can I
        let mut splines: &'a [Spline<'a>; 1000]; // XXX: How do I even initialize this?
        decoder.sequence(|decoder| {
            for idx in 0..expected_len {
                splines[idx as usize] = decoder.decode()?;
            }
            Ok(())
        });
        Ok(Self {
            spline_duration_s,
            splines,
        })
    }
}
Question
- Is this a bad approach and if so, what would you recommend instead?
- If this is a reasonable approach, how can I get the `splines` field to point to the decoded splines?
Appendix
For reference, here is how a spline is defined, where `x`, `y`, and `z` are encoded as OctetStrings.
pub struct Spline<'a> {
    pub start_epoch: Epoch,
    pub end_epoch: Epoch,
    /// State information (km)
    pub x: &'a [u8],
    /// State information (km)
    pub y: &'a [u8],
    /// State information (km)
    pub z: &'a [u8],
}
Thanks for your help
We've had streaming decoding come up in a few other scenarios, particularly CRLs. Random access to data which doesn't fit in memory is a different matter though.
I'd suggest only using the `der` crate for modeling the "splines". You'll need to build some sort of index for figuring out their position offsets. You can decode the `Header` of each "spline" to figure out its length, then iterate over them to build the index, keeping track of each spline's location.
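As a rough illustration of that index-building loop, here is a minimal sketch. It deliberately does not use the `der` crate's `Header` type: to stay self-contained it hand-parses a single-byte tag and a definite length of at most four length octets, which is an assumption about how the splines are encoded. The names `Entry`, `Elements`, `read_header`, and `element` are hypothetical. The iterator itself needs no allocation, so the offsets can be collected into whatever storage is available.

```rust
/// Location of one top-level DER element inside the input buffer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Entry {
    /// Offset of the element's tag byte.
    pub offset: usize,
    /// Header length plus content length, in bytes.
    pub total_len: usize,
}

/// Parses a DER header at the start of `buf` and returns
/// `(header_len, content_len)`. Returns `None` for truncated input,
/// multi-byte tags, indefinite lengths, or more than 4 length octets.
fn read_header(buf: &[u8]) -> Option<(usize, usize)> {
    let tag = *buf.first()?;
    if tag & 0x1f == 0x1f {
        return None; // multi-byte tag numbers are not handled in this sketch
    }
    let first = *buf.get(1)?;
    if first & 0x80 == 0 {
        return Some((2, usize::from(first))); // short form
    }
    let n = usize::from(first & 0x7f);
    if n == 0 || n > 4 {
        return None;
    }
    let mut len = 0usize;
    for &b in buf.get(2..2 + n)? {
        len = (len << 8) | usize::from(b); // long form, big-endian
    }
    Some((2 + n, len))
}

/// Iterator over consecutive elements in `data`. It stops at the end of
/// the buffer, and also (silently) at the first malformed element.
pub struct Elements<'a> {
    data: &'a [u8],
    offset: usize,
}

impl<'a> Elements<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Elements { data, offset: 0 }
    }
}

impl<'a> Iterator for Elements<'a> {
    type Item = Entry;

    fn next(&mut self) -> Option<Entry> {
        let rest = self.data.get(self.offset..)?;
        let (header_len, content_len) = read_header(rest)?;
        let total_len = header_len.checked_add(content_len)?;
        if total_len > rest.len() {
            return None; // element claims more bytes than remain
        }
        let entry = Entry { offset: self.offset, total_len };
        self.offset += total_len;
        Some(entry)
    }
}

/// Returns the raw bytes (header included) of an indexed element.
pub fn element<'a>(data: &'a [u8], entry: Entry) -> &'a [u8] {
    &data[entry.offset..entry.offset + entry.total_len]
}
```

Once the index exists, looking up spline `n` is a matter of taking the `n`-th `Entry` and handing `element(data, entry)` to the decoder for a single `Spline`, so only one spline is decoded at a time.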
That's the best I can offer for now. We may at least make the streaming decoding case easier soon in order to support CRLs.
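In the meantime, if the file is too large to hold in memory, the header-walking idea can be run against anything that implements `Read + Seek`, so that only headers are read while indexing and only one element is read per lookup. The sketch below is an assumption-laden illustration, not part of the `der` crate: it hand-parses single-byte tags and definite lengths of at most four length octets, it assumes `std` and an allocator are available, and the names `read_header_from`, `index_stream`, and `read_element` are hypothetical.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads one DER header at the current position and returns
/// `(header_len, content_len)`.
fn read_header_from<R: Read>(src: &mut R) -> io::Result<(u64, u64)> {
    let mut b = [0u8; 2];
    src.read_exact(&mut b)?;
    if b[0] & 0x1f == 0x1f {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "multi-byte tag"));
    }
    if b[1] & 0x80 == 0 {
        return Ok((2, u64::from(b[1]))); // short form
    }
    let n = usize::from(b[1] & 0x7f);
    if n == 0 || n > 4 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "unsupported length form",
        ));
    }
    // Long form: right-align the big-endian length octets in a u32.
    let mut len_bytes = [0u8; 4];
    src.read_exact(&mut len_bytes[4 - n..])?;
    Ok((2 + n as u64, u64::from(u32::from_be_bytes(len_bytes))))
}

/// Walks the whole source, recording `(offset, total_len)` for each
/// top-level element. Element contents are skipped with a seek, not read.
pub fn index_stream<R: Read + Seek>(src: &mut R) -> io::Result<Vec<(u64, u64)>> {
    let end = src.seek(SeekFrom::End(0))?;
    let mut pos = src.seek(SeekFrom::Start(0))?;
    let mut index = Vec::new();
    while pos < end {
        let (header_len, content_len) = read_header_from(src)?;
        let total = header_len + content_len;
        if total > end - pos {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "element overruns input",
            ));
        }
        index.push((pos, total));
        pos = src.seek(SeekFrom::Start(pos + total))?;
    }
    Ok(index)
}

/// Reads the raw bytes (header included) of one indexed element.
pub fn read_element<R: Read + Seek>(src: &mut R, entry: (u64, u64)) -> io::Result<Vec<u8>> {
    src.seek(SeekFrom::Start(entry.0))?;
    let mut buf = vec![0u8; entry.1 as usize];
    src.read_exact(&mut buf)?;
    Ok(buf)
}
```

With a `std::fs::File` as the source, `index_stream` touches only a few bytes per spline, and the buffer returned by `read_element` can then be decoded as a single `Spline`.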