azure-sdk-for-cpp
azure-sdk-for-cpp copied to clipboard
Reduce uses of static_cast to size_t in storage code, including lambdas
This is follow up to https://github.com/Azure/azure-sdk-for-cpp/pull/2285#discussion_r634855945, we need to update the logic to use size_t in more callsites to BodyStream types, rather than int64_t.
For example: https://github.com/ahsonkhan/azure-sdk-for-cpp/blob/4c3e1e9f31ef6ee762e15f1d9a1ea48c4fcbdb76/sdk/storage/azure-storage-files-shares/src/share_file_client.cpp#L941-L949
There are a few other places that need to be fixed from the fallout of this related PR: https://github.com/Azure/azure-sdk-for-cpp/pull/2310
The tricky part is that some of these lengths and offsets are coming from the user options/http range which is int64_t. We'll probably need some sort of argument guards for that.
For example: https://github.com/ahsonkhan/azure-sdk-for-cpp/blob/4c3e1e9f31ef6ee762e15f1d9a1ea48c4fcbdb76/sdk/storage/azure-storage-files-shares/src/share_file_client.cpp#L723-L727
Unless @JeffreyRichter has a different suggestion.
The int64_t looks good to me here
- Jeffrey Richter (see Architecting Distributed Cloud Appshttps://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Faka.ms%2FRichterCloudApps&data=04%7C01%7C%7Cdc05b9290f9f445693ec08d8e1b30ed8%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637507506469305857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=TeBH6rge3q8VyFBwCrBXfSXND4oajfqayGLiZq1gmFo%3D&reserved=0 & Designing HTTP APIshttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D9Ng00IlBCtw%26list%3DPL9XzOCngAkqs4m0XdULJu_78nM3Ok3Q65%26index%3D1&data=04%7C01%7C%7Cdc05b9290f9f445693ec08d8e1b30ed8%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637507506469305857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nzunz4x55lMRzHb6qJVXuRkrlAAI03wXjPdexaPavzs%3D&reserved=0)
From: Ahson Khan @.> Sent: Tuesday, May 18, 2021 7:40:11 PM To: Azure/azure-sdk-for-cpp @.> Cc: Jeffrey Richter @.>; Mention @.> Subject: Re: [Azure/azure-sdk-for-cpp] Reduce uses of static_cast to size_t in storage code, including lambdas (#2313)
The tricky part is that some of these lengths and offsets are coming from the user options/http range which is int64_t. We'll probably need some sort of argument guards for that.
For example: https://github.com/ahsonkhan/azure-sdk-for-cpp/blob/4c3e1e9f31ef6ee762e15f1d9a1ea48c4fcbdb76/sdk/storage/azure-storage-files-shares/src/share_file_client.cpp#L723-L727https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fahsonkhan%2Fazure-sdk-for-cpp%2Fblob%2F4c3e1e9f31ef6ee762e15f1d9a1ea48c4fcbdb76%2Fsdk%2Fstorage%2Fazure-storage-files-shares%2Fsrc%2Fshare_file_client.cpp%23L723-L727&data=04%7C01%7C%7C9842288845234251393d08d91a6f6c7f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637569888120392324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=QOMvm37P7pn%2FnQCwXtentfKpAl5ojK1SwGRBABeAgS4%3D&reserved=0
Unless @JeffreyRichterhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FJeffreyRichter&data=04%7C01%7C%7C9842288845234251393d08d91a6f6c7f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637569888120402272%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WDv0PiLzzDqh8pupj5Y6SESyJx%2BhfHGo07NEbcqpbV0%3D&reserved=0 has a different suggestion.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAzure%2Fazure-sdk-for-cpp%2Fissues%2F2313%23issuecomment-843699907&data=04%7C01%7C%7C9842288845234251393d08d91a6f6c7f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637569888120402272%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=gN5WA6U1LMjEgznV5Ssup97rpHeX7BAxxNunfuQ3HW0%3D&reserved=0, or unsubscribehttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAARLJP47P2YZ67BEB52UVFTTOMQIXANCNFSM45DUSGOA&data=04%7C01%7C%7C9842288845234251393d08d91a6f6c7f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637569888120412232%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Syh3usHyEQmq0B9R2AhfWZrGQVFsEuGcI0w691bAWJA%3D&reserved=0.
The int64_t looks good to me here
@JeffreyRichter are you suggesting we do some casting, OR do some argument validation and throw? The ReadToCount
API expects size_t
, we cant pass in an int64_t
option directly to that method because they user might pass in a value that is too large.
I think ReadToCount should take an int64_t. Without there be any harm in doing this?
- Jeffrey Richter (see Architecting Distributed Cloud Appshttps://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Faka.ms%2FRichterCloudApps&data=04%7C01%7C%7Cdc05b9290f9f445693ec08d8e1b30ed8%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637507506469305857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=TeBH6rge3q8VyFBwCrBXfSXND4oajfqayGLiZq1gmFo%3D&reserved=0 & Designing HTTP APIshttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D9Ng00IlBCtw%26list%3DPL9XzOCngAkqs4m0XdULJu_78nM3Ok3Q65%26index%3D1&data=04%7C01%7C%7Cdc05b9290f9f445693ec08d8e1b30ed8%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637507506469305857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nzunz4x55lMRzHb6qJVXuRkrlAAI03wXjPdexaPavzs%3D&reserved=0)
From: Ahson Khan @.> Sent: Tuesday, May 18, 2021 8:31:53 PM To: Azure/azure-sdk-for-cpp @.> Cc: Jeffrey Richter @.>; Mention @.> Subject: Re: [Azure/azure-sdk-for-cpp] Reduce uses of static_cast to size_t in storage code, including lambdas (#2313)
The int64_t looks good to me here
@JeffreyRichterhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FJeffreyRichter&data=04%7C01%7C%7Cdbaa3bc1306a43d65b8408d91a76a620%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637569919155433368%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=2xfTtM6lu4PA8TdbiH07gfy9EAzknV0DIVi82pjyjV0%3D&reserved=0 are you suggesting we do some casting? The ReadToCount API expects size_t, we cant pass in an int64_t option directly to that method because they user might pass in a value that is too large.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAzure%2Fazure-sdk-for-cpp%2Fissues%2F2313%23issuecomment-843718594&data=04%7C01%7C%7Cdbaa3bc1306a43d65b8408d91a76a620%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637569919155433368%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=bjPCGDSRaoSroKxOCma2KWS7qvhnZdFaIlq1jgB%2FkUg%3D&reserved=0, or unsubscribehttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAARLJP5FX3VILWIMYN6ZB5DTOMWKTANCNFSM45DUSGOA&data=04%7C01%7C%7Cdbaa3bc1306a43d65b8408d91a76a620%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637569919155433368%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0J3O546bZaTvdd%2BMHhGQWz3SBC1s9exD%2FO8TW%2FhHFpY%3D&reserved=0.
I think ReadToCount should take an int64_t. Without there be any harm in doing this?
https://github.com/Azure/azure-sdk-for-cpp/blob/75d17550f26c05086cc429fac0bb96a34bd40009/sdk/core/azure-core/inc/azure/core/io/body_stream.hpp#L102-L105
The guidelines state lengths of buffers in memory needs to be size_t. Just like Read
and OnRead
.
For example, how can ReadToCount support int64_t
for a MemoryBodyStream, when our implementation of OnRead is based on memcpy which expects and is limited to size_t:
https://github.com/Azure/azure-sdk-for-cpp/blob/75d17550f26c05086cc429fac0bb96a34bd40009/sdk/core/azure-core/src/io/body_stream.cpp#L85
That's what we discussed and changed here: https://github.com/Azure/azure-sdk-for-cpp/issues/2226 https://github.com/Azure/azure-sdk-for-cpp/pull/2285
I don't care what ReadToCount
returns, either size_t
or int64_t
looks good to me.
You may also want to consider the arithmetic operations between the return type of ReadToCount
and offset.
BodyStream in core sdk is not seekable, so you don't have to deal with offset.
But storage sdk has to deal with offset, which has to be a 64-bit variable. So it cannot be size_t
. It's either int64_t
or uint64_t
.
OK, BodyStream functions that take a pointer to a buffer can use size_t for the count. XxxClient functions that work with file sizes should take int64_t. It is possible that customer code will try to download a large file into memory that can't hold it. At these locations in code the customer would have to cast the size64_t to size_t and this is OK because the cast is how they are explicitly stating that they are aware of data loss. The customer code might know that the downloaded data will never be greater than size_t or, the customer code could check for this before doing the cast and take an alternate code path or produce its own exception.
OK, BodyStream functions that take a pointer to a buffer can use size_t for the count. XxxClient functions that work with file sizes should take int64_t. It is possible that customer code will try to download a large file into memory that can't hold it. At these locations in code the customer would have to cast the size64_t to size_t and this is OK because the cast is how they are explicitly stating that they are aware of data loss. The customer code might know that the downloaded data will never be greater than size_t or, the customer code could check for this before doing the cast and take an alternate code path or produce its own exception.
Is it a reasonable scenario that a customer downloads a blob larger than 4GB into a FileBodyStream, in a 32-bit application?
The data in the memory at any time is not too large, but the total transferred size over time can be large. In this case, returning size_t
may not be a good idea.
Yes, a customer can download a >4GB file in a 32-bit process but they must do it is chunks that fit in memory. For example, they could read 1GB into a memory buffer and then write that buffer to the file and then read the next 1GB into the same buffer and write the buffer to the file, and so on. Each call to BodyStream's read take a pointer and length; the length can never be larger than size_t (unsigned) because it's impossible. So, Read can also return a size_t since it must be between 0 and the size_t passed in to Read.
If the SDK provides an abstraction that downloads a BodyStream to a file (and we DO provide this abstraction today), then the chunk size (specified by the customer) should be size_t but if there was some max file size parameter (or similar), then that should be int64_t.
Hi @ahsonkhan, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.