
filesystem stdlib design

Open kvk1920 opened this issue 4 years ago • 8 comments

C3 needs a cross-platform way to manipulate paths and files, because in C programmers have to write platform-specific code to work with the file system. Differing path encodings across platforms are an issue too.

kvk1920 avatar Jul 13 '21 09:07 kvk1920

There will be some kind of path_t object that stores paths, but there are several options for its encoding, and I can't decide which one is better:

  1. Store all paths as UTF-8.
     Pros:
     • printing and other string use has no overhead, and the programmer can write cross-platform code
     Cons:
     • even fopen may need a memory allocation to convert the path into the right encoding (e.g. UTF-16 on Windows), and this overhead occurs on every call to fopen, stat, mkdir, etc.
  2. Store all paths in the native path encoding (e.g. UTF-16 on Windows).
     Pros:
     • closer to the platform
     • all file operations work without memory allocation (although an allocation is still needed for the FILE object)
     Cons:
     • all operations on the string representation of a path will be very expensive; in this case some kind of PathBuilder should be provided so the programmer can manipulate paths faster
  3. Store all paths in UTF-8, but reserve space for conversions. For example, use N * 2 bytes to store a path of length N.
     Pros:
     • UTF-8 <-> UTF-16 conversions can be done in place without any additional memory
     • path_t object will
     Cons:
     • even though a plain string conversion is better than a memory allocation, there is still time overhead
     • concurrency? (What if a second thread accesses a path_t object that is currently in native encoding, not UTF-8?)
  4. Use UTF-32?
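
To make the conversion cost in option 1 concrete, here is a minimal sketch of the work that would have to happen before each native call on a UTF-16 platform. The function name `utf8_to_utf16` and its signature are hypothetical, not part of C3 or libc. It writes into a caller-supplied buffer, so whether an allocation is needed depends on where that buffer comes from. As a simplification, overlong encodings are not rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string to UTF-16.
 * `cap` is the capacity of `dst` in 16-bit units, terminator included.
 * Returns the number of units written (terminator excluded), or
 * (size_t)-1 on malformed input or when `dst` is too small. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra; /* number of continuation bytes that must follow */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;
        s++;

        for (int i = 0; i < extra; i++, s++) {
            /* A NUL here fails this check too, so truncated input is caught. */
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= cap) return (size_t)-1; /* keep room for the terminator */
            dst[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000; /* encode as a surrogate pair */
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= cap) return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

For short paths a fixed-size stack buffer would avoid the allocation, but the per-call conversion time remains, which is the overhead listed under Cons for option 1.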

kvk1920 avatar Jul 13 '21 09:07 kvk1920

We can also add both platform independent and platform dependent versions of the code. If the user wants to optimize, then the platform dependent versions may be used.

lerno avatar Jul 13 '21 09:07 lerno

I think it's better to have a platform-independent version, so (1) here. This makes it easier to keep the API stable; the implementation then converts to the platform specifics. It can be coupled with a platform API that exposes the underlying platform directly. When platform-dependent features or maximum performance are needed, the platform API can be used.

So something like:

  1. std::files (platform independent code)
  2. std::files::win (platform dependent code for windows)
  3. std::files::mac (platform dependent code for mac)

etc
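
The layering above could look roughly like the following C sketch. It is only an illustration of the split, not the actual C3 stdlib: `files_remove`, `files_win_remove` and `files_posix_remove` are hypothetical names standing in for std::files and its platform modules. The Windows branch assumes the usual Win32 calls `MultiByteToWideChar` and `_wremove`, and that the path fits in `MAX_PATH` units.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>

/* Stand-in for the platform module on Windows: takes the native
 * UTF-16 path directly, with no conversion. */
static int files_win_remove(const wchar_t *native_path)
{
    return _wremove(native_path);
}
#else
/* Stand-in for a POSIX platform module: the native path is already
 * a byte string, so it is passed through unchanged. */
static int files_posix_remove(const char *native_path)
{
    return remove(native_path);
}
#endif

/* Stand-in for the platform-independent module: always takes UTF-8
 * and converts only on platforms that need it.
 * Returns 0 on success and nonzero on failure. */
int files_remove(const char *utf8_path)
{
#ifdef _WIN32
    wchar_t wide[MAX_PATH];
    if (MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wide, MAX_PATH) == 0)
        return -1; /* conversion failed or the path was too long */
    return files_win_remove(wide);
#else
    return files_posix_remove(utf8_path);
#endif
}
```

Code that only calls `files_remove` stays portable, while code that already holds a native path can call the platform function and skip the conversion.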

lerno avatar Oct 06 '21 11:10 lerno

C++17 https://en.cppreference.com/w/cpp/filesystem as example?

data-man avatar Oct 06 '21 11:10 data-man

Do you mean for namespace or for functionality? It's useful to also look at Ruby, Java and the ObjC functionality in Cocoa. It might be better to split this into multiple tasks for each part later on.

lerno avatar Oct 06 '21 13:10 lerno

Java and ObjC use a general URL-like approach, of which a file path is merely a subtype. It has both good and bad parts.

lerno avatar Oct 06 '21 13:10 lerno

Do you mean for namespace or for functionality?

Both, I guess.

Ruby, Java and the ObjC

Or D and Rust? :)

But there are probably a lot of tasks in between:

  • Unicode
  • strings
  • algorithms

data-man avatar Oct 07 '21 05:10 data-man

As a general thing having libc available is a stopgap obviously, but it's there.

lerno avatar Oct 19 '21 14:10 lerno

Currently I am working on this. Path is the normalized (and safe) path. The various operations (appending a file, getting the extension, etc.) all work on Path. It is UTF-8, but it knows whether the path it holds is a Windows or a POSIX path.
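
As a rough illustration of a UTF-8 path that carries its own path convention, here is a small C sketch. The names `path_view`, `path_env`, `path_extension` and `path_append` are hypothetical and do not reflect the real C3 API. It also assumes the stored string is already normalized. Scanning byte by byte is safe because `/`, `\` and `.` are ASCII and never occur inside a multi-byte UTF-8 sequence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { PATH_POSIX, PATH_WIN32 } path_env; /* hypothetical */

typedef struct {
    const char *str; /* normalized UTF-8 text, not owned by the struct */
    path_env env;    /* which separator rules apply */
} path_view;

static bool is_separator(char c, path_env env)
{
    return c == '/' || (env == PATH_WIN32 && c == '\\');
}

/* Returns the extension of the last component without the dot,
 * or NULL if there is none. A leading dot (".bashrc") and a
 * trailing dot ("a.") do not count as an extension here. */
const char *path_extension(path_view p)
{
    const char *dot = NULL;
    const char *start = p.str; /* start of the current component */

    for (const char *c = p.str; *c; c++) {
        if (is_separator(*c, p.env)) {
            dot = NULL;
            start = c + 1;
        } else if (*c == '.' && c != start) {
            dot = c;
        }
    }
    return (dot && dot[1]) ? dot + 1 : NULL;
}

/* Writes dir + separator + name into out. Returns false if it does not fit. */
bool path_append(path_view dir, const char *name, char *out, size_t cap)
{
    size_t dlen = strlen(dir.str);
    size_t nlen = strlen(name);
    bool need_sep = dlen > 0 && !is_separator(dir.str[dlen - 1], dir.env);

    if (dlen + (need_sep ? 1 : 0) + nlen + 1 > cap) return false;
    memcpy(out, dir.str, dlen);
    if (need_sep) out[dlen++] = (dir.env == PATH_WIN32) ? '\\' : '/';
    memcpy(out + dlen, name, nlen + 1); /* copies the terminator too */
    return true;
}
```

The same bytes can give different answers depending on the convention: `dir.d\file` has no extension when treated as a Windows path, but under POSIX rules the backslash is an ordinary character, so the whole string is one component with the extension `d\file`.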

lerno avatar Mar 14 '23 10:03 lerno

I'll close this for now.

lerno avatar Jun 02 '23 09:06 lerno