{"id":70,"date":"2005-01-29T18:42:52","date_gmt":"2005-01-30T02:42:52","guid":{"rendered":"\/?p=70"},"modified":"2005-04-25T01:19:59","modified_gmt":"2005-04-25T08:19:59","slug":"file-manipulation-is-harder-than-it-ought-to-be","status":"publish","type":"post","link":"https:\/\/www.slimjimmy.com\/weblog\/archives\/2005\/01\/29\/file-manipulation-is-harder-than-it-ought-to-be\/","title":{"rendered":"File manipulation is harder than it ought to be."},"content":{"rendered":"<p>I am amazed that programming languages (well, the typical ones, at least) don&#8217;t make it easier to manipulate files.<\/p>\n<p>A common way files are read in C is to create a <tt>struct<\/tt> that matches the file format and to call <tt>fread<\/tt> to read the file into it.  Isn&#8217;t that easy enough?<\/p>\n<p>Not really.  <strong>This approach is fine in isolation, but it&#8217;s non-portable<\/strong>:<\/p>\n<ul>\n<li><strong>Different architectures or compilers may lay out <tt>struct<\/tt>s differently.<\/strong> A compiler may insert padding bytes to satisfy alignment requirements.  Luckily, compilers aren&#8217;t allowed to do so willy-nilly, and some offer <tt>#pragma<\/tt>s to control it.<\/li>\n<li><strong>Different architectures have different integer sizes.<\/strong> Appropriate <tt>typedef<\/tt>s can often mitigate this, but it&#8217;s still imperfect since it requires a small porting effort.<\/li>\n<li><strong>Different architectures use different endianness.<\/strong>  If a file format is defined to store integers in big-endian byte order but your architecture is little-endian, then reading a field out of the <tt>struct<\/tt> without first swapping its bytes yields the wrong value.<\/li>\n<\/ul>\n<p>The typical way to solve these problems is to read a file a byte at a time, copying each byte into the appropriate location within the <tt>struct<\/tt>.  
This is tedious.<\/p>\n<p>Programming languages should provide a mechanism for programmers to declare a <tt>struct<\/tt> that must conform to some external format requirement.  Programmers should be able to annotate the <tt>struct<\/tt>, prohibiting implicit padding bytes and specifying the size and endianness of each field.  For example:<\/p>\n<pre class=\"example\">\r\nfile_struct myFileFormat\r\n{\r\n    uint8 version;\r\n    uint8[3]; \/\/ Reserved.\r\n    uint32BE numElements;\r\n    uint32BE dataOffset;\r\n};\r\n<\/pre>\n<p>When retrieving fields from such a <tt>struct<\/tt>, the compiler should generate code that automatically performs the necessary byte swaps and internal type promotions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am amazed that programming languages (well, the typical ones, at least) don&#8217;t make it easier to manipulate files. A common way files are read in C is to create a struct that matches the file format and to call fread to read the file into it. Isn&#8217;t that easy enough? Not really. 
This approach [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,2],"tags":[],"class_list":["post-70","post","type-post","status-publish","format-standard","hentry","category-programming","category-rantsraves"],"_links":{"self":[{"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/posts\/70","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/comments?post=70"}],"version-history":[{"count":0,"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/posts\/70\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/media?parent=70"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/categories?post=70"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.slimjimmy.com\/weblog\/wp-json\/wp\/v2\/tags?post=70"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}