MadBlog
Wednesday 7 May 2008

git prompt

Following Martin's post on the subject: since I created my prompt, I've updated it quite a lot.

It only showed the branch before; now it does so even when I'm on a detached HEAD or something, and it also shows when I'm in the middle of a rebase or a merge. You can see the zsh-fu for this. For example, when I'm in the middle of a rebase on my paid work repository, it looks like:

┌─(10:34)──<~/dev/mmsx master <rebase -i>>──
└[artemis]                                                 (~/dev/mmsx/Build/)

With nice colors I cannot really show without a screenshot that I'm too lazy to do :) The rebase/merge indicator is a recent addition that I shamelessly took from the contrib bash prompt in the git-core package. And to be frank it's really needed: the tests are cheap (basically looking for magic file names), and it can tell you if you forgot to finish a rebase or a merge, which can happen if a colleague interrupted you in the middle of it, for example.
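To give an idea of how cheap those tests are, here is a rough sketch in plain sh. It is not the zsh code from my prompt: the function name git_op_state is made up for the example, and the marker names (rebase-merge/interactive, rebase-apply, MERGE_HEAD) are an assumption taken from what recent git versions use; older releases used different names.

```shell
# Print the operation in progress in the git directory "$1", or nothing.
# Only file tests are done, no git command is run.
git_op_state() {
    if [ -d "$1/rebase-merge" ]; then
        # interactive rebases leave an extra marker file behind
        if [ -f "$1/rebase-merge/interactive" ]; then
            echo "rebase -i"
        else
            echo "rebase -m"
        fi
    elif [ -d "$1/rebase-apply" ]; then
        # shared by patch-based rebases and git am
        echo "rebase/am"
    elif [ -f "$1/MERGE_HEAD" ]; then
        echo "merge"
    fi
}
```

In a prompt you would feed it the real git directory, for example git_op_state "$(git rev-parse --git-dir)", and only decorate the prompt when it prints something.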

I liked the '*' idea from Martin to show if the tree is dirty. Sadly it's not an option. Martin, to do that, you can do:

 unclean=
 git diff-files --quiet && git diff-index --cached --quiet HEAD -- || unclean='*'

But this is a very expensive operation. On the glibc git repository, it takes seconds with a cold cache (which is not very surprising, because it basically has to stat(2) a lot of stuff). And not having a shell for seconds is a bit extreme.

PS: I know my prompt only supports git, but:

  1. I barely care about other VCSes as I only use git and sometimes svn for packaging ;
  2. when I have to use svn it's for cheap stuff where I don't really need the prompt help.

Monday 25 February 2008

Dear John…

WRT your issue: yes, it's true that for a short series of patches, you're often asked to rebase it on a clean state if it doesn't merge cleanly. Though if you're doing complicated work, meaning dozens of patches e.g., it usually means two things:

  • you're a regular contributor ;
  • you spend quite some time working on it.

When this happens, upstreams are usually okay with merging from a public repository that you would set up. Though, it's usually a tiny bit more work for the upstreams to work with a new remote repository, so those are only used for these cases.

Also note that upstream could really fake the same work by branching off from the point you branched off from, and using git-am from that point instead of git am -3 on top of the current devel branch (which is similar to a rebase, hence creates new sha's).

IOW, it's not really a git deficiency (even if git-format-patch could maybe annotate _more_ where it comes from, and git-am grok that to re-create a topic branch from that): git has the features, it just probably doesn't make it easy enough, and people usually don't care enough for very short series.

Tuesday 5 February 2008

mixed utf-8 and 8bit charset foo

To deinterlace utf-8 from an 8-bit encoding (say latin1, but it works the same with any 8-bit encoding), in data that may also have been re-encoded to utf-8 multiple times, there is a very simple tool to write.

Let's assume that you have somewhere a function void charset_to_utf8(FILE *, int c) that takes a FILE * and writes to it the utf-8 encoding of the character c. Then write something that roughly looks like this:

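   #include <stdio.h>

   /* assumed to exist elsewhere: writes the $charset character c to f as utf-8 */
   void charset_to_utf8(FILE *f, int c);

   /* length of the valid utf-8 sequence starting at s, or 0 if there is none */
   static int utf8_wclen(const unsigned char *s, int maxlen)
   {
       /* expected sequence length, indexed by the top 5 bits of the lead byte */
       static char const utf8_len[32] = {
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 4, 0,
       };

       int trail = utf8_len[(*s++) / 8];
       if (trail > maxlen)
           return 0;
       switch (trail) {
         /* deliberate fall-through: every continuation byte must be 10xxxxxx */
         case 4: if ((*s++ & 0xc0) != 0x80) return 0;
         case 3: if ((*s++ & 0xc0) != 0x80) return 0;
         case 2: if ((*s++ & 0xc0) != 0x80) return 0;
         case 1: return trail;
         default: return 0;
       }
   }

   /* returns 1 if some multi-byte utf-8 sequence was met, 0 otherwise */
   int charset_utf8_deinterlace(FILE *f, void *data, int dlen)
   {
       const unsigned char *s = data;
       int pos = 0, res = 0;

       while (pos < dlen) {
           int wclen = utf8_wclen(s + pos, dlen - pos);
           if (wclen) {
               /* already valid utf-8: copy it from the current position */
               fwrite(s + pos, wclen, 1, f);
               pos += wclen;
               /* plain ASCII does not count as "looked like utf-8" */
               if (wclen > 1)
                   res = 1;
           } else {
               /* assume it's $charset */
               charset_to_utf8(f, s[pos++]);
           }
       }
       return res;
   }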

You can easily build on this a tool that mmaps a file passed as an argument, prints a utf-8 clean file to stdout, and exits with a specific code when it met something that looked like valid utf-8[1]. Let's call that tool recode_to_utf8. Then, if you fear your data has been re-encoded multiple times, you need to do this:

 #! /bin/sh
 # $charset is your 8-bit charset (e.g. latin1); $still_utf8 is the exit
 # code recode_to_utf8 uses when it met something that looked like utf-8
 wrap_recode() {
   recode_to_utf8 "$1" > "$2"
   case $? in
     0) true;;
     $still_utf8) false;;
     *) echo "WOOPS IO Error" 1>&2; exit 1;;
   esac
 }
 cp your_source dirty
 while ! wrap_recode dirty utf8_clean; do
   iconv -f utf8 -t "$charset" utf8_clean -o dirty
 done
 rm -f dirty
 # result in utf8_clean

This is all very sketchy, but I've never found a tool that does the job properly, and it's quite simple to derive tools from the methods above. Note that it assumes that a valid sequence of your original charset is highly unlikely to form a valid utf-8 sequence, which for text is usually true (at least in latin1).

Notes

[1] you want that code to be specific so that it can be told apart from IO errors, which I ignored for the sake of brevity, the post being quite long already

Wednesday 12 December 2007

Yay \o/

The French ISP Free, of which I'm a customer, now gives ipv6 connectivity (through a curious 6to4 clone though). The nice news is that we now have really nearby 6to4 nodes all over France. It even has quite a nice ping.

Everything is done by the modem, which provides autoconfiguration to the network. I just had to "modprobe ipv6" on my boxen and voilà !

I've not looked at reverse DNS yet: as it's not regular 6to4 (it has a 2a01 prefix), http://6to4.nro.net/ cannot do a thing about it yet. We'll see how it goes :)

Monday 26 November 2007

git-commit(1) vim ftplugin update…

Dear vim-ers, yes I know my git.vim ftplugin was kind of a dirty hack, and that it didn't work that well.

For a few days now, the git.git next branch has shown relative paths in git-status(1), and that was the final blow to my already not-that-stable hack. So I asked on the git mailing list how to do that plug-in properly, and it turns out there is a perfect solution, which has been there from day 1…

So here is a new release of the plugin that uses the special index[1] git-commit(1) builds for the occasion, hence can use git-diff(1) directly instead of clumsily parsing the commit buffer.

Direct benefits:

  • it detects renames well;
  • it works correctly when you are --amend'ing, or committing something that was added to the index while the corresponding files have unstaged local changes;
  • it's damn fast (even when you commit more than 10 files);
  • you can even hide whitespace changes, just add let git_diff_opts = "-C -C -w -b" in your .vimrc and you're done !

Notes

[1] or commit staging area ;D

Saturday 10 November 2007

M(adM)utt pr0n

Here is what happens when a (mad)tty meets a (mad)mutt

Okay, for real it has zillions of usability issues right now:

  • you shouldn't resize the mutt without leaving your $EDITOR first ;
  • you cannot interact with the rest of mutt either ;

But all those things need a proper event loop (and anyone who has ever looked at the mutt code knows that well…).

And yes, it works for emacs too.

Sunday 9 September 2007

git awesome-ness [git rebase --interactive]

With the last git release, git-rebase gained a new option: --interactive.

If you've ever had the feeling that in a patch series of yours you should have ordered patches differently, or merged some, then this command is what you dreamed of. Here is how it works…

Let's pretend you want to rework your last 10 patches; you'll run:

   $ git rebase -i HEAD~10

It will launch your $EDITOR and you'll see something like:

   # Rebasing 16d3800..14f3d11 onto 16d3800 
   # 
   # Commands: 
   #  pick = use commit 
   #  edit = use commit, but stop for amending 
   #  squash = use commit, but meld into previous commit 
   # 
   # If you remove a line here THAT COMMIT WILL BE LOST. 
   # 
   pick 6270640 Simplify write_tree using strbuf's. 
   pick 27c528a Further strbuf re-engineering. 
   pick fd82c9a Eradicate yet-another-buffer implementation in buitin-rerere.c 
   pick eee488f More strbuf uses in cache-tree.c. 
   pick 16878b5 Add strbuf_rtrim and strbuf_insert. 
   pick e9081af Change semantics of interpolate to work like snprintf. 
   pick 99c3ef5 Rework pretty_print_commit to use strbufs instead of custom buffers. 
   pick 203db5d Use strbuf_read in builtin-fetch-tool.c. 
   pick a20d939 Use strbufs to in read_message (imap-send.c), custom buffer--. 
   pick 14f3d11 Replace all read_fd use with strbuf_read, and get rid of it.
   ~
   ~
   ~
   ~
   ~
   ~[1]

Then you can rewrite "pick" into "edit" if you want to change something in a commit, or "squash" if you want to merge it with the one from the line before.

What the small help doesn't say is that you can actually reorder your commits, and it will do what you expect it to do. I used it 10 minutes ago: I have this string buffer module that I extend on a regular basis, and I squashed every API extension of that module into one commit this way.

Each time a change needs you to edit something, either because you asked for it or because one of the changes you asked for generated a conflict, the rebase will stop as usual. You will be prompted to make the change, fix the conflict, or merge the commit messages (in case of a squash), and when all is in order, you just need to:

   $ git rebase --continue

This is just awesomely simple and intuitive.

Notes

[1] if you don't have those, your $EDITOR sucks btw

Tuesday 27 March 2007

I now feel I've achieved something ...

The so-called code is ugly (at least I would have beaten my students for writing such crap[1]). And to paraphrase Linus: I'm a disgusting pig, and proud of it !!!

   $ ./madmutt -f test.mbox
   MCore.pwd()    = /home/madcoder/dev/madmutt
   MCore.shell    = /bin/zsh
   -> setting MCore.shell to /madmutt/is/on/lua/crack gives:
   MCore.shell    = /madmutt/is/on/lua/crack
   MCore.version  = devel
   MTransport.sendmail   = /usr/sbin/sendmail -eom -oi
   -> exiting

edit: Some have wondered: I'm just ecstatic because I'm slowly replacing the good old muttrc with lua, and my script that generates the lua bindings for me just works as expected. (Yes, I'll obviously write some kind of legacy-thing importer at some point, but I'm really not anywhere near that point yet. Even if I use madmutt daily, it's not even alpha quality: it basically works for me.)

Notes

[1] yeah in another life I taught OCaml...

Sunday 25 February 2007

git rebase is not harmful, it's just _not_ always the best solution, that's all.

I must say I disagree with John a lot about git rebase.

git-rebase is the nicest feature I've ever seen float around in an SCM, and is one of the things I love in git. I mean, I do not disagree with the fact that it cannot be used in every single case. It is indeed meant to be used in private topic branches: a workflow where you have a topic branch (meaning some work that is neither releasable nor pushable yet) and you still want to keep up with others' work.

With other SCMs, you have to update your working copy, which in many cases generates nasty conflicts that are hard to deal with, especially in svn. With git, you commit your work, pull the remote branch, and then rebase on top of it. It makes a lot of sense, and when your topic branch is indeed ready, you can push it upstream, and the next rebase will automatically merge the bits that have been accepted. I just can't think of a simpler way.

Btw, we use svn at work, and I use git-svn instead of svn to be able to develop my own patches without fearing conflicts the way I did before. I know I will benefit from git's powerful merging capabilities and help at any stage, even if I have not pulled from the svn for a long time. That makes development much saner, as I only try to push coherent changes, hence on a less regular basis. You could not do that without git-rebase. In that sense, git-rebase is anything but evil.

Wednesday 21 February 2007

git tricks ...

When you have an svn-like use of git (I mean with a central repository), it's a good thing to repack the central repository from time to time. If your repository lives (totally random example) in /git/pkg-xorg/lib/mesa.git, you can do:

   cd /git/pkg-xorg/lib/mesa.git && GIT_DIR=. git repack -a -d

I'm told that some people who had more than 1.5GB of git repositories have seen their main repositories shrink well under 200MB of disk usage. On my own git central server, I use this as a cron job:

 #! /bin/bash
 GIT_BASE=/git
 for repo in "$GIT_BASE"/*.git; do
     # repack each repository in place, silently
     pushd "$repo" &>/dev/null || continue
     GIT_DIR=. git repack -a -d  &>/dev/null
     # popd takes no directory argument
     popd &>/dev/null
 done

But on a big and loaded server one could do better and repack only when it seems to be needed, using something like this (untested):

 #! /bin/bash
 GIT_BASE=/git
 GIT_THRESHOLD=5000
 for repo in "$GIT_BASE"/*.git; do
     pushd "$repo" &>/dev/null || continue
     # rough count of the files stored under objects/ (mostly loose objects)
     if test $(find objects | cut -d/ -f3 | wc -w) -gt $GIT_THRESHOLD; then
         GIT_DIR=. git repack -a -d  &>/dev/null
     fi
     # popd takes no directory argument
     popd &>/dev/null
 done

Monday 8 January 2007

POSIX DNS APIs considered harmful.

I'm quite upset with the POSIX APIs. I'm trying to write some kind of simple piece of software, to be included in a mail chain, doing a lot of DNS queries, mostly to RBLs.

I've met two kinds of problems:

  1. the POSIX API is synchronous, and no standard API for non-blocking requests exists. There are of course:
    • adns, but it's very bloated and does not work with ipv6,
    • c-ares, but it's quite bloated, does not work with ipv6, and needs select (it cannot be used with epoll e.g.),
    • libdnsresolv (hahahaha, that's a horrible patch over the BSD resolving API, it's horrible, yuck),
    • udns, that's the best of the four, but still, the code does not feel very solid (it does not check recvfrom/sendto errors e.g.).
  2. there is no way to escape the system resolver, which is good for usual applications, but very bad when it comes to RBLs.

The second point is in fact the worst. On the mail queues I administer, there is also very often a bind used as a caching DNS resolver. Very well, except that it completely sucks with RBLs. RBLs generate a lot of queries that resolve but that you will almost never ask again before their TTL expires, while the other kind of queries gets NOT FOUND answers, which are usually not cached.

Too bad: the most useful answers are the NOT FOUND ones, and the cached answers are just there to make our local cache use huge amounts of memory for nothing. So RBLs just end up poisoning your system's efficiency. That's quite ridiculous.

I'd really love to have a decent async resolver API, and a way to tell my DNS cache (here BIND) that a domain serves an RBL, and that I want specific caching features for it (for the sake of both the RBL and my system). I fear that it won't be possible, and that I will end up writing some API specifically designed to craft DNS queries to the RBL servers (once again, there is absolutely no reason to use forwarders for that, as it will end up screwing your forwarder's cache for almost no gain, and won't save you from a lot of useless queries with no answers at all).

The good point is that the kind of queries you need to do to RBL servers is indeed completely trivial:

  • only A and maybe one day AAAA queries ;
  • a query will never be greater than 512 octets, so only UDP is needed ;
  • you only have to look up the NSes that serve your RBL (and here we use the local resolver API, to benefit from the caches) and query them directly: no recursion is needed.
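As a small illustration of how trivial these queries are, here is a sketch of how the name to look up is built for an IPv4 address, following the usual RBL convention of reversing the octets and appending the RBL zone. The function name rbl_query_name and the zone rbl.example.org are made up for the example.

```shell
# Print the DNS name to query for IPv4 address "$1" in the RBL zone "$2":
# the four octets are reversed and prepended to the zone.
rbl_query_name() {
    # split the address on dots into its four octets
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo "$d.$c.$b.$a.$2"
}
```

For instance, rbl_query_name 192.0.2.1 rbl.example.org prints 1.2.0.192.rbl.example.org. That name is then sent as a plain A query over UDP straight to one of the zone's name servers; with dig (and a hypothetical server name) that would be something like dig +norecurse @ns.rbl.example.org A 1.2.0.192.rbl.example.org.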

Then implementing some efficient (RBL-aware) caching on top of that will be rocking fast, and way more efficient than bind caching, without the poisoning effects. A big gain, I hope. What surprises me the most is that I've not found anybody speaking about those side effects of RBLs on nameservers, almost no discussion at all. Very surprising for a protocol that is so vital for the internet as a general rule.

Wednesday 29 November 2006

pearls

Refactoring code is (sometimes) a real bit of fun. I've seen a lot of:

 if (!foo) {
     return NULL;
 } else {
     return foo;
 }

or quite a lot of :

 (<boolean expression>) ? true : false

but today I found one I've never seen before:

 foo ? -foo : 0