This page is a place to document tips and techniques for using git-annex.

what to do when you lose a repository
Posted Sat Sep 22 03:36:59 2012

using gitolite with git-annex
Posted Sat Sep 22 03:36:59 2012

finding duplicate files
Posted Tue Jul 17 17:54:57 2012

using git annex with no fixed hostname and optimising ssh
Posted Tue Jul 17 17:54:57 2012

using the web as a special remote
Posted Tue Jul 17 17:54:57 2012

visualizing repositories with gource
Posted Tue Jul 17 17:54:57 2012

centralised repository: starting from nothing
Posted Tue Jul 17 17:54:57 2012

migrating data to a new backend
Posted Tue Jul 17 17:54:57 2012

Internet Archive via S3
Posted Tue Jul 17 17:54:57 2012

centralized git repository tutorial
Posted Tue Jul 17 17:54:57 2012

powerful file matching
Posted Tue Jul 17 17:54:57 2012

automatically getting files on checkout
Posted Tue Jul 17 17:54:57 2012

what to do when a repository is corrupted
Posted Tue Jul 17 17:54:57 2012

using Amazon S3
Posted Tue Jul 17 17:54:57 2012

using assume-unstaged to speed up git with large trees of annexed files
Posted Tue Jul 17 17:54:57 2012

recover data from lost+found
Posted Tue Jul 17 17:54:57 2012

untrusted repositories
Posted Tue Jul 17 17:54:57 2012

using the SHA1 backend
Posted Tue Jul 17 17:54:57 2012

using box.com as a special remote
Posted Tue Jul 17 17:54:57 2012

I've fixed the typo (anyone can edit pages in this wiki FWIW.)
Comment by http://joey.kitenet.net/ Sat Dec 24 16:54:31 2011
I'm confused by the fact that the git-annex-shell ADC rejects any repo names that don't start with /~/, since none of my repos start that way. It seems to work OK if I just delete /\~ from the front of the regex, but I feel like I must be missing something.
Comment by bremner Fri Dec 30 21:41:13 2011
dead is the best we can do. The automatic merging used on the git-annex branch tends to re-add lines that are deleted in one repo when merging with another that still has them.
Comment by http://joeyh.name/ Thu May 31 17:01:37 2012

Looks like you are missing a closing double quote on the line:

echo '$GL_ADC_PATH = "/usr/local/lib/gitolite/adc/;' >>~gitolite/.gitolite.rc

right after /;

I got this working by the way - great stuff.

Comment by http://www.openid.albertlash.com/openid/ Sat Dec 24 06:08:45 2011
Very nice :) Just for reference, here's my Perl implementation. As per this discussion it would be interesting to benchmark these two approaches and see if one is substantially more efficient than the other w.r.t. CPU and memory usage.
Comment by http://adamspiers.myopenid.com/ Fri Dec 23 19:16:50 2011

Is there a way to have git-annex completely ignore a repository? I see that the dead command adds the uuid of the repository to trust.log but does not change uuid.log. Is it enough to remove the corresponding line in uuid.log and trust.log?

Comment by http://dlaxalde.myopenid.com/ Thu May 31 14:36:33 2012

Well, a repo URL like gitolite@localhost:testing puts it in the gitolite user's /~/testing

This worked when I added the gitolite stuff, anyway.. Let's see if it still does:

joey@gnu:~/tmp>mkdir g
joey@gnu:~/tmp>cd g
joey@gnu:~/tmp/g>git init
Initialized empty Git repository in /home/joey/tmp/g/.git/
joey@gnu:~/tmp/g>git annex init
init  ok
joey@gnu:~/tmp/g>git remote add test 'gitolite@localhost:testing'
joey@gnu:~/tmp/g>touch foo
joey@gnu:~/tmp/g>git annex add foo
add foo (checksum...) ok
(Recording state in git...)
joey@gnu:~/tmp/g>git annex copy foo --to test --debug
git ["--git-dir=/home/joey/tmp/g/.git","--work-tree=/home/joey/tmp/g","ls-files","--cached","-z","--","foo"]
git ["--git-dir=/home/joey/tmp/g/.git","--work-tree=/home/joey/tmp/g","check-attr","annex.numcopies","-z","--stdin"]
git ["--git-dir=/home/joey/tmp/g/.git","--work-tree=/home/joey/tmp/g","show-ref","--hash","refs/heads/git-annex"]
git ["--git-dir=/home/joey/tmp/g/.git","--work-tree=/home/joey/tmp/g","show-ref","git-annex"]
git ["--git-dir=/home/joey/tmp/g/.git","--work-tree=/home/joey/tmp/g","cat-file","--batch"]
Running: ssh ["-4","gitolite@localhost","git-annex-shell 'configlist' '/~/testing'"]

Still seems right; the ADC's regexp will match this git-annex-shell command.

Comment by http://joey.kitenet.net/ Sat Dec 31 00:29:45 2011

I guess there is some path rewriting going on in gitolite proper, because if I try a URL of the form ssh://git@localhost/testing, then it still works with gitolite, but fails with the ADC because the repo is passed as /testing:

Running: ssh ["git@host","git-annex-shell 'configlist' '/recommend'"]
Running: ssh ["git@host","git-annex-shell 'configlist' '/recommend'"]

What I have to ask Sitaram and/or find in the docs is whether this is a bug or a feature in gitolite. I can see how the leading slash would get swallowed up by this line

$repo = "'$REPO_BASE/$repo.git'"

in gl-auth-command, but I guess that isn't the whole story.

Comment by bremner Sat Dec 31 01:50:49 2011
The instructions state ANNEX_S3_ACCESS_KEY_ID and ANNEX_SECRET_ACCESS_KEY but git-annex cannot connect with those constants. git-annex tells me to set both "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY" instead, which works. This is with Xubuntu 12.04.

I confirmed with Sitaram that this is intentional, if probably under-documented. Since the ADC strips the leading /~/ in assigning $start anyway, I guess something like the following will work


diff --git a/contrib/adc/git-annex-shell b/contrib/adc/git-annex-shell
index 7f9f5b8..523dfed 100755
--- a/contrib/adc/git-annex-shell
+++ b/contrib/adc/git-annex-shell
@@ -28,7 +28,7 @@ my $cmd=$ENV{SSH_ORIGINAL_COMMAND};
 # the second parameter.
 # Further parameters are not validated here (see below).
 die "bad git-annex-shell command: $cmd"
-    unless $cmd =~ m#^(git-annex-shell '\w+' ')/\~/([0-9a-zA-Z][0-9a-zA-Z._\@/+-
+    unless $cmd =~ m#^(git-annex-shell '\w+' ')/(?:\~\/)?([0-9a-zA-Z][0-9a-zA-Z.
 my $start = $1;
 my $repo = $2;
 my $end = $3;
Comment by bremner Sat Dec 31 03:34:17 2011
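
As a rough illustration of what the patch above changes, here is a minimal Python sketch that compares a strict check (repo path must start with /~/) with a relaxed one (the ~/ after the leading slash is optional). The repo-name character class, the names STRICT, RELAXED, and repo_name, and the trailing group are assumptions made for this example; the real ADC is Perl and its pattern is truncated in the diff, so this is not a copy of it.

```python
import re

# Simplified stand-in for the repo-name part of the ADC's check.
# ASSUMPTION: this character class is illustrative only; the real
# pattern is truncated in the diff and is not reproduced here.
REPO = r"[0-9a-zA-Z][0-9a-zA-Z._/-]*"

# Strict form: the repo path must start with /~/
STRICT = re.compile(r"^(git-annex-shell '\w+' ')/~/(" + REPO + r")('.*)$")

# Relaxed form: the ~/ after the leading slash is optional
RELAXED = re.compile(r"^(git-annex-shell '\w+' ')/(?:~/)?(" + REPO + r")('.*)$")


def repo_name(pattern, command):
    """Return the captured repo name, or None if the command is rejected."""
    match = pattern.match(command)
    return match.group(2) if match else None


# The two command shapes seen in the debug output in this thread.
tilde_form = "git-annex-shell 'configlist' '/~/testing'"
slash_form = "git-annex-shell 'configlist' '/testing'"

for label, pattern in (("strict", STRICT), ("relaxed", RELAXED)):
    for command in (tilde_form, slash_form):
        print(label, repr(command), "->", repo_name(pattern, command))
```

In this sketch both command shapes yield the same repo name under the relaxed pattern, while the strict pattern rejects the one without /~/.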
Comment by http://joey.kitenet.net/ Fri Dec 23 19:19:53 2011
Thanks, I've fixed that. (You could have too.. this is a wiki ;)
Comment by http://joeyh.name/ Tue May 29 19:10:42 2012

That patch seems ok, it doesn't seem to allow through any repo locations that were blocked before.

So, it has my blessing.. but the ADC is in gitolite and will need to be patched there.

Comment by http://joey.kitenet.net/ Sat Dec 31 18:32:28 2011

ControlPersist is awesome - thanks!

Here's an alternative, git-specific approach.

Comment by http://adamspiers.myopenid.com/ Fri Dec 23 13:31:33 2011
Ah right. git-annex normalizes all git ssh style user@host:dir to valid uris, which is where the /~/ comes from. I don't anticipate this changing on the git-annex side.
Comment by http://joey.kitenet.net/ Mon Jan 2 16:27:55 2012
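
To make the normalization described above concrete, here is a small Python sketch of rewriting a user@host:dir remote as an ssh:// URI, with relative paths placed under /~/. The helper name to_ssh_uri and the pattern SCP_STYLE are hypothetical, and this is only an approximation of the behavior described in this thread, not git-annex's actual code.

```python
import re

# Matches scp-style remotes such as user@host:dir (no "://" scheme).
SCP_STYLE = re.compile(r"^(?:(?P<user>[^@/:]+)@)?(?P<host>[^/:]+):(?P<path>.*)$")


def to_ssh_uri(remote):
    """Rewrite an scp-style remote as an ssh:// URI (hypothetical helper)."""
    if "://" in remote:
        return remote  # already a URI, leave it alone
    match = SCP_STYLE.match(remote)
    if match is None:
        raise ValueError(f"not an scp-style remote: {remote!r}")
    user = match.group("user")
    prefix = f"{user}@" if user else ""
    path = match.group("path")
    if not path.startswith("/"):
        # ASSUMPTION: a relative path is taken relative to the remote
        # user's home directory, which a URI spells as /~/
        # (tilde-prefixed paths are out of scope for this sketch).
        path = "/~/" + path
    return f"ssh://{prefix}{match.group('host')}{path}"


print(to_ssh_uri("gitolite@localhost:testing"))
print(to_ssh_uri("git@host:/repo"))
```

Under these assumptions, the relative form gains a /~/ prefix while the absolute form keeps its single leading slash, which matches the two path shapes seen in the debug output in this thread.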

After some debugging printing, here is my current understanding.

  • urls of the form git@host:~repo or ssh://git@host

    • git sends commands like "git-receive-pack '~/repo'"
    • gitolite converts these to $REPO_BASE/~/repo which fails. ~/repo would also fail fwiw.
    • git-annex seems to send /~/repo, which works
  • urls of the form git@host:/repo or ssh://git@host/repo

    • git sends "git-receive-pack '/db/cs3383'"
    • gitolite converts this to $REPO_BASE/repo which works
    • git annex sends "git-annex-shell 'inannex' '/repo' ..." which works, but only with the patch above.
  • urls of the form git@host:repo

    • git sends "git-receive-pack 'repo'"
    • gitolite converts this to $REPO_BASE/repo, which works
    • git-annex sends "git-annex-shell 'inannex' '/~/db/cs3383'...", which also works for git-annex-shell.

So the weird case is the last one, where git and git-annex are sending different things over the wire. I don't know if you have other motivations for doing the URL normalization on the client side, but it isn't needed for gitolite, and in some sense it complicates things a little. On the other hand, now that I see what is going on, it isn't a big deal to just strip the leading /~ off in the ADC. It does lead to the odd situation of some URLs working for git-annex but not git.

Comment by bremner Sat Dec 31 22:29:38 2011
Comments on this page are closed.